Lending Club Default Prediction

Harvard University CS109A Summer 2018
Kenneth Brown - David Gil García - Nikat Patel

Research and Related Works

A substantial body of work applies advanced data analytics to improving returns or minimizing risk in P2P lending. Most lending sites in the market publish historical data on their activity, which makes the field a natural candidate for this type of analysis.

Most studies take an approach similar to ours, trying to predict potential defaults or late payments, and in general reach conclusions similar to those in this study: a combination of features can serve as predictors of how a loan will come to term.

That is the case in both “A Data-Driven Approach to Predict Default Risk of Loan for Online Peer-to-Peer (P2P) Lending” and “The use of profit scoring as an alternative to credit scoring systems in peer-to-peer (P2P) lending.” Both fit supervised statistical models such as Random Forest, SVM and Neural Networks to historical data.

In “Neural network survival analysis for personal loan data” the authors describe several issues that we also encountered in our analysis, including inputs that vary in meaning and importance over time, and scaling a prediction for better performance without exceeding the available computing resources.

In “Weapons of Math Destruction” O’Neil devotes a full chapter to inequality in the loan industry. One of her main points is that algorithmic discrimination in lending can become a self-fulfilling prophecy: the discrimination is based on factors that are themselves exacerbated by that very discrimination. An important takeaway is to be aware of our prejudices and to always include a mechanism for human review of any algorithmic decision.
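
One way to make the human-review takeaway concrete is sketched below. This is only an illustration and not part of our analysis: the names route_decision and Decision and both probability cut-offs are hypothetical. The idea is that the model may approve clearly low-risk loans on its own, but it never rejects a loan by itself; every other case is passed to a person, with the model's view attached as a recommendation.

```python
from dataclasses import dataclass

# Hypothetical cut-offs, chosen only for illustration; they are not tuned values.
AUTO_APPROVE_BELOW = 0.10
RECOMMEND_REJECT_ABOVE = 0.40


@dataclass
class Decision:
    loan_id: str
    default_probability: float
    outcome: str                    # "auto_approve" or "human_review"
    model_recommends_reject: bool   # advisory flag shown to the reviewer


def route_decision(loan_id: str, default_probability: float) -> Decision:
    """Route a model score so that the model alone never rejects a loan."""
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must be between 0 and 1")

    if default_probability < AUTO_APPROVE_BELOW:
        return Decision(loan_id, default_probability, "auto_approve", False)

    # Everything else goes to a person; the model only adds a recommendation.
    recommends_reject = default_probability > RECOMMEND_REJECT_ABOVE
    return Decision(loan_id, default_probability, "human_review", recommends_reject)


if __name__ == "__main__":
    for loan_id, p in [("A-1", 0.05), ("A-2", 0.25), ("A-3", 0.90)]:
        print(route_decision(loan_id, p))
```

Keeping rejection out of the automated path is a design choice in this sketch; a real deployment would also need to decide how reviewers are selected and how their decisions are audited.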


Y. Jin and Y. Zhu, “A Data-Driven Approach to Predict Default Risk of Loan for Online Peer-to-Peer (P2P) Lending,” 2015 Fifth International Conference on Communication Systems and Network Technologies, Gwalior, 2015, pp. 609-613.


Serrano-Cinca, Carlos, and Begoña Gutiérrez-Nieto. “The use of profit scoring as an alternative to credit scoring systems in peer-to-peer (P2P) lending.” Decision Support Systems 89 (2016). doi:10.1016/j.dss.2016.06.014.


Baesens, Bart, et al. “Neural network survival analysis for personal loan data.” Journal of the Operational Research Society 56.9 (2005): 1089-1098.


O’Neil, Cathy. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown, 2016.
