Selection and evaluation of relevant predictors for credit scoring in peer-to-peer lending with random forest based methods
Zaytsev, Vitaly (2020)
Master's thesis
School of Engineering Science, Industrial Engineering and Management
All rights reserved.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe2020062946078
Abstract
This study identifies the important predictor variables for credit scoring using an empirical case of peer-to-peer (P2P) lending data. It extends existing studies of predictor relevance in P2P lending by drawing on a larger collection of datasets, from two European and two U.S.-based P2P platforms, and by providing an aggregated overview of the variables consistently found to be important. Variable selection is performed with wrapper methods built on random forest classification models that predict the loan outcome as either defaulted or repaid. Because the underlying models achieved reasonable prediction accuracy, the relevance of the variables that contributed to accurate predictions could be analyzed further by comparing their relative importance metrics. The robustness of the findings is improved by combining the results of several methods. Beyond the variable selection results and the extracted importance metrics, the most important variables are also assessed with regard to their estimated marginal effect on the loan outcome. The results are checked for consistency with similar studies, and their implications are analyzed from the perspectives of investors and of P2P lending platform operators.
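The workflow summarized in the abstract (fit a random forest on loan outcomes, rank variables by an importance metric, then inspect the marginal effect of the top variable) can be illustrated with a minimal, self-contained sketch. It is not the thesis's implementation: the abstract does not name the specific wrapper methods or importance metrics used, so permutation importance stands in here as one common choice. The data are synthetic, the column names (`interest_rate`, `noise_a`, `noise_b`) are hypothetical, and the forest is deliberately reduced to depth-one trees to keep the example short.

```python
import random

FEATURES = ["interest_rate", "noise_a", "noise_b"]  # hypothetical column names

def make_loans(n, rng):
    """Synthetic loans (assumption): only interest_rate drives default (label 1)."""
    rows, labels = [], []
    for _ in range(n):
        rate = rng.uniform(5.0, 35.0)
        rows.append([rate, rng.random(), rng.random()])
        labels.append(1 if rng.random() < (0.05 if rate < 20.0 else 0.9) else 0)
    return rows, labels

def gini(labels):
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def fit_stump(rows, labels, candidates, rng):
    """Depth-one tree: best Gini split over the candidate features."""
    base = sum(labels) / len(labels)
    best = (float("inf"), candidates[0], float("inf"), base, base)
    for f in candidates:
        # Try ten thresholds drawn from the observed values of feature f.
        for t in rng.sample([r[f] for r in rows], 10):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best[0]:
                best = (score, f, t, sum(left) / len(left), sum(right) / len(right))
    return best[1:]  # (feature index, threshold, default rate left, default rate right)

def fit_forest(rows, labels, n_trees, mtry, rng):
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        candidates = rng.sample(range(d), mtry)      # random feature subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], candidates, rng))
    return forest

def predict_proba(forest, row):
    """Average the per-tree default rates for this row."""
    return sum(pl if row[f] <= t else pr for f, t, pl, pr in forest) / len(forest)

def accuracy(forest, rows, labels):
    hits = sum((predict_proba(forest, r) > 0.5) == (y == 1)
               for r, y in zip(rows, labels))
    return hits / len(rows)

def permutation_importance(forest, rows, labels, f, rng):
    """Accuracy lost when column f is shuffled, breaking its link to the label."""
    column = [r[f] for r in rows]
    rng.shuffle(column)
    permuted = [r[:f] + [v] + r[f + 1:] for r, v in zip(rows, column)]
    return accuracy(forest, rows, labels) - accuracy(forest, permuted, labels)

def partial_dependence(forest, rows, f, value):
    """Mean predicted default probability with feature f fixed at `value`."""
    return sum(predict_proba(forest, r[:f] + [value] + r[f + 1:])
               for r in rows) / len(rows)

rng = random.Random(0)
train_x, train_y = make_loans(600, rng)
test_x, test_y = make_loans(300, rng)
forest = fit_forest(train_x, train_y, n_trees=30, mtry=2, rng=rng)

importance = {name: permutation_importance(forest, test_x, test_y, i, rng)
              for i, name in enumerate(FEATURES)}
for name, drop in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name:14s} accuracy drop when permuted: {drop:.3f}")

pd_low = partial_dependence(forest, test_x, 0, 10.0)
pd_high = partial_dependence(forest, test_x, 0, 30.0)
print(f"mean predicted default probability: {pd_low:.2f} at rate 10, {pd_high:.2f} at rate 30")
```

Importance is measured on held-out loans, so a variable ranks highly only if the model actually relies on it for out-of-sample accuracy. The partial-dependence comparison at the end is a crude analogue of the marginal-effect assessment mentioned in the abstract. In practice an established library would typically replace the hand-written forest.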