Skip to content

SooyeonWon/predicting_default_risk

Folders and files

NameName
Last commit message
Last commit date

Latest commit

0456018 · Oct 28, 2020

History

11 Commits
Oct 7, 2020
Oct 7, 2020
Oct 7, 2020
Oct 28, 2020
Oct 6, 2020
Oct 5, 2020
Oct 5, 2020
Oct 7, 2020

Repository files navigation

Predicting Default Risk - Creditworthiness

Classification Modelling and Performance Comparison

by Sooyeon Won

Keywords

  • Analytical Framework
  • Supervised Learning
  • Binary Classification Models
    • Logistic Regression Model
    • Decision Tree Model
    • Random Forest Model
    • Gradient Boosting Model
    • AdaBoost Model
  • ROC curve
  • K-fold cross-validation
  • Model Selection and its Application

Summary of Findings

A bank recently received an influx of loan applications. I built and apply different classification models to provide an appropriate recommendation. Among the various models, I found out that Random Forest Model is the most suitable for predicting the creditworthiness of credit applicants. Based on the Random Forest model, I concluded 408 customers belong to the segment 'Creditworthy’.

This analysis is separated into three parts.

  1. The first part of the analysis (Filename: 01_Creditworthiness_Data_Cleaning_Exploration.ipynb) contains the process of Data Cleaning and Data Exploration with appropriate visualizations. At the end of the Part1, I select the features associated with the creditworthiness of customers, which is the target variable of the analysis.
  2. In the second part of the analysis (Filename: 02_Creditworthiness_Data_Analysis.ipynb), I trained the dataset with various classification models: Logistic Regression, Decision Tree, Random Forest, Gradient Boosted Model, and AdaBoost Model. Then they are validated with the test dataset. Additionally, I evaluate the performance of the models with k-fold cross validation techniques.
  3. After selecting the model with best performance, I conclude that how many new credit applicants are classified as creditworthy customers.

References

Logistic Regression (aka logit, MaxEnt) classifier - sklearn
How to Calculate Feature Importance With Python
Decision Tree Classifier - sklearn
Categorical Features and Encoding in Decision Trees