Credit Risk Analysis

Analyst: Stanley Misina, Columbia University Data Analytics Bootcamp
Systems Used: Python 3.8.11, Jupyter Notebook 6.4.3
Data Source Provided: LoanStats_2019Q1.csv

Overview

Machine Learning in Finance

Credit risk is an inherently unbalanced classification problem, because good loans far outnumber risky ones. Building a reliable credit evaluation process is of the utmost importance for modern lenders. Machine learning solutions are powerful tools for predicting creditworthiness, anticipating anomalies, and reducing risk.

The dataset used for this evaluation is from LendingClub, a peer-to-peer lending services company.

Results

Comparison of Models for Credit Decision

When testing the performance of these models, the following measures are considered: accuracy, f1 score (with its components, precision and recall), and processing time.

  • Accuracy is the ratio of correct predictions to the total number of input samples
  • f1 score is a single summary measure of model performance. It is the harmonic mean of precision and recall
    • Precision is the ability of a classifier not to label an instance positive that is actually negative
    • Recall is the ability of a model to find all positive instances
  • Processing time is taken into consideration because the models run at different speeds. This factor becomes important when processing at large scale and when service level expectations apply
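The measures above can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's actual code: the helper names `classification_metrics` and `timed`, and the toy labels (1 for the positive class, 0 otherwise), are assumptions made for the example.

```python
import time


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and f1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("at least one sample is required")

    # Confusion-matrix counts for the chosen positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # f1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Toy, made-up labels: 2 positives among 10 samples (an unbalanced set).
y_true = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
metrics, seconds = timed(classification_metrics, y_true, y_pred)
print(metrics, f"{seconds:.6f}s")
```

In this toy set, accuracy is high even though only half of the positives are found, which is why accuracy alone is not enough on unbalanced data.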

Performance Metrics

We chose six models for comparison. Accuracy, precision, recall, and f1 are each reported on a scale of 0 to 1, where values closer to 1 indicate better performance in the testing environment. Processing time is reported in seconds.

[Figure: Accuracy_Time_Results]

Summary

Recommendations

  • Our recommended processing model is the Easy Ensemble Adaptive Boost Classifier (EEABC). With an accuracy score of 0.93 and an f1 of 0.97, it is the most accurate of the six models tested; the caveat is processing time. When many credit applications are processed, 33 seconds per decision can be prohibitive where time sensitivity is important (e.g., an 'instant approval' promise).

  • A secondary recommendation is a two-tiered process: use the Balanced Random Forest Classifier (BRFC) for fast primary processing (only 1.76 seconds per decision), as it performs well on precision (0.99) and f1 (0.93). Its accuracy and recall trail EEABC at 0.78 and 0.87 respectively. EEABC could then be run as a secondary pipeline for turned-down applications and approval auditing.
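The two-tiered process could be sketched roughly as follows. Everything here is hypothetical: the function `two_tier_decision`, the "approve"/"decline" labels, the `audit_rate` parameter, and the stand-in models are assumptions for illustration, and letting the slower model's answer stand as the final decision for turned-down applications is one possible reading of the recommendation, not a tested design.

```python
import random


def two_tier_decision(application, fast_model, slow_model,
                      audit_rate=0.05, rng=random):
    """Route one application through a fast tier, then a slow tier if needed.

    fast_model and slow_model are callables returning "approve" or "decline"
    (hypothetical stand-ins for BRFC and EEABC respectively).
    """
    first = fast_model(application)

    if first == "approve":
        # Approvals are returned immediately; a random sample is also
        # re-scored by the slow model for auditing (assumed policy).
        audit = slow_model(application) if rng.random() < audit_rate else None
        return {"decision": first, "tier": 1, "audit": audit}

    # Turned-down applications get a second opinion from the slow model.
    return {"decision": slow_model(application), "tier": 2, "audit": None}


# Stand-in models for demonstration only; real classifiers would go here.
def fast_stub(app):
    return "approve" if app["score"] >= 700 else "decline"


def slow_stub(app):
    return "approve" if app["score"] >= 650 else "decline"


approved = two_tier_decision({"score": 720}, fast_stub, slow_stub, audit_rate=0.0)
second_look = two_tier_decision({"score": 660}, fast_stub, slow_stub, audit_rate=0.0)
rejected = two_tier_decision({"score": 500}, fast_stub, slow_stub, audit_rate=0.0)
print(approved, second_look, rejected, sep="\n")
```

With this routing, only applications the fast tier declines (plus the audited sample) pay the slower model's processing time.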