GitHub - rmwkwok/credit_default_risk: A model was built, given some credit users' data, to control loss in response to a recession forecast. The modeling rationale and action plans are presented in a report.

Glossary

term	meaning
PD	probability of default
default	delinquency, not necessarily permanent loss
exposure / balance / credit exposure / outstanding loan	money lent to users
Good user / good month / good exposure	respective bill was paid (fully or partially)
Bad user / bad month / bad exposure	respective bill was not paid (less than the minimum payment)
Transactors	users who paid the full bill within the grace period
Revolvers	users who continuously carried a balance on their account

Objective

Build a model, given users' data, to control loss in response to a recession forecast.

Business model

Control and metrics

During a recession we tend to be more conservative, which requires a higher expected recall from the model's "default" prediction. Thus, at the development stage, recall is a control, while the expected precision and the expected ROI are its consequences and therefore serve as metrics. At the deployment stage, however, real-world recall and ROI become the metrics for both the model and the business.
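Treating recall as the control means fixing a target recall first and reading precision off as a consequence. A minimal sketch of that step, assuming the model outputs one default score per user and that label 1 marks a bad (defaulting) user; the function name `threshold_for_recall` and the toy data are hypothetical:

```python
def threshold_for_recall(scores, labels, target_recall):
    """Find the highest score threshold whose recall on the "default" class
    (label 1) reaches target_recall; return (threshold, recall, precision).

    Users with score >= threshold are flagged as "default".
    """
    total_defaults = sum(labels)
    if total_defaults == 0:
        raise ValueError("no defaulting users in labels")
    # Rank users from most to least likely to default.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for i, (score, label) in enumerate(ranked):
        true_pos += label
        # Only evaluate after the last user sharing this score, so that
        # "score >= threshold" flags exactly the users counted so far.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        recall = true_pos / total_defaults
        if recall >= target_recall:
            precision = true_pos / (i + 1)
            return score, recall, precision
    raise ValueError("target recall is unreachable")


# Toy example with hypothetical scores and labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 1, 0, 0]
threshold, recall, precision = threshold_for_recall(scores, labels, 0.6)
```

Raising the target recall lowers the threshold, flags more users and, typically, lowers precision, which is why precision is read as a consequence rather than set directly.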

Expected ROI

The expected ROI is estimated using the August portfolio, assuming that the same transaction and overdue amounts would occur in November (admittedly a weak assumption). When a user is predicted to "default", his/her card is frozen, so no further transaction or overdue fees are generated, and the bank suffers no loss from that user's transactions.

Only true good users are counted towards the numerator (revenue) of the ROI, and only true bad users are counted towards the denominator (cost).
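Under the rules above, the expected ROI can be sketched as follows. This is an illustration rather than the report's code: it assumes that frozen (predicted "default") users contribute neither revenue nor cost, that the remaining actually-good users contribute their fee revenue, and that the remaining actually-bad users contribute their loss. `fee_revenue` and `loss_amount` are hypothetical per-user amounts taken from the August snapshot.

```python
def revenue_and_cost(pred_default, is_bad, fee_revenue, loss_amount):
    """Return (revenue, cost) for one portfolio snapshot.

    Hypothetical rules: a user predicted "default" has the card frozen and
    contributes nothing; an unfrozen good user adds fee revenue; an
    unfrozen bad user adds loss to the cost.
    """
    revenue = 0.0
    cost = 0.0
    for frozen, bad, fee, loss in zip(pred_default, is_bad, fee_revenue, loss_amount):
        if frozen:
            continue  # frozen card: no new fees and no new loss
        if bad:
            cost += loss
        else:
            revenue += fee
    return revenue, cost


def expected_roi(pred_default, is_bad, fee_revenue, loss_amount):
    """Expected ROI as revenue (numerator) over cost (denominator)."""
    revenue, cost = revenue_and_cost(pred_default, is_bad, fee_revenue, loss_amount)
    return revenue / cost if cost > 0 else float("inf")
```

Keeping revenue and cost separate also makes profit (revenue minus cost) available from the same pass.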

Datasets

Three datasets were used for modeling:

raw features (25 features)

Features as provided, with the following exceptions:

Feature	description
PAY_0	removed because its distribution differed from those of the other PAY_X features, which made it suspicious
EDUCATION, MARRIAGE	categories regrouped for balance
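The two exceptions can be sketched as a per-record cleaning step. The regrouping below is hypothetical: it assumes integer category codes and folds rare or undocumented codes into an existing "other" code (4 for EDUCATION, 3 for MARRIAGE), which may differ from the grouping actually used in the notebooks.

```python
# Hypothetical regrouping maps: rare or undocumented codes -> "other" code.
EDUCATION_MAP = {0: 4, 5: 4, 6: 4}  # fold into 4 ("others")
MARRIAGE_MAP = {0: 3}               # fold into 3 ("others")


def preprocess(record):
    """Return a cleaned copy of one user record (a dict of column -> value)."""
    # Drop PAY_0, whose distribution looked inconsistent with other PAY_X.
    cleaned = {key: value for key, value in record.items() if key != "PAY_0"}
    # Regroup categories; codes not in the map are kept unchanged.
    cleaned["EDUCATION"] = EDUCATION_MAP.get(record["EDUCATION"], record["EDUCATION"])
    cleaned["MARRIAGE"] = MARRIAGE_MAP.get(record["MARRIAGE"], record["MARRIAGE"])
    return cleaned
```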

raw+engineered features (31 features)

Features engineered during the EDA process include:

Feature	description
BILL_VARIANCE	monthly fluctuation of amount billed
LONGEST_DELAY	longest delay period ever
NO_BAD_MONTHS	number of bad (delinquent) months
BILL_TO_CREDIT	billed amount relative to the credit limit; resembles a debt-to-income ratio
PAID_TO_BILLED	ratio of amount paid to amount billed; a measure of how good a user is
BAD_MONTH_PROXIMITY	how long ago the last bad month was
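One way to compute these features for a single user is sketched below. The exact definitions are assumptions for illustration, since the table only names the features: monthly lists are ordered from most recent to oldest, `delays[i]` is the payment delay in months for month i, a month is "bad" when its delay is positive, and BILL_TO_CREDIT uses the most recent bill over the credit limit.

```python
from statistics import pvariance


def engineer_features(bills, payments, delays, credit_limit):
    """Compute the six engineered features for one user (hypothetical definitions).

    All lists are ordered from the most recent month to the oldest.
    """
    # Indices of bad months, most recent first.
    bad_months = [i for i, delay in enumerate(delays) if delay > 0]
    total_billed = sum(bills)
    return {
        "BILL_VARIANCE": pvariance(bills),
        "LONGEST_DELAY": max(delays),
        "NO_BAD_MONTHS": len(bad_months),
        "BILL_TO_CREDIT": bills[0] / credit_limit,
        # Undefined (None) when nothing was billed.
        "PAID_TO_BILLED": sum(payments) / total_billed if total_billed > 0 else None,
        # Months since the most recent bad month; None if there was none.
        "BAD_MONTH_PROXIMITY": bad_months[0] if bad_months else None,
    }
```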

causal model features (5 features, with hypothesized stories; industry experience would make the stories more realistic)

Five features were picked from the following causal model, built on the data and some common sense (better industry knowledge would improve it):

Feature	possible story (PD: probability of default)	correlation
MARRIAGE	-	Married person <-> higher PD
LIMIT_BAL	Sharing common causes with PD such as stability of income	Lower credit limit <-> higher PD
LONGEST_DELAY	Sharing common causes with PD such as stability of income	Longer <-> higher PD
NO_BAD_MONTHS	-	More bad months <-> higher PD
BAD_MONTH_PROXIMITY	Upset by a recent delinquency penalty	More recent -> higher PD

Model results

Precision-Recall tradeoff

In the following precision-recall tradeoff (right graph), the best model is the uppermost one in the region of interest: LightGBM (LGB) decision trees trained on the 31 features.
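Choosing "the uppermost curve in the region of interest" can be made concrete by averaging interpolated precision over a grid of recalls inside that region. A minimal sketch, assuming each model's curve is a list of (recall, precision) points; the model names, the recall bounds and the grid size are hypothetical. Interpolated precision at recall r is taken as the maximum precision over points with recall at least r.

```python
def interpolated_precision(curve, r):
    """Max precision over curve points with recall >= r (0.0 if none)."""
    candidates = [precision for recall, precision in curve if recall >= r]
    return max(candidates) if candidates else 0.0


def best_model(curves, recall_lo, recall_hi, steps=11):
    """Pick the model whose interpolated PR curve is highest, on average,
    over an evenly spaced recall grid in [recall_lo, recall_hi] (steps >= 2)."""
    grid = [recall_lo + (recall_hi - recall_lo) * i / (steps - 1) for i in range(steps)]

    def mean_precision(name):
        return sum(interpolated_precision(curves[name], r) for r in grid) / len(grid)

    return max(curves, key=mean_precision)


# Hypothetical curves for two models.
curves = {
    "model_a": [(0.5, 0.5), (0.7, 0.4), (0.9, 0.3)],
    "model_b": [(0.5, 0.55), (0.7, 0.45), (0.9, 0.32)],
}
chosen = best_model(curves, 0.6, 0.8)
```

Restricting the average to the region of interest means a model that wins only at low recall, where the business will not operate during a recession, is not preferred.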

Profit-Recall tradeoff / ROI-Recall tradeoff

The model's tradeoffs between profit/ROI and recall show that profit starts to drop once recall exceeds 0.6, and that above a recall of 0.8 the ROI becomes volatile, requiring careful examination of the remaining users.
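The profit and ROI curves can be traced by sweeping the decision threshold. A sketch following the rules in the Expected ROI section, restated here as assumptions: frozen users contribute nothing, unfrozen good users contribute fee revenue, and unfrozen bad users contribute loss. `fee` and `loss` are hypothetical per-user amounts; profit is taken as revenue minus cost and ROI as revenue over cost.

```python
def profit_roi_curve(scores, is_bad, fee, loss):
    """Return (threshold, recall, profit, roi) for each distinct score,
    freezing every user whose score is >= the threshold."""
    total_bad = sum(is_bad)
    if total_bad == 0:
        raise ValueError("no bad users")
    rows = []
    for threshold in sorted(set(scores), reverse=True):
        frozen = [s >= threshold for s in scores]
        caught = sum(1 for f, b in zip(frozen, is_bad) if f and b)
        # Unfrozen good users earn fees; unfrozen bad users cause loss.
        revenue = sum(x for f, b, x in zip(frozen, is_bad, fee) if not f and not b)
        cost = sum(x for f, b, x in zip(frozen, is_bad, loss) if not f and b)
        roi = revenue / cost if cost else float("inf")
        rows.append((threshold, caught / total_bad, revenue - cost, roi))
    return rows
```

Under this definition, as recall approaches 1 only a few uncaught bad users remain in the denominator, so the ROI is driven by a handful of accounts; this is one plausible reason for the volatility above a recall of 0.8.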

Profit-Cost tradeoff / ROI-Cost tradeoff

The recall on the x-axis of the above curves can be translated into cost, which is proportional to the credit exposure and is a more intuitive control.
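Translating recall into exposure amounts to accumulating the balance of the users frozen at each threshold. A sketch, assuming "cost" is measured as the total exposure of frozen users (any proportionality constant is omitted); the function and argument names are hypothetical:

```python
def recall_to_exposure(scores, is_bad, exposure):
    """Return (recall, frozen_exposure) after each distinct threshold,
    from the strictest to the most lenient."""
    total_bad = sum(is_bad)
    if total_bad == 0:
        raise ValueError("no bad users")
    ranked = sorted(zip(scores, is_bad, exposure), key=lambda t: t[0], reverse=True)
    caught = 0
    frozen_exposure = 0.0
    points = []
    for i, (score, bad, balance) in enumerate(ranked):
        caught += bad
        frozen_exposure += balance
        # Record a point only after the last user sharing this score.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        points.append((caught / total_bad, frozen_exposure))
    return points
```

Given an exposure budget, the largest affordable recall is then `max(r for r, e in points if e <= budget)`, assuming at least one point fits the budget.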

Model improvement action items

Given	To improve
Industry experience and knowledge (e.g. current rules for approving credit)	causal model -> robust prediction model
Longer period of data	understanding of PD seasonality -> robust prediction model
Longer period of data	understanding of spending trends -> better model features
More user data (demographic, credit, etc.)	causal model and more model features
User address	incorporation of macroeconomic features (e.g. unemployment, consumption, income)
User transaction data	understanding of changes in users' spending habits -> more features
User transaction data	decoupling of BILL_AMOUNT into TRANSACTION_AMOUNT, PAY_AMOUNT and PENALTY_AMOUNT -> more accurate features

Generally, with more data and understanding:

an ensemble of models could be built to predict from different perspectives
better user segmentation (portfolios) could be done to differentiate treatment and modeling

General idea of making a plan

An action plan is illustrated below, with the important inputs shown on the left-hand side of the flow chart. One key concern is that a grace period can be up to 2 months, which is also how long we must wait to find out whether a user will miss a payment. Such a long wait is unfavourable in a faster-paced organization and may invalidate results for short-term reuse (e.g. due to seasonality). Therefore, sample size must be controlled carefully, so that more cycles can run in parallel while each cycle maintains statistical significance.
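For sizing each cycle, one standard approach is the sample-size formula for a two-sided two-proportion z-test. The sketch below assumes each cycle compares the delinquency rate of a treated group against a control group; the rates, significance level and power are hypothetical inputs, not values from the report.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.8):
    """Users needed in each arm of a two-sided two-proportion z-test
    to detect the difference between p_control and p_treatment."""
    if p_control == p_treatment:
        raise ValueError("rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    p_bar = (p_control + p_treatment) / 2          # pooled rate under the null
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
    ) ** 2
    return ceil(numerator / (p_control - p_treatment) ** 2)
```

Halving the detectable difference roughly quadruples the required size per arm, which is the tension between running more cycles in parallel and keeping each one statistically significant.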

Modeling summary

Recall, cost, profit, ROI, etc. can be used as controls to fine-tune the outcome of the model.
Improvement is needed in user segmentation and in prediction results.
Engineered features were helpful because (1) they took 3 of the 5 places in the causally selected feature set, and (2) they improved over the raw features.

