Kshitij-Shresth/Loan-Status-Prediction

Loan-Status-Prediction

The pipeline encodes categorical features, generates a correlation heatmap to inspect relationships between the numeric features, and trains a linear SVM to predict whether a loan application will be approved (binary classification). In a market context, the project mimics, in simplified form, the decision-making process financial institutions use for loan risk assessment. Encoding categorical variables such as self-employment status, property area, and education level converts qualitative attributes into numbers the model can work with. An SVM is a reasonable fit because it looks for the maximum-margin boundary between the two classes, which helps when approved and denied applications differ only subtly, and it remains effective in high-dimensional feature spaces.

This framework serves as a prototype for loan approval models that financial institutions could build on to automate risk evaluation, make decisions more consistent and accurate, and help reduce default rates.

Training Accuracy: ~80%

Code Workflow

Data Preprocessing

Heatmap Generation: The numerical columns in the dataset are selected, and a heatmap is created to visualize the correlation between the features using seaborn.

```python
import numpy as np
import seaborn as sns
numeric_data = data.select_dtypes(include=[np.number])
sns.heatmap(numeric_data.corr(), annot=True, cmap='coolwarm', linewidths=0.5)
```
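To make the column selection concrete, here is a minimal pandas-only sketch on a small invented DataFrame. The column names ApplicantIncome, LoanAmount, Credit_History and Property_Area are assumptions about the dataset schema, and the values are made up. String columns are dropped by `select_dtypes`, so the correlation matrix passed to the heatmap is square over the numeric columns only.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the loan dataset (column names and values are hypothetical)
data = pd.DataFrame({
    "ApplicantIncome": [5000, 4500, 3000, 2500, 6000],
    "LoanAmount": [130.0, 128.0, 66.0, 120.0, 141.0],
    "Credit_History": [1.0, 1.0, 1.0, 1.0, 0.0],
    "Property_Area": ["Urban", "Rural", "Urban", "Urban", "Semiurban"],
})

# Keep only numeric columns; the string column Property_Area is dropped
numeric_data = data.select_dtypes(include=[np.number])

# Pairwise Pearson correlations between the remaining columns
corr = numeric_data.corr()

print(list(numeric_data.columns))
print(corr.round(2))
```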

Feature Encoding: Categorical features are encoded as integers so that the model can use them:

Loan_Status: Approved (Y -> 1), Not Approved (N -> 0)

Dependents: Number of dependents, with 3+ -> 4

Married: Married (Yes -> 1), Not Married (No -> 0)

Gender: Male (Male -> 1), Female (Female -> 0)

Education: Graduate (Graduate -> 1), Not Graduate (Not Graduate -> 0)

Self_Employed: Self-Employed (Yes -> 1), Not Self-Employed (No -> 0)

Property_Area: Rural (Rural -> 0), Semiurban (Semiurban -> 1), Urban (Urban -> 2)
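The mappings listed above can be applied with pandas roughly as follows. This is a sketch, not the repository's code: it assumes missing values have already been dropped and that Dependents is stored as strings such as "0", "1", "2" and "3+". The helper `encode` and the two sample applicants are hypothetical.

```python
import pandas as pd

# Category -> integer mappings from the list above
ENCODINGS = {
    "Loan_Status": {"N": 0, "Y": 1},
    "Married": {"No": 0, "Yes": 1},
    "Gender": {"Female": 0, "Male": 1},
    "Education": {"Not Graduate": 0, "Graduate": 1},
    "Self_Employed": {"No": 0, "Yes": 1},
    "Property_Area": {"Rural": 0, "Semiurban": 1, "Urban": 2},
}

def encode(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with categorical columns as integers.

    Assumes missing values were already dropped; an unmapped label would become NaN.
    """
    out = df.copy()
    for col, mapping in ENCODINGS.items():
        out[col] = out[col].map(mapping)
    # Assumed string values "0", "1", "2", "3+"; "3+" is mapped to 4
    out["Dependents"] = pd.to_numeric(out["Dependents"].replace("3+", "4"))
    return out

# Two hypothetical applicants
raw = pd.DataFrame({
    "Gender": ["Male", "Female"],
    "Married": ["Yes", "No"],
    "Dependents": ["3+", "0"],
    "Education": ["Graduate", "Not Graduate"],
    "Self_Employed": ["No", "Yes"],
    "Property_Area": ["Semiurban", "Urban"],
    "Loan_Status": ["Y", "N"],
})
encoded = encode(raw)
print(encoded)
```

Using `map` per column, rather than a blanket replace, makes unexpected labels surface as NaN instead of silently staying as strings.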

Data Splitting

The dataset is split into training and test sets using a 90/10 split. Stratified sampling keeps the proportion of approved and not-approved loans the same in both sets. The random seed is fixed at 7, a value chosen after comparing the mean values of the resulting training and test sets.

```python
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, stratify=Y, random_state=7)
```
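A quick way to see what `stratify=Y` does is to compare the share of positive labels before and after splitting. The sketch below uses synthetic features and a made-up label balance (138 approved, 62 not approved) rather than the loan data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 200 rows, 5 random features, made-up label balance
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = np.array([1] * 138 + [0] * 62)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.1, stratify=Y, random_state=7
)

# With stratify=Y both splits keep (almost) the same share of approved loans
print(len(Y_train), len(Y_test))
print(round(Y.mean(), 3), round(Y_train.mean(), 3), round(Y_test.mean(), 3))
```

With only 20 test rows the proportions can differ by at most about one example's worth, which is as close as integer counts allow.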

Model Training

A Support Vector Classifier (SVC) with a linear kernel is trained on the training set.

```python
from sklearn import svm
classifier = svm.SVC(kernel='linear')
classifier.fit(X_train, Y_train)
```
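With a linear kernel, the SVC learns a separating hyperplane w·x + b = 0 and labels each point by which side of it the point falls on. The sketch below, on synthetic data from `make_classification` rather than the loan dataset, reads w and b from scikit-learn's `coef_` and `intercept_` attributes (available only for the linear kernel) and checks that they reproduce `decision_function` and `predict`.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification

# Synthetic binary data standing in for the encoded loan features
X, Y = make_classification(n_samples=200, n_features=5, random_state=0)

classifier = svm.SVC(kernel='linear')
classifier.fit(X, Y)

# A linear-kernel SVC exposes its hyperplane: weights w and bias b
w = classifier.coef_[0]
b = classifier.intercept_[0]

# Decision score w.x + b; positive scores are assigned to classes_[1]
scores = X @ w + b
manual_pred = np.where(scores > 0, classifier.classes_[1], classifier.classes_[0])

print(np.allclose(scores, classifier.decision_function(X)))
print((manual_pred == classifier.predict(X)).all())
```

On the real data, the entries of w indicate how strongly each encoded feature pushes an application toward approval or rejection, although their magnitudes are only comparable if the features are on similar scales.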

Model Evaluation

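After training, the model predicts labels for the training set, and accuracy_score compares them with the true labels, giving a training accuracy of approximately 80%. Because these are the same examples the model was fit on, this figure is likely optimistic compared with accuracy on unseen applications.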

```python
from sklearn.metrics import accuracy_score
training_data_accuracy = accuracy_score(Y_train, classifier.predict(X_train))
```
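The section above reports training accuracy. As a hedged sketch (not the author's code), the same `accuracy_score(y_true, y_pred)` call can be applied to the held-out 10% split to estimate how the model does on data it has not seen. The data below is synthetic, so the printed numbers say nothing about the loan dataset.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data; in the project, X and Y would be the encoded loan features and labels
X, Y = make_classification(n_samples=300, n_features=8, random_state=1)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.1, stratify=Y, random_state=7
)

classifier = svm.SVC(kernel='linear')
classifier.fit(X_train, Y_train)

# accuracy_score(y_true, y_pred): fraction of predictions that match the labels
training_data_accuracy = accuracy_score(Y_train, classifier.predict(X_train))
test_data_accuracy = accuracy_score(Y_test, classifier.predict(X_test))

print(f"train accuracy: {training_data_accuracy:.3f}")
print(f"test accuracy:  {test_data_accuracy:.3f}")
```

A large gap between the two numbers would suggest the model is fitting noise in the training set.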

[Figures: metrics across various models]
