Loan-Status-Prediction

The pipeline encodes categorical features, generates a correlation heatmap to examine how features relate to each other, and trains a linear SVM to predict whether a loan will be approved (binary classification). In a market context, the project mimics the decision-making process financial institutions use for loan risk assessment. By encoding categorical variables such as self-employment status, property area, and education level as integers, the model turns qualitative attributes into numeric features. An SVM is a reasonable fit because it works well in high-dimensional feature spaces and looks for the maximum-margin boundary between classes, which suits finance problems where the difference between approved and denied applications may be subtle.
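For reference, the "margin" above comes from the standard soft-margin problem that a linear-kernel SVM solves. It is sketched here in its textbook form, with the two classes written as $y_i \in \{-1, +1\}$ (the 0/1 loan labels play the same roles), feature vectors $x_i$, slack variables $\xi_i$, and penalty parameter $C$ (scikit-learn's `SVC` defaults to $C = 1.0$):

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,
\qquad \hat{y} = \operatorname{sign}(w^\top x + b)
```

Minimizing $\lVert w \rVert$ widens the margin, whose width is $2 / \lVert w \rVert$, while $C$ controls how heavily misclassified or in-margin applications are penalized.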

This framework serves as a prototype for loan approval models that financial institutions could scale up to automate risk evaluation, make decisions more accurate and consistent, and help reduce default rates.

Training Accuracy: ~80%

Code Workflow

Data Preprocessing

Heatmap Generation: The numeric columns of the dataset are selected, and seaborn is used to draw a heatmap of their pairwise (Pearson) correlations.

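import numpy as np
import seaborn as sns
numeric_data = data.select_dtypes(include=[np.number])  # keep numeric columns only
sns.heatmap(numeric_data.corr(), annot=True, cmap='coolwarm', linewidths=0.5)  # annotated correlation matrix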
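The colours in the heatmap are the pairwise Pearson correlations returned by `DataFrame.corr()`. A minimal sketch on a tiny made-up DataFrame (the column names and values are hypothetical, not taken from the project's dataset) shows that `select_dtypes` drops text columns before the correlation matrix is computed, and cross-checks one entry against NumPy. The seaborn call is left out so the sketch runs without a plotting backend.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two numeric columns and one text column.
data = pd.DataFrame({
    "ApplicantIncome": [2500, 4000, 6000, 3000, 8000],
    "LoanAmount": [90, 120, 160, 100, 200],
    "Property_Area": ["Urban", "Rural", "Semiurban", "Urban", "Rural"],
})

# Keep only numeric columns, as in the snippet above; the text column is dropped.
numeric_data = data.select_dtypes(include=[np.number])

# Pearson correlation matrix: the values seaborn would colour in the heatmap.
corr = numeric_data.corr()
print(corr.round(3))

# Cross-check one off-diagonal entry against NumPy's corrcoef.
r = np.corrcoef(numeric_data["ApplicantIncome"], numeric_data["LoanAmount"])[0, 1]
assert np.isclose(corr.loc["ApplicantIncome", "LoanAmount"], r)
```

Only the two numeric columns survive, so the heatmap for this toy frame would be a 2x2 grid with ones on the diagonal.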

Feature Encoding: Categorical features are mapped to integer codes so the model can use them:

Loan_Status: Y (approved) -> 1, N (not approved) -> 0

Dependents: number of dependents, with 3+ -> 4

Married: Yes -> 1, No -> 0

Gender: Male -> 1, Female -> 0

Education: Graduate -> 1, Not Graduate -> 0

Self_Employed: Yes -> 1, No -> 0

Property_Area: Rural -> 0, Semiurban -> 1, Urban -> 2
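The encoding list above can be sketched with one pandas `Series.map` call per column. This is a minimal illustration, not the project's own code (which may use `DataFrame.replace` instead). `ENCODINGS` and `encode_categoricals` are hypothetical names, and it is assumed that `Dependents` holds the strings "0", "1", "2" and "3+", which is how pandas reads a column containing "3+".

```python
import pandas as pd

# Integer codes copied from the encoding list above.
ENCODINGS = {
    "Loan_Status": {"N": 0, "Y": 1},
    "Dependents": {"0": 0, "1": 1, "2": 2, "3+": 4},  # assumes string values
    "Married": {"No": 0, "Yes": 1},
    "Gender": {"Female": 0, "Male": 1},
    "Education": {"Not Graduate": 0, "Graduate": 1},
    "Self_Employed": {"No": 0, "Yes": 1},
    "Property_Area": {"Rural": 0, "Semiurban": 1, "Urban": 2},
}


def encode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with each listed column mapped to integer codes.

    Values not present in a mapping (including NaN) become NaN.
    """
    out = df.copy()
    for column, mapping in ENCODINGS.items():
        if column in out.columns:
            out[column] = out[column].map(mapping)
    return out


# Hypothetical two-row example.
raw = pd.DataFrame({
    "Loan_Status": ["Y", "N"],
    "Dependents": ["3+", "0"],
    "Married": ["Yes", "No"],
    "Gender": ["Male", "Female"],
    "Education": ["Graduate", "Not Graduate"],
    "Self_Employed": ["No", "Yes"],
    "Property_Area": ["Urban", "Rural"],
})
encoded = encode_categoricals(raw)
print(encoded)
```

Because unmapped values turn into NaN rather than raising an error, checking `encoded.isna().sum()` after encoding is a quick way to spot unexpected categories or missing entries before training.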

Data Splitting

The dataset is split into training and test sets using a 90-10 split. Stratified sampling keeps the proportion of approved and rejected loans the same in both sets. The random seed is fixed at 7, chosen after comparing the mean values observed in the resulting training and test sets.

from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, stratify=Y, random_state=7)
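The effect of `stratify=Y` can be checked on synthetic stand-in data (the arrays below are made up, not the loan dataset). With 70 approvals and 30 rejections, a stratified 90-10 split puts exactly 7 approvals in the 10-row test set. The README does not say which means were compared when picking the seed; printing per-split feature means, as below, is only one plausible reading of that step.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 2 features, 70 approvals (1) and 30 rejections (0).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
Y = np.array([1] * 70 + [0] * 30)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.1, stratify=Y, random_state=7
)

# Stratification keeps the approval rate identical in both splits (0.7 here).
print("train approval rate:", Y_train.mean(), "test approval rate:", Y_test.mean())

# Per-feature means of each split (an assumed version of the seed comparison).
print("train feature means:", X_train.mean(axis=0))
print("test feature means: ", X_test.mean(axis=0))
```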

Model Training

A Support Vector Classifier (SVC) with a linear kernel is trained on the training set.

from sklearn import svm
classifier = svm.SVC(kernel='linear')  # linear-kernel support vector classifier
classifier.fit(X_train, Y_train)
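With `kernel='linear'`, the fitted SVC is a single hyperplane that scikit-learn exposes as `coef_` (the weights w) and `intercept_` (the bias b). A minimal sketch on synthetic two-cluster data (a stand-in, not the loan data) confirms that `decision_function` equals w·x + b and that predictions follow its sign.

```python
import numpy as np
from sklearn import svm

# Synthetic stand-in data: two well-separated clusters (not the loan dataset).
rng = np.random.default_rng(42)
X_train = np.vstack([rng.normal(-2, 1, size=(50, 2)), rng.normal(2, 1, size=(50, 2))])
Y_train = np.array([0] * 50 + [1] * 50)

classifier = svm.SVC(kernel='linear')  # C defaults to 1.0
classifier.fit(X_train, Y_train)

# With a linear kernel the model is the hyperplane w.x + b = 0.
w = classifier.coef_[0]
b = classifier.intercept_[0]

# decision_function returns w.x + b; positive scores are predicted as class 1.
scores = X_train @ w + b
assert np.allclose(scores, classifier.decision_function(X_train))
predicted = (scores > 0).astype(int)
assert np.array_equal(predicted, classifier.predict(X_train))
print("weights:", w, "bias:", b)
```

On the real model, `coef_` holds one weight per encoded feature, which gives a rough view of which features push toward approval. Raw weights are only comparable across features that share a similar scale, however.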

Model Evaluation

After training, the model predicts labels for the training set, and the accuracy of those predictions is approximately 80%.

X_train_prediction = classifier.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
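The README reports accuracy on the training set only, which tends to overstate how the model performs on unseen applications. The minimal sketch below computes the same metric on the held-out 10% split as well. It uses synthetic, overlapping stand-in data, and the name `test_data_accuracy` is a hypothetical addition, not part of the project's code.

```python
import numpy as np
from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic, overlapping stand-in data so the classes are not perfectly separable.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1.5, size=(150, 3)), rng.normal(1, 1.5, size=(150, 3))])
Y = np.array([0] * 150 + [1] * 150)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.1, stratify=Y, random_state=7
)
classifier = svm.SVC(kernel='linear').fit(X_train, Y_train)

# accuracy_score expects (y_true, y_pred).
training_data_accuracy = accuracy_score(Y_train, classifier.predict(X_train))
test_data_accuracy = accuracy_score(Y_test, classifier.predict(X_test))
print(f"train accuracy: {training_data_accuracy:.3f}")
print(f"test accuracy:  {test_data_accuracy:.3f}")
```

A large gap between the two numbers would suggest overfitting. With only 10% of the rows held out, the test figure can also shift noticeably from one seed to another.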

Metrics across various models:

Repository: Kshitij-Shresth/Loan-Status-Prediction