This project predicts customer churn for a bank by analyzing customer data using machine learning techniques. It includes data preprocessing, such as handling missing values and encoding features, and model training with Random Forest and XGBoost to identify patterns associated with churn. The best model is deployed using Streamlit, allowing users to input customer details and receive real-time churn predictions. This solution helps banks proactively address customer retention challenges.
You can try the deployed Streamlit app here: Customer Churn Prediction App
- Overview
- Technologies Used
- Project Structure
- Setup Instructions
- Usage
- Features
- Model Evaluation Results
- Contributing
- License
The Customer Churn Prediction project aims to predict whether a customer will churn based on historical data. The project includes:
- Data preprocessing (handling missing values, feature encoding, and scaling).
- Model training with Random Forest and XGBoost classifiers.
- Evaluation of model performance.
- Deployment of the best-performing model using Streamlit.
- Python 3.10
- Pandas for data manipulation.
- Scikit-learn for preprocessing and machine learning.
- XGBoost for gradient boosting.
- Streamlit for creating a web-based UI.
- Joblib for model serialization.
├── data/
│ └── Customer-Dataset.csv # Dataset
├── src/
│ ├── Training-Model.ipynb # Notebook for training models
│ ├── Training-Model_OOP.py # OOP-based training script
├── models/
│ ├── xgb_classifier_model.pkl # Trained XGBoost model
│ ├── gender_encoder.pkl # Gender encoder
│ ├── hasCrCard_encoder.pkl # HasCrCard encoder
│ ├── isActiveMember_encoder.pkl # IsActiveMember encoder
├── README.md # Project documentation
└── requirements.txt # Python dependencies
└── main.py # Streamlit deployment script
- Python 3.10 or higher
- Pipenv or virtualenv for environment management
-
Clone the repository:
git clone https://github.com/steveee27/Customer-Churn-Prediction.git cd Customer-Churn-Prediction
-
Create a virtual environment:
python -m venv env source env/bin/activate # On Windows: env\Scripts\activate
-
Install dependencies:
pip install -r requirements.txt
-
Run the OOP-based training script to train the model and save artifacts:
python src/Training-Model_OOP.py
-
Ensure that the trained model and encoders are saved in the
models/
directory.
-
Start the Streamlit app:
streamlit run main.py
-
Open the URL provided by Streamlit (e.g.,
http://localhost:8501
) in your browser.
Enter customer data in the web interface to get a prediction on whether the customer will churn.
- Data Preprocessing: Cleans and prepares raw data for modeling, including handling missing values, encoding categorical variables, and scaling numerical features.
- Model Training: Implements two machine learning models (Random Forest and XGBoost) with hyperparameter tuning to optimize performance.
- Evaluation: Provides detailed classification reports and other metrics to evaluate model performance.
- Deployment: Features a user-friendly web application built with Streamlit to predict customer churn interactively.
The table below compares the performance of the Random Forest and XGBoost models based on their hyperparameters and evaluation metrics:
Model | Hyperparameters | Precision (Churn) | Recall (Churn) | F1-Score (Churn) | Accuracy |
---|---|---|---|---|---|
Random Forest | n_estimators=100 , criterion='gini' , max_depth=10 |
0.77 | 0.52 | 0.62 | 86% |
XGBoost | n_estimators=100 , learning_rate=0.1 , max_depth=3 , subsample=0.5 |
0.76 | 0.55 | 0.64 | 86% |
-
XGBoost Model:
- Achieved slightly higher recall (0.55) and F1-score (0.64) compared to Random Forest.
- This makes XGBoost more suitable for scenarios where capturing more true churn cases is critical.
-
Random Forest Model:
- Has a slightly higher precision (0.77) than XGBoost, making it better for reducing false positives in churn prediction.
-
Both models achieved an accuracy of 86%, indicating comparable overall performance, but their suitability depends on the specific business use case.
Based on the evaluation, XGBoost is selected as the final model due to its higher F1-score and recall, which are crucial for minimizing churn cases missed. The trained XGBoost model has been saved as xgb_classifier_model.pkl
and is deployed for predictions in the Streamlit application.
Contributions to the project are welcome! To contribute, follow these steps:
- Fork the repository.
- Create a new branch for your feature or bugfix:
git checkout -b feature/your-feature-name
- Make your changes and commit them:
git commit -m "Add your message here"
- Push your changes to your forked repository:
git push origin feature/your-feature-name
- Create a pull request in the main repository.
For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License. You are free to use, modify, and distribute this project as long as proper attribution is given to the original author.