This project focuses on clustering bank customers using several methods available in the scikit-learn library. My goal was to gain insight into how clustering methods work and to learn how to apply them with scikit-learn to practical problems. A relevant business sector for this application is finance and banking, where understanding distinct customer segments is crucial for any institution that provides financial services. The project was also created for the Kaggle challenge Credit Card Dataset for Clustering; a link to it can be found in the repository description.
The notebook uses many techniques, but only the main ones are listed here. Please refer to the notebook for the complete set.
Principal Component Analysis (PCA)
Clustering methods
- K-Means
- Agglomerative Clustering
- Affinity Propagation
- Spectral Clustering
- Gaussian Mixture Model
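The techniques listed above can be combined into a single scikit-learn workflow: standardise the features, project them onto two principal components with PCA, then fit each of the five clustering methods and compare them by silhouette score. This is a minimal sketch, not the notebook's code. It assumes synthetic data from `make_blobs` in place of the Kaggle dataset, and the choice of four clusters, two components, and the hyperparameters shown are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import (
    KMeans,
    AgglomerativeClustering,
    AffinityPropagation,
    SpectralClustering,
)
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

# Assumption: synthetic stand-in data (300 samples, 17 numeric features),
# NOT the Kaggle credit card dataset used in the notebook.
X, _ = make_blobs(n_samples=300, n_features=17, centers=4, random_state=0)

# Standardise features so no single feature dominates distance computations
X_scaled = StandardScaler().fit_transform(X)

# Reduce to 2 principal components (illustrative choice)
pca = PCA(n_components=2, random_state=0)
X_pca = pca.fit_transform(X_scaled)
print(f"Variance explained by 2 components: {pca.explained_variance_ratio_.sum():.3f}")

n_clusters = 4  # assumed number of segments for this sketch
models = {
    "K-Means": KMeans(n_clusters=n_clusters, n_init=10, random_state=0),
    "Agglomerative": AgglomerativeClustering(n_clusters=n_clusters),
    # Affinity Propagation picks the number of clusters itself
    "Affinity Propagation": AffinityPropagation(damping=0.9, max_iter=500, random_state=0),
    "Spectral": SpectralClustering(
        n_clusters=n_clusters, affinity="nearest_neighbors", random_state=0
    ),
    "Gaussian Mixture": GaussianMixture(n_components=n_clusters, random_state=0),
}

labels = {}
scores = {}
for name, model in models.items():
    # Every model here supports fit_predict, returning one label per sample
    labels[name] = model.fit_predict(X_pca)
    n_found = len(set(labels[name]))
    # Silhouette is only defined for 2 <= n_clusters <= n_samples - 1
    if 2 <= n_found < len(X_pca):
        scores[name] = silhouette_score(X_pca, labels[name])
    else:
        scores[name] = None
    score_text = "n/a" if scores[name] is None else f"{scores[name]:.3f}"
    print(f"{name:22s} clusters={n_found:3d} silhouette={score_text}")
```

Affinity Propagation is the only model above without a cluster-count setting, so its number of clusters may differ from the others; if it fails to converge, scikit-learn labels every sample `-1` and the sketch reports the silhouette as `n/a`.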
To run this notebook, you need Git, an Anaconda (conda) installation, and Jupyter Notebook set up on your system.
Open your terminal or command prompt and run:
git clone https://github.com/bjam24/bank_customer_segmentation_methods.git
cd bank_customer_segmentation_methods
conda create --name myenv python=3.8
conda activate myenv
pip install -r requirements.txt
jupyter notebook
- Python programming language
- Jupyter Notebook
- Kaggle: Credit Card Dataset for Clustering https://www.kaggle.com/datasets/arjunbhasin2013/ccdata?resource=download
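Before clustering, the downloaded data needs basic preparation. The sketch below is a hypothetical helper, `prepare_features`, not taken from the notebook: it assumes the CSV has a customer identifier column named `CUST_ID` and some missing numeric values, which it fills with column medians before standardising. The file name in the commented usage line is also an assumption; the runnable part uses a tiny made-up DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler


def prepare_features(df: pd.DataFrame, id_column: str = "CUST_ID") -> np.ndarray:
    """Hypothetical helper: drop the ID column, impute, and standardise."""
    # Assumption: the identifier column is named CUST_ID; ignore it if absent
    features = df.drop(columns=[id_column], errors="ignore")
    # Keep only numeric columns for distance-based clustering
    features = features.select_dtypes(include="number")
    # Replace missing values with each column's median
    imputed = SimpleImputer(strategy="median").fit_transform(features)
    # Standardise to zero mean and unit variance per column
    return StandardScaler().fit_transform(imputed)


# Usage with the downloaded file (file name is an assumption):
# df = pd.read_csv("CC GENERAL.csv")
# X = prepare_features(df)

# Tiny illustrative DataFrame, not real customer data
demo = pd.DataFrame({
    "CUST_ID": ["C1", "C2", "C3", "C4"],
    "BALANCE": [100.0, 250.0, np.nan, 80.0],
    "PURCHASES": [10.0, 0.0, 40.0, 5.0],
})
X_demo = prepare_features(demo)
print(X_demo.shape)
```

Median imputation is used here because it is less sensitive than the mean to the skewed balances typical of financial data; other strategies are equally valid.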