This repository contains a Python project that uses machine learning to analyze how much preprocessing matters for car insurance claim prediction.
I compare the performance of several machine learning models with and without preprocessing, showing how these steps affect accuracy, precision, recall, and the overall predictions.
- Data Preprocessing: Demonstrates the importance of feature scaling, plus oversampling (using SMOTE) and undersampling to address class imbalance.
- Machine Learning Models: Implements multiple classifiers, including:
  - Random Forest
  - Gradient Boosting
  - XGBoost
  - Logistic Regression
  - Voting Classifier (ensemble model)
- Performance Metrics: Evaluates models using accuracy, precision, recall, and F1-score, summarized in classification reports.
The project requires:
- Python 3.x
- Pandas
- NumPy
- Scikit-learn
- XGBoost
- Imbalanced-learn
You can install the required packages using the following command:
pip install -r requirements.txt