adjusting readme
Mrcl3 committed May 30, 2023
1 parent a506bd8 commit 3ae05d2
Showing 2 changed files with 37 additions and 34 deletions.
34 changes: 0 additions & 34 deletions README

This file was deleted.

37 changes: 37 additions & 0 deletions README.md
@@ -0,0 +1,37 @@
# Project Overview

This project focuses on predicting customer happiness based on survey responses from a select customer cohort in the logistics and delivery domain. The main objective is to analyze the provided dataset, preprocess the data, and build classification models to predict customer happiness.

## Approach

1. **Exploratory Data Analysis and Preprocessing**

- Perform dataset exploration to gain insights into the data.
- Handle missing values, if any, by applying appropriate strategies such as imputation or removal.
- Identify and handle outliers using the Isolation Forest algorithm, with a 10% baseline for the share of samples treated as outliers.
- Analyze feature correlations using correlation matrices to understand the relationships between variables.
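
The outlier and correlation steps above could be sketched as follows. This is a minimal illustration, not the project's actual notebook code: the data is a hypothetical random stand-in for the survey responses, and mapping the "10% baseline" to scikit-learn's `contamination=0.10` is an assumption.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical stand-in for the survey data: 200 respondents, 6 answers on a 1-5 scale.
rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(200, 6)).astype(float)

# Assumption: the 10% baseline corresponds to the contamination parameter.
iso = IsolationForest(contamination=0.10, random_state=0)
labels = iso.fit_predict(X)  # +1 for inliers, -1 for outliers

# Keep only the rows that were not flagged as outliers.
X_clean = X[labels == 1]

# Pearson correlation matrix between features (columns are variables).
corr = np.corrcoef(X_clean, rowvar=False)
print(X_clean.shape, corr.shape)
```

With a real dataset, `X` would be replaced by the feature columns of the survey table, and the correlation matrix could be plotted as a heatmap.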

2. **Classification Models (80-20 Split)**

- Shortlist candidate classification models with the LazyPredict library, which provides a quick overview of model performance.
- Fix a random state for the train/test split to ensure reproducibility.
- Train and evaluate classification models such as XGBoost, ExtraTreesClassifier, DecisionTreeClassifier, and RandomForestClassifier.
- Optimize the parameters of XGBoost using cross-validation and grid search.
- Conduct SHAP (SHapley Additive exPlanations) analysis on the XGBoost model to interpret feature importances.
- Optimize the parameters of ExtraTreesClassifier using cross-validation and grid search.
- Conduct SHAP analysis on the ExtraTreesClassifier model to interpret feature importances.
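
As a rough sketch of the split and tuning steps above, the snippet below runs an 80-20 split with a fixed random state and a cross-validated grid search on `ExtraTreesClassifier`. The data, the parameter grid, the 5-fold setting, and the stratified split are all assumptions made for illustration; the XGBoost tuning would follow the same pattern with its own estimator and grid, and the SHAP step is not shown here.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical data: 6 survey answers (1-5) and a binary happiness label.
rng = np.random.default_rng(42)
X = rng.integers(1, 6, size=(200, 6)).astype(float)
y = (X[:, 0] + rng.normal(0, 1, 200) > 3).astype(int)

# 80-20 split; the fixed random_state makes the split reproducible.
# Stratifying on y is an assumption, not something the README specifies.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)

# Illustrative grid only; the project's actual search space is not given.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    ExtraTreesClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Accuracy of the best model (refit on all training data) on the held-out 20%.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```

Keeping the test split out of the grid search means the reported accuracy is not biased by the parameter selection.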

3. **Data Augmentation**

- Apply data augmentation techniques, starting with the Synthetic Minority Over-sampling Technique (SMOTE), to address class imbalance if present.
- Consider downsampling the data and correcting skewness to improve the performance of the classification models.
- Evaluate the XGBoost and ExtraTreesClassifier models on the augmented data.
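
To make the SMOTE idea concrete, here is a simplified sketch of its core step: each synthetic sample is an interpolation between a minority-class point and one of its nearest minority-class neighbours. The function `smote_sketch` is a hypothetical helper written for this illustration, not the imbalanced-learn implementation, and the data is random. In practice a library implementation would normally be used, and oversampling would be applied to the training split only.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by neighbour interpolation."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)
    # k + 1 neighbours because each point is normally its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    # Pick a random base point and one of its k neighbours (columns 1..k).
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = idx[base, rng.integers(1, k + 1, size=n_new)]
    # Place each new point at a random position on the segment base -> neighbour.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


# Hypothetical imbalanced training set: 80 samples of class 0, 20 of class 1.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 4))
y_train = np.array([0] * 80 + [1] * 20)

X_min = X_train[y_train == 1]
X_new = smote_sketch(X_min, n_new=60)

# Append the synthetic minority samples so both classes have 80 rows.
X_bal = np.vstack([X_train, X_new])
y_bal = np.concatenate([y_train, np.ones(60, dtype=int)])
print(np.bincount(y_bal))
```

Because every synthetic point lies between two real minority points, the new samples stay inside the region the minority class already occupies.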

4. **Feature Engineering**

- Identify less significant features based on analysis and domain knowledge.
- Remove or transform these less significant features.
- Evaluate the performance of the models after feature engineering to assess the impact on predictive accuracy.
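
One possible way to carry out the steps above is to rank features, drop the weakest ones, and compare cross-validated accuracy before and after. The sketch below uses impurity-based importances from `ExtraTreesClassifier` as the ranking; that choice, the number of dropped features, and the synthetic data are assumptions, since the README only says the selection rests on analysis and domain knowledge.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical data: only the first two of six features carry signal.
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

model = ExtraTreesClassifier(n_estimators=100, random_state=7)

# Baseline: 5-fold cross-validated accuracy with all features.
baseline = cross_val_score(model, X, y, cv=5).mean()

# Rank features by impurity-based importance and drop the two weakest.
importances = model.fit(X, y).feature_importances_
keep = np.argsort(importances)[2:]

# Same evaluation on the reduced feature set.
reduced = cross_val_score(model, X[:, keep], y, cv=5).mean()
print(f"all features: {baseline:.3f}, reduced: {reduced:.3f}")
```

Comparing the two scores under the same cross-validation setup shows whether removing the features helped, hurt, or made no difference.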

The project aims to provide insights into customer happiness prediction, compare the performance of different classification models, apply data augmentation techniques, and refine the feature set to improve predictive accuracy. Findings and observations should be documented at each stage and the results communicated clearly.
