This repository contains the code and resources for the Life Expectancy Prediction Project, which aims to predict life expectancy based on various health, economic, and social factors. This project is part of a data science portfolio and demonstrates end-to-end model development, including data preprocessing, exploratory data analysis (EDA), model training, evaluation, and interpretation.
Life expectancy is a critical measure of a country's health and development. This project predicts it from a dataset of health, economic, and social indicators. The main steps of the project include:
- Data Preprocessing
- Exploratory Data Analysis (EDA)
- Model Selection and Training
- Hyperparameter Tuning
- Model Interpretation
- Model Saving
The dataset used in this project is 'Life Expectancy Data.csv', which contains the following columns:
- Country
- Year
- Status
- Life expectancy
- Adult Mortality
- Infant deaths
- Alcohol
- Percentage expenditure
- Hepatitis B
- Measles
- BMI
- Under-five deaths
- Polio
- Total expenditure
- Diphtheria
- HIV/AIDS
- GDP
- Population
- Thinness 1-19 years
- Thinness 5-9 years
- Income composition of resources
- Schooling
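As a rough sketch of how the file might be loaded, the snippet below reads a CSV with pandas and normalizes the header names. It assumes the headers may carry stray leading or trailing whitespace, which is common in CSV exports; the two sample rows and their values are made up for illustration and stand in for the real file.

```python
import io

import pandas as pd

# Tiny made-up sample standing in for 'Life Expectancy Data.csv'.
# The rows and values are illustrative only, not taken from the real file.
sample_csv = io.StringIO(
    "Country,Year,Status,Life expectancy ,Adult Mortality, BMI \n"
    "Country A,2014,Developing,65.0,263,19.1\n"
    "Country B,2014,Developed,81.2,60,55.8\n"
)

# With the real file this would be:
#   df = pd.read_csv("data/Life Expectancy Data.csv")
df = pd.read_csv(sample_csv)

# Strip stray whitespace so columns can be referenced by the names listed above.
df.columns = df.columns.str.strip()

print(df.columns.tolist())
print(df.dtypes)
```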
- data/: Contains the dataset file 'Life Expectancy Data.csv'.
- notebooks/: Jupyter notebooks for data analysis and model development.
- models/: Directory to save trained models.
- README.md: Project overview.
Code to handle missing values, encode categorical variables, and scale numerical features.
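One possible way to combine these three preprocessing tasks is a scikit-learn `ColumnTransformer`. The sketch below is an assumption about how this could be done, not the notebook's actual code: the median imputation strategy, the choice of one-hot encoding for `Status`, and the handful of made-up rows are all illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows using a few of the dataset's column names.
df = pd.DataFrame({
    "Status": ["Developing", "Developed", "Developing", "Developed"],
    "Alcohol": [0.5, 9.8, np.nan, 11.2],
    "GDP": [600.0, np.nan, 1500.0, 42000.0],
    "Schooling": [9.5, 16.0, 11.2, np.nan],
})

numeric_cols = ["Alcohol", "GDP", "Schooling"]
categorical_cols = ["Status"]

# Numeric columns: fill missing values with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: one-hot encode, ignoring categories unseen during fit.
preprocessor = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

X = preprocessor.fit_transform(df)
print(X.shape)
```

Keeping the steps inside one transformer means the same imputation and scaling parameters learned on the training data are reused at prediction time.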
Code to visualize the distribution of variables, correlation matrix, pair plots, and other insights.
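The numbers behind such plots can be computed with pandas alone. The sketch below builds summary statistics and a correlation matrix on synthetic data; the generated values and the relationship between the columns are assumptions made so the example runs without the real file.

```python
import numpy as np
import pandas as pd

# Synthetic data: the coefficients below are invented for illustration.
rng = np.random.default_rng(0)
n = 200
schooling = rng.uniform(4, 18, n)
adult_mortality = rng.uniform(50, 400, n)
life_expectancy = (
    45 + 1.8 * schooling - 0.03 * adult_mortality + rng.normal(0, 2, n)
)

df = pd.DataFrame({
    "Life expectancy": life_expectancy,
    "Schooling": schooling,
    "Adult Mortality": adult_mortality,
})

# Distribution summary for each variable.
print(df.describe().T[["mean", "std", "min", "max"]])

# Pairwise Pearson correlations, then each feature's correlation with the target.
corr = df.corr()
target_corr = corr["Life expectancy"].drop("Life expectancy").sort_values()
print(target_corr)
```

The `corr` frame can be passed to a plotting function such as seaborn's `heatmap` to draw the correlation matrix.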
Code to train and evaluate multiple models (Linear Regression, Random Forest, Gradient Boosting) and select the best model based on performance metrics.
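A minimal sketch of such a comparison follows. It assumes a single train/test split with RMSE and R² as the performance metrics, and it uses `make_regression` to generate stand-in data; the project's actual metrics and split may differ.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the preprocessed features.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    r2 = r2_score(y_test, pred)
    results[name] = {"rmse": rmse, "r2": r2}
    print(f"{name}: RMSE={rmse:.2f}, R2={r2:.3f}")

# Pick the model with the lowest test RMSE.
best_name = min(results, key=lambda k: results[k]["rmse"])
print("Best model:", best_name)
```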
Code to perform hyperparameter tuning for the best model using GridSearchCV.
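As an illustration, the sketch below tunes a random forest with `GridSearchCV`. The parameter grid, the 3-fold cross-validation, and the RMSE-based scoring are assumptions chosen to keep the example small; the best model in the project may be a different estimator with a different grid.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for the preprocessed features.
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=0)

# Hypothetical grid: 2 x 2 = 4 candidate settings.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_root_mean_squared_error",  # higher (closer to 0) is better
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated RMSE:", -search.best_score_)

# The estimator refit on all data with the best parameters.
best_model = search.best_estimator_
```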
Code to interpret the model using feature importance and permutation importance.
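The two interpretation methods can be sketched side by side. The example below assumes a random forest as the fitted model and uses synthetic data in which `Schooling` drives the target most strongly and a `Noise` column is unrelated to it; both the data and that relationship are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: coefficients are invented; "Noise" has no effect on the target.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "Schooling": rng.uniform(4, 18, n),
    "Adult Mortality": rng.uniform(50, 400, n),
    "Noise": rng.normal(0, 1, n),
})
y = 45 + 1.8 * X["Schooling"] - 0.03 * X["Adult Mortality"] + rng.normal(0, 1, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Impurity-based importance: computed from the training data during fitting.
impurity = pd.Series(model.feature_importances_, index=X.columns)

# Permutation importance: drop in test score when one column is shuffled.
perm = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)
permutation = pd.Series(perm.importances_mean, index=X.columns)

print(impurity.sort_values(ascending=False))
print(permutation.sort_values(ascending=False))
```

Impurity-based importances always sum to 1 and come for free with tree models, while permutation importance is measured on held-out data and works with any estimator, so comparing the two is a useful sanity check.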
Code to save the trained model using joblib for future use.
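Saving and reloading with joblib takes one call each. The sketch below uses a toy linear model and a temporary directory so it runs anywhere; the filename `life_expectancy_model.joblib` is hypothetical, and in the repository the file would presumably go under models/.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy model standing in for the tuned model.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical filename; a real run might write to models/ instead.
    model_path = Path(tmp) / "life_expectancy_model.joblib"
    joblib.dump(model, model_path)       # serialize the fitted model
    restored = joblib.load(model_path)   # load it back for predictions

# The restored model should reproduce the original predictions.
same = np.allclose(model.predict(X), restored.predict(X))
print(same)
```

Loading a saved model generally requires a scikit-learn version compatible with the one used to save it.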
To run the project, follow these steps:
- Preprocess the data.
- Perform EDA to understand the data better.
- Train multiple models and evaluate their performance.
- Tune hyperparameters for the best model.
- Interpret the model to understand feature importance.
- Save the final model for future predictions.
This project is licensed under the MIT License - see the LICENSE file for details.
For any questions or suggestions, feel free to open an issue or contact me at [subhro2002@gmail.com].