Completed
Part 1: Exploratory Data Analysis (EDA)
This project answers five business questions about life expectancy. It is then extended with an AI model that predicts life expectancy from the provided data.
- What are the main factors that influence life expectancy in different countries?
- How has life expectancy changed over time in countries with different socioeconomic statuses?
- What is the relationship between healthcare expenditure and life expectancy?
- Is there a correlation between immunization coverage (Hepatitis B, Polio, Diphtheria) and life expectancy?
- How does alcohol consumption affect life expectancy in different regions?
- Import data from CSV.
- Clean data (handling missing values, duplicate data, etc.).
- Descriptive statistics.
- Data visualization to identify patterns and trends.
- Correlation analysis between variables.
- Use Python techniques to identify factors influencing life expectancy.
- Perform temporal analysis to observe changes in life expectancy over the years.
- Conduct correlation analysis to investigate the relationship between healthcare expenditure and life expectancy.
- Analyze the correlation between immunizations and life expectancy.
- Perform regional analysis on the impact of alcohol consumption on life expectancy.
- Export the resulting dataset as a .csv file for use in the Model Training Notebook.
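The cleaning, correlation, and temporal steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the column names (`country`, `year`, `status`, `life_expectancy`, `health_expenditure`, `alcohol`), the helper functions, the median-fill strategy for missing values, and the tiny inline sample are all assumptions made for the example. The sample values are invented and carry no analytical meaning.

```python
import io

import pandas as pd

# Made-up sample standing in for the project's CSV; values are illustrative only.
SAMPLE_CSV = """country,year,status,life_expectancy,health_expenditure,alcohol
A,2000,Developing,60.0,50.0,1.0
A,2001,Developing,61.0,,1.2
B,2000,Developed,78.0,900.0,9.5
B,2001,Developed,78.5,950.0,9.7
B,2001,Developed,78.5,950.0,9.7
"""


def load_and_clean(source) -> pd.DataFrame:
    """Read a CSV, drop exact duplicate rows, and fill numeric gaps with the column median."""
    df = pd.read_csv(source)
    df = df.drop_duplicates().reset_index(drop=True)
    numeric_cols = df.select_dtypes(include="number").columns
    # Median imputation is one simple choice among many; it is an assumption here.
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
    return df


def correlations_with(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of each numeric column with the target, strongest first."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


def yearly_mean_by_group(df: pd.DataFrame, group: str, target: str) -> pd.DataFrame:
    """Mean of the target per year (rows) and per group (columns), for temporal analysis."""
    return df.groupby(["year", group])[target].mean().unstack(group)


df = load_and_clean(io.StringIO(SAMPLE_CSV))
corr = correlations_with(df, "life_expectancy")
trend = yearly_mean_by_group(df, "status", "life_expectancy")

print(df.describe())  # descriptive statistics
print(corr)           # correlation analysis
print(trend)          # life expectancy over time by socioeconomic status

# Export for the modelling step; pass a file path to to_csv() to write to disk instead.
csv_text = df.to_csv(index=False)
```

With the real dataset, `io.StringIO(SAMPLE_CSV)` would be replaced by the path to the CSV file, and the column names adjusted to match it.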
Part 2: Project Extension - AI Model
Develop a predictive model to estimate life expectancy based on the provided data.
- Select relevant features.
- Normalize and transform data.
- Split data into training and testing sets.
- Test various algorithms (Linear Regression, Random Forest, Gradient Boosting, etc.).
- Evaluate model performance using appropriate metrics (MAE, RMSE, R²).
- Select the best performing model.
- Cross-validation.
- Hyperparameter tuning using GridSearchCV or RandomizedSearchCV.
- Train the final model with the best parameters.
- Save the trained model for future use.
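The modelling steps above could look something like the sketch below, using scikit-learn and joblib from the library list. It is a hedged illustration under stated assumptions: the data is synthetic (generated with NumPy, not the project's dataset), the parameter grid is arbitrary, the random forest is tuned purely to demonstrate GridSearchCV rather than because it was selected as the best model, and the output file name is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the selected features and the life expectancy target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 70 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Split data into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Each candidate is a pipeline so that scaling is fitted on training folds only.
candidates = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "random_forest": make_pipeline(
        StandardScaler(), RandomForestRegressor(n_estimators=50, random_state=42)
    ),
    "gradient_boosting": make_pipeline(
        StandardScaler(), GradientBoostingRegressor(random_state=42)
    ),
}


def evaluate(model, X_eval, y_eval) -> dict:
    """Return MAE, RMSE and R² for a fitted model on the given data."""
    pred = model.predict(X_eval)
    return {
        "MAE": mean_absolute_error(y_eval, pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_eval, pred))),
        "R2": r2_score(y_eval, pred),
    }


# Cross-validate on the training set, then report held-out metrics per algorithm.
results = {}
for name, model in candidates.items():
    cv_r2 = cross_val_score(model, X_train, y_train, cv=5, scoring="r2").mean()
    model.fit(X_train, y_train)
    results[name] = {"cv_R2": cv_r2, **evaluate(model, X_test, y_test)}

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})

# Hyperparameter tuning; step names follow make_pipeline's lowercase class-name convention.
param_grid = {
    "randomforestregressor__n_estimators": [50, 100],
    "randomforestregressor__max_depth": [None, 5],
}
search = GridSearchCV(
    candidates["random_forest"], param_grid, cv=3, scoring="neg_mean_absolute_error"
)
search.fit(X_train, y_train)

# With the default refit=True, best_estimator_ is retrained on the full training set.
final_model = search.best_estimator_

# Save the trained model; the file name and temp directory are placeholders.
model_path = os.path.join(tempfile.gettempdir(), "life_expectancy_model.joblib")
joblib.dump(final_model, model_path)
reloaded = joblib.load(model_path)
```

Comparing candidates on the cross-validated score rather than on the test set keeps the test metrics as an unbiased final check. RandomizedSearchCV accepts the same pipeline and can replace GridSearchCV when the grid grows large.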
Programming Language
- Python
Libraries
- Pandas
- NumPy
- Matplotlib
- Seaborn
- SciPy
- Statsmodels
- Scikit-learn
- Requests
- XGBoost
- SHAP
- Graphviz
- Joblib