This project predicts individual medical insurance costs from demographic and health-related features. Its machine learning models show how factors such as age, BMI, and smoking status affect insurance charges.
- Data Preprocessing: Handling missing values, encoding categorical data, and feature scaling.
- Exploratory Data Analysis (EDA): Visualizing trends and correlations between features and insurance costs.
- Model Training: Implementing machine learning algorithms to predict insurance charges.
- Performance Metrics: Evaluating the accuracy and reliability of models using metrics like Mean Absolute Error (MAE).
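
The steps above can be sketched end to end with scikit-learn. This is a minimal illustration, not the repository's actual code: the data is synthetic, the lowercase column names and region labels are assumptions based on the feature list in this README, and linear regression stands in for whichever algorithms the notebooks use. Only numeric columns are imputed here, to keep the sketch short.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (NOT the real dataset). Column names and region
# labels are assumptions; all values are made up for illustration.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 65, n),
    "sex": rng.choice(["male", "female"], n),
    "bmi": rng.normal(30, 6, n),
    "children": rng.integers(0, 5, n),
    "smoker": rng.choice(["yes", "no"], n, p=[0.2, 0.8]),
    "region": rng.choice(["region_a", "region_b", "region_c"], n),
})
# Made-up relationship between features and charges, plus noise.
df["charges"] = (
    250 * df["age"]
    + 300 * df["bmi"]
    + 20000 * (df["smoker"] == "yes")
    + rng.normal(0, 2000, n)
)

numeric = ["age", "bmi", "children"]
categorical = ["sex", "smoker", "region"]

# Preprocessing: impute and scale numeric columns, one-hot encode categoricals.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Model training: preprocessing and regressor chained in one pipeline.
model = Pipeline([
    ("preprocess", preprocess),
    ("regress", LinearRegression()),
])

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="charges"), df["charges"], test_size=0.2, random_state=0
)
model.fit(X_train, y_train)

# Performance metric: mean absolute error on the held-out split.
predictions = model.predict(X_test)
mae = mean_absolute_error(y_test, predictions)
print(f"MAE: {mae:.2f}")
```

Keeping preprocessing inside the pipeline means the scaler and encoder are fitted on the training split only, which avoids leaking test-set statistics into the model.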
Medical-Insurance/
├── data/ # Dataset for training and testing
├── notebooks/ # Jupyter notebooks for EDA and model development
├── scripts/ # Python scripts for preprocessing and model training
├── visualizations/ # Charts and graphs for insights
├── models/ # Trained machine learning models
├── README.md # Project documentation
└── LICENSE # License information
- Python: Core programming language.
- Pandas: Data manipulation and preprocessing.
- Matplotlib/Seaborn: Visualizing relationships between features.
- Scikit-learn: Building and evaluating machine learning models.
- NumPy: Efficient numerical computation.
The dataset includes the following features:
- Age: Age of the individual.
- Sex: Sex of the individual (male/female).
- BMI: Body mass index.
- Children: Number of dependents.
- Smoker: Whether the individual is a smoker.
- Region: Geographical region.
- Charges: Medical insurance costs (target variable).
The dataset can be sourced from platforms such as Kaggle.
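
Once a copy of the dataset is loaded, a quick schema check and a first summary might look like the sketch below. The lowercase column names are an assumption (the actual CSV header may differ), `check_schema` is a hypothetical helper, and the sample rows are hand-made values rather than real records.

```python
import pandas as pd

# Assumed lowercase column names; adjust to match the actual CSV header.
EXPECTED_COLUMNS = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]


def check_schema(df: pd.DataFrame) -> list[str]:
    """Return the expected columns that are missing from df."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


# Tiny hand-made sample (illustrative values, not real records).
sample = pd.DataFrame({
    "age": [19, 45, 33, 60],
    "sex": ["female", "male", "male", "female"],
    "bmi": [27.9, 30.1, 22.7, 35.4],
    "children": [0, 2, 1, 3],
    "smoker": ["yes", "no", "no", "yes"],
    "region": ["region_a", "region_b", "region_a", "region_c"],
    "charges": [16000.0, 8000.0, 4000.0, 40000.0],
})

missing = check_schema(sample)
print("Missing columns:", missing)

# First look at the target: mean charges by smoking status.
mean_by_smoker = sample.groupby("smoker")["charges"].mean()
print(mean_by_smoker)
```

With the real data, `sample` would be replaced by the result of `pd.read_csv(...)` on the file placed under `data/`.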
For further reading and reference: