A predictive analytics project that uses machine learning to estimate the likelihood of heart disease from patient data. This repository covers data preprocessing, feature engineering, model training, and evaluation.
The primary goals of this project are to:
- Develop a robust machine learning model to predict heart disease.
- Provide insights from patient health metrics to guide healthcare decisions.
- Enable the integration of predictive models into healthcare systems for real-time risk assessment.
- Data Preprocessing: Cleaning the data, handling missing values, and normalizing features.
- Feature Engineering: Analysis and selection of the most influential factors in predicting heart disease.
- Machine Learning Models: Implementation of multiple ML algorithms including Logistic Regression, Random Forest, and Support Vector Machines (SVM).
- Evaluation Metrics: Performance comparison using accuracy, precision, recall, and ROC-AUC score.
- Interactive Visualizations: Tools to understand and explain predictions.
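The preprocessing step listed above (handling missing values, then normalizing) could be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the toy values are made up, the column names follow the UCI convention, and median imputation with standard scaling is an assumed choice.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy records with made-up values; np.nan marks a missing measurement.
toy = pd.DataFrame({
    "age": [63, 41, 57, 52],
    "chol": [233.0, np.nan, 354.0, 199.0],
    "thalach": [150.0, 172.0, np.nan, 162.0],
})

# Fill gaps with the column median, then scale to zero mean and unit variance.
preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

X_clean = preprocess.fit_transform(toy)
```

Wrapping both steps in one `Pipeline` means the imputation and scaling statistics are learned from the training data only and reapplied unchanged at prediction time.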
The project uses the UCI Heart Disease Dataset, containing detailed patient records, including:
- Age, sex, and other demographic details.
- Medical history metrics like cholesterol levels, fasting blood sugar, and ECG results.
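A first look at records like these might be sketched as below. The rows are invented for illustration, the `target` column name and the `load_records` helper are assumptions, and `?` is treated as the missing-value marker, as in the raw UCI files.

```python
import io

import pandas as pd

# Made-up rows for illustration only; "?" marks a missing value.
SAMPLE_CSV = """age,sex,chol,fbs,restecg,target
63,1,233,1,2,0
67,1,286,0,2,1
41,0,?,0,0,0
56,1,236,0,0,1
"""


def load_records(source):
    """Read patient records, treating '?' as missing (hypothetical helper)."""
    return pd.read_csv(source, na_values="?")


df = load_records(io.StringIO(SAMPLE_CSV))

# Count missing entries per column and check how balanced the labels are.
missing_per_column = df.isna().sum()
class_balance = df["target"].value_counts(normalize=True)
```

In practice `load_records` would be pointed at a file under `data/` rather than an in-memory string.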
- Languages: Python
- Libraries: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn
- Tools: Jupyter Notebook
- Data Exploration: Analyze the dataset for patterns and anomalies.
- Preprocessing: Clean and normalize the data for model training.
- Model Development:
  - Train multiple models to identify the best-performing algorithm.
  - Tune hyperparameters for optimal results.
- Model Evaluation: Compare performance metrics to select the best-performing model.
- Insights & Deployment: Generate actionable insights and prepare the model for deployment.
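The model development and evaluation steps above could look roughly like this. It is a sketch under stated assumptions: synthetic data from `make_classification` stands in for the patient table, and the parameter grid, the 5-fold cross-validation, and the ROC-AUC scoring choice are illustrative rather than taken from the notebooks.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import (GridSearchCV, cross_val_score,
                                     train_test_split)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 13 features, binary label.
X, y = make_classification(n_samples=300, n_features=13, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# The three algorithms named in this README; scaling matters for LR and SVM.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(random_state=0)),
}

# Compare candidates by mean cross-validated ROC-AUC on the training split.
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5,
                          scoring="roc_auc").mean()
    for name, model in candidates.items()
}

# Tune one candidate with a small, illustrative grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="roc_auc")
search.fit(X_train, y_train)

# Score the tuned model once on the held-out test split.
best = search.best_estimator_
y_pred = best.predict(X_test)
y_prob = best.predict_proba(X_test)[:, 1]
report = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_prob),
}
```

Keeping the test split out of both the comparison and the tuning means the final `report` is an estimate on data the model selection never saw.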
Heart-Disease-Prediction/
│
├── data/ # Raw and processed datasets
├── notebooks/ # Jupyter Notebooks for analysis and modeling
├── models/ # Saved machine learning models
├── src/ # Source code for training and evaluation
├── visuals/ # Charts and visualizations
├── README.md # Project documentation
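As a rough illustration of how a trained model might end up in `models/`, the sketch below saves and reloads a classifier with `joblib`. The `save_model` helper and the `random_forest.joblib` file name are hypothetical, the data is synthetic, and a temporary directory stands in for the real project folder.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def save_model(model, directory, name="random_forest.joblib"):
    """Persist a fitted model under `directory` and return the file path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / name
    joblib.dump(model, path)
    return path


# Synthetic stand-in data (assumption), just to have a fitted model to save.
X, y = make_classification(n_samples=100, n_features=13, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_model(model, Path(tmp) / "models")
    restored = joblib.load(saved_path)
    # The reloaded model should reproduce the original predictions.
    same_predictions = bool((restored.predict(X) == model.predict(X)).all())
```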
- Best Model: Random Forest achieved an accuracy of 92%, outperforming other models.
- Key Predictors: Age, cholesterol level, and maximum heart rate were identified as significant predictors.
- Recommendations: Regular monitoring of these key metrics can help in early detection and prevention.
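One common way to arrive at a ranking of predictors like the one above is a Random Forest's impurity-based feature importances. The sketch below assumes that approach, which may differ from what the notebooks do. It uses the UCI column names on synthetic data, so the resulting order is meaningless and only the mechanics carry over.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# UCI Heart Disease attribute names, attached to synthetic values.
feature_names = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                 "thalach", "exang", "oldpeak", "slope", "ca", "thal"]

X, y = make_classification(n_samples=300, n_features=len(feature_names),
                           n_informative=5, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; sort to see the strongest predictors.
importances = pd.Series(forest.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
top_three = importances.head(3)
```

Impurity-based importances can favor high-cardinality features, so `sklearn.inspection.permutation_importance` on held-out data is a reasonable cross-check.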
- Clone the repository:
git clone https://github.com/abhinavsaurabh/Heart-Disease-Prediction.git
- Navigate to the project directory and set up the environment:
cd Heart-Disease-Prediction
pip install -r requirements.txt
- Run the notebooks in `notebooks/` to explore data and train models.
- Integrate deep learning techniques to improve prediction accuracy.
- Extend the dataset to include more diverse patient records.
- Develop a web-based application for real-time heart disease risk assessment.
Contributions are welcome! Please submit a pull request or open an issue to discuss potential improvements.
This project is licensed under the MIT License. See the LICENSE file for details.
For inquiries, feel free to reach out to:
- Author: Abhinav Saurabh
- Email: abhinav20127@iiitd.ac.in
- GitHub: abhinavsaurabh