Data set obtained from Kaggle: https://www.kaggle.com/datasets/lainguyn123/student-performance-factors/
Number of data rows: 6607
Number of columns: 20

The code is written in Python with the following modules:
- pandas
- numpy
- matplotlib.pyplot
- seaborn
- scipy

The data set contains the following columns:
- Hours Studied
- Attendance
- Parental Involvement
- Access to Resources
- Extracurricular Activities
- Sleep Hours
- Previous Scores
- Motivation Level
- Internet Access
- Tutoring Sessions
- Family Income
- Teacher Quality
- School Type
- Peer Influence
- Physical Activity
- Learning Disabilities
- Parental Education Level
- Distance from Home
- Gender
- Exam Score

The first step in this analysis is an exploration of the data set, done in the file data_exploration.ipynb.
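
A minimal sketch of what such an exploration might start with, assuming the data is loaded into a pandas DataFrame. The helper name `summarize` and the CSV file name in the usage comment are hypothetical and not taken from the notebook.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Return the row count, column count, and missing values per column."""
    return {
        "n_rows": df.shape[0],
        "n_columns": df.shape[1],
        # Number of missing (NaN/None) entries in each column.
        "missing": df.isna().sum().to_dict(),
    }


# Hypothetical usage; the file name is an assumption about the Kaggle download:
# df = pd.read_csv("StudentPerformanceFactors.csv")
# print(summarize(df))
```

For the data set described above, `n_rows` and `n_columns` should match the 6607 rows and 20 columns listed at the top.
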
The second step is to determine which features (columns) are related to the exam score, done in the file data_analysis.ipynb.
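
One possible way to rank the numeric features, sketched here with the Pearson correlation from pandas. The notebook may use a different method. The function name and the column name `Exam_Score` are assumptions; the CSV may spell its columns differently.

```python
import pandas as pd


def correlations_with_target(df: pd.DataFrame, target: str = "Exam_Score") -> pd.Series:
    """Pearson correlation of each numeric column with the target column,
    ordered from strongest to weakest absolute correlation."""
    # Keep numeric columns only; categorical columns would need encoding first.
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    # Reorder by absolute value so strong negative correlations rank high too.
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)
```

Categorical columns such as School Type or Gender are skipped by this sketch and would need a separate comparison, for example per-group means.
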
The third step is to train a linear regression model to predict the exam scores, done in the file model_training.ipynb.
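
As an illustration of the idea, here is a small ordinary least squares fit written with numpy only, since numpy is among the modules listed above. This is a sketch under that assumption and not necessarily how the notebook trains its model. The function names are hypothetical.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Fit y ≈ intercept + X @ weights by ordinary least squares.

    X is a 2D array (rows = students, columns = numeric features).
    Returns a 1D array: [intercept, weight_1, weight_2, ...].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Predict scores for the rows of X using coefficients from the fit."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]
```

In practice the model would be fitted on one part of the data and evaluated on rows held out from training, so the reported error reflects unseen students.
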