This project analyzes factors influencing student exam performance using nonparametric statistical methods. Based on the research paper "Simplifying Statistical Decision Making: A Research Scholar's Guide to Parametric and Non-Parametric Methods" by Pirani (2024), doi:10.56815/IJMRR.V3I3.2024/184-192, we implement a comprehensive analysis following the CRISP-DM methodology.
- Business Understanding
- Data Understanding
- Data Preparation
- Statistical Method Selection
- Modeling
- Evaluation
- Dependencies
- Installation
- Usage
The goal is to identify key factors affecting student exam performance to help educators and administrators prioritize interventions and resource allocation.
- Exam_Score: Final exam performance
- Hours_Studied: Study hours allocated
- Attendance: Class attendance percentage
- Sleep_Hours: Average daily sleep
- Previous_Scores: Prior assessment scores
- Tutoring_Sessions: Extra tutoring count
- Physical_Activity: Weekly physical activity hours
- Parental Involvement
- Access to Resources
- Motivation Level
- Family Income
- Extracurricular Activities
- School Type
- Gender
- And others...
- Used IQR method (1.5 × IQR rule)
- Implemented boxplot visualization
- Treated outliers by replacing with NA values
- Utilized KNN imputation (k=5)
- Visualized missing data patterns
- Handled both numeric and categorical variables
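The outlier and imputation steps above could be sketched roughly as follows. This is a minimal illustration on a toy data frame, not the project's actual script: the helper name `flag_outliers_iqr` and the toy values are hypothetical, and the `VIM::kNN()` call is guarded so the sketch still runs when VIM is not installed.

```r
# Toy data standing in for the real dataset (values are made up for illustration)
students <- data.frame(
  Hours_Studied = c(20, 22, 19, 25, 21, 23, 95, 18, 24, 20),
  Attendance    = c(80, 85, 90, 75, 88, 92, 81, 79, 5, 86)
)

# Hypothetical helper: replace values outside the 1.5 x IQR fences with NA
flag_outliers_iqr <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- k * (q[2] - q[1])
  x[!is.na(x) & (x < q[1] - fence | x > q[2] + fence)] <- NA
  x
}

# Apply the rule column by column, keeping the data frame structure
cleaned <- students
cleaned[] <- lapply(cleaned, flag_outliers_iqr)

# Impute the resulting NAs with k-nearest neighbours (k = 5), if VIM is available
if (requireNamespace("VIM", quietly = TRUE)) {
  imputed <- VIM::kNN(cleaned, k = 5, imp_var = FALSE)
}
```

Setting `imp_var = FALSE` keeps `VIM::kNN()` from appending its logical `*_imp` indicator columns; leave it at the default if you want to track which cells were imputed.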
- Visual inspection:
  - Histograms with density plots
  - Q-Q plots
- Statistical testing using Jarque-Bera test:
  - H0: Data follows normal distribution
  - H1: Data does not follow normal distribution
  - Results: p-values < 0.05 for most variables:
    - Hours_Studied: p = 0.003
    - Attendance: p = 0.001
    - Sleep_Hours: p = 0.002
    - Previous_Scores: p = 0.004
    - Exam_Score: p = 0.001
  - Only Physical_Activity showed p > 0.05 (p = 0.067)
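To make the normality check concrete, here is a small sketch of what the Jarque-Bera test measures. It computes the statistic by hand from sample skewness and kurtosis, then cross-checks against `tseries::jarque.bera.test()` when that package is installed. The helper name `jarque_bera` is hypothetical and the data are simulated, so the numbers it produces are unrelated to the p-values reported above.

```r
# Hypothetical helper: Jarque-Bera statistic and p-value computed by hand.
# JB = n/6 * (S^2 + (K - 3)^2 / 4), compared against chi-squared with 2 df.
jarque_bera <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  d <- x - mean(x)
  m2 <- mean(d^2)                 # second central moment
  skew <- mean(d^3) / m2^1.5      # sample skewness S
  kurt <- mean(d^4) / m2^2        # sample kurtosis K
  jb <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
  c(statistic = jb, p.value = pchisq(jb, df = 2, lower.tail = FALSE))
}

set.seed(42)
skewed <- rexp(500)               # simulated right-skewed data, not project data
res <- jarque_bera(skewed)
print(res)

# Cross-check against the packaged implementation, if available
if (requireNamespace("tseries", quietly = TRUE)) {
  print(tseries::jarque.bera.test(skewed))
}
```

For the visual side of the check, base R's `qqnorm(x); qqline(x)` gives a quick Q-Q plot of any numeric column.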
Because the Jarque-Bera tests rejected the null hypothesis of normality for most variables (p < 0.05), we selected nonparametric methods for the analysis:
- Spearman correlation instead of Pearson
- Kruskal-Wallis test instead of ANOVA
- Dunn's test for post-hoc analysis
Implemented nonparametric methods:
- Correlation Analysis:
  - Spearman correlation for numeric variables
  - Correlation matrix visualization
- Categorical Analysis:
  - Kruskal-Wallis tests for variables with >2 groups
  - Dunn's test with Bonferroni correction for pairwise comparisons
  - Selection criteria: p < 0.05 for statistical significance
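The modeling steps above might look something like the following sketch. The data are simulated purely for illustration, the column name `Parental_Involvement` is an assumption about how that variable is spelled in the dataset, and the `rstatix::dunn_test()` call is guarded so the example runs without rstatix installed.

```r
set.seed(1)

# Simulated data for illustration only; not the project's dataset
toy <- data.frame(
  Attendance = runif(90, 60, 100),
  Parental_Involvement = factor(rep(c("Low", "Medium", "High"), each = 30))
)
group_shift <- unname(
  c(Low = 0, Medium = 2, High = 4)[as.character(toy$Parental_Involvement)]
)
toy$Exam_Score <- 40 + 0.3 * toy$Attendance + group_shift + rnorm(90, sd = 2)

# Spearman rank correlation between two numeric variables
rho <- cor(toy$Attendance, toy$Exam_Score, method = "spearman")

# Kruskal-Wallis test of Exam_Score across the levels of a categorical variable
kw <- kruskal.test(Exam_Score ~ Parental_Involvement, data = toy)

# Dunn's post-hoc test with Bonferroni correction, run only when the
# omnibus test is significant at the 0.05 level and rstatix is available
if (kw$p.value < 0.05 && requireNamespace("rstatix", quietly = TRUE)) {
  posthoc <- rstatix::dunn_test(toy, Exam_Score ~ Parental_Involvement,
                                p.adjust.method = "bonferroni")
  print(posthoc)
}
```

Running the post-hoc test only after a significant Kruskal-Wallis result mirrors the p < 0.05 selection criterion listed above; for a full Spearman matrix, `cor(numeric_df, method = "spearman")` can be passed to `corrplot::corrplot()`.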
- Strongest predictors of exam performance:
  - Attendance (ρ = 0.69)
  - Hours studied (ρ = 0.49)
  - Parental involvement (Kruskal-Wallis p < 0.001)
- Moderate impact factors:
  - Previous scores (ρ = 0.19)
  - Family income (Kruskal-Wallis p < 0.001)
  - Access to resources (Kruskal-Wallis p < 0.001)
- Non-significant factors:
  - Gender (Kruskal-Wallis p = 0.4548)
  - School type (Kruskal-Wallis p = 0.3271)
library(ggplot2) # Visualization
library(gridExtra) # Multiple plot arrangement
library(dplyr) # Data manipulation
library(tidyr) # Data reshaping
library(rstatix) # Statistical tests
library(corrplot) # Correlation visualization
library(tseries) # Jarque-Bera test
library(VIM) # KNN imputation
- Clone this repository
- Install R and RStudio
- Install required packages:
install.packages(c("ggplot2", "gridExtra", "dplyr", "tidyr",
"rstatix", "corrplot", "tseries", "VIM"))