This repository contains all the notebooks, resources, and documentation used to develop and evaluate models for the Automated Essay Scoring (AES) Kaggle competition. The project aims to build an open-source solution for automated essay evaluation to support educators and provide timely feedback to students.
- **Develop Accurate Scoring Models**: Create models capable of predicting essay scores with high agreement with human evaluators, measured by the Quadratic Weighted Kappa (QWK) metric.
- **Support Educators**: Reduce the manual grading workload and provide consistent scoring across diverse writing samples.
- **Ensure Fairness**: Mitigate algorithmic bias across different demographics and writing styles.
- **Enhance the Feedback Loop**: Offer timely, detailed, and constructive feedback to students for continuous improvement.
- **Resource Efficiency**: Design scalable, efficient models for deployment in diverse educational environments.
The dataset, provided by the competition host, includes essays scored on a 1–6 scale. The training data consists of rich textual features aligned with classroom standards, ensuring diversity and fairness.
The Quadratic Weighted Kappa (QWK) metric is used to evaluate model performance.
- QWK Range: -1 (worse than random) to 1 (perfect agreement).
- The metric penalizes large deviations between predicted and actual scores.
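To make the metric concrete, QWK can be computed with scikit-learn's `cohen_kappa_score` and quadratic weights (an illustrative snippet, not code from this repository; the scores below are hypothetical):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical human scores vs. model predictions on the 1-6 scale.
y_true = [1, 2, 3, 4, 5, 6, 3, 4]
y_pred = [1, 2, 3, 4, 5, 6, 4, 2]

# weights="quadratic" penalizes a prediction that is off by 2
# four times as heavily as one that is off by 1.
qwk = cohen_kappa_score(y_true, y_pred, weights="quadratic")
print(f"QWK: {qwk:.3f}")
```

Perfect agreement yields a QWK of 1.0; the larger the distance between predicted and actual scores, the harder the penalty.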
- Batch Size: 16
- Number of Samples: 17,000
- Device: NVIDIA Tesla P100 GPU | Intel(R) Xeon(R) CPU @ 2.20GHz
- Average Inference Time per Sample: ~56ms
- Total Inference Time: ~16 minutes for 17,000 samples
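The benchmark figures above are internally consistent, as a quick back-of-the-envelope check shows:

```python
# Sanity check: per-sample latency x sample count should match the total.
n_samples = 17_000
per_sample_ms = 56

total_s = n_samples * per_sample_ms / 1000  # 952 seconds
total_min = total_s / 60
print(f"{total_min:.1f} minutes")  # ~16 minutes, matching the figure above
```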
- Resolved encoding issues and normalized text.
- Engineered features like text length, spelling mistakes, and stopword ratios.
- Analyzed score distributions and textual patterns.
- Visualized relationships between features and scores.
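The hand-crafted features listed above can be sketched as follows (hypothetical function and column names; the repository's exact implementation may differ, and spelling-mistake counts are omitted here because they require a dictionary):

```python
# A tiny stopword set for illustration; real pipelines typically use a
# full list such as the one shipped with NLTK or spaCy.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def essay_features(text: str) -> dict:
    """Compute simple length and stopword-ratio features for one essay."""
    words = text.lower().split()
    n_words = max(len(words), 1)  # guard against empty essays
    return {
        "char_length": len(text),
        "word_count": len(words),
        "stopword_ratio": sum(w in STOPWORDS for w in words) / n_words,
    }

print(essay_features("The essay is short and to the point."))
```

Features like these feed directly into the linear regression and LightGBM baselines described below.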
- Linear Regression: 63 QWK (baseline).
- LightGBM:
  - Base: 70 QWK
  - Hypertuned: 74 QWK
- Fine-Tuned Smollm2 (360M): 79 QWK
- Word clouds and feature histograms for different score levels.
- The fine-tuned Smollm2 model is available on the Hugging Face Hub: Smollm2-360M-Essay-Scoring.
- Linear Regression: Established initial benchmarks.
- Tuned hyperparameters using Optuna.
- Utilized weighted loss to address class imbalance.
- Fine-tuned a transformer-based model for essay scoring.
- Achieved the best performance with 79 QWK.
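One common recipe for the weighted loss mentioned above is inverse-frequency class weights; the sketch below shows that idea (an assumption for illustration, not the repository's exact scheme):

```python
import numpy as np

def class_weights(labels: np.ndarray, n_classes: int = 6) -> np.ndarray:
    """Inverse-frequency weights for essay scores on the 1-6 scale."""
    counts = np.bincount(labels - 1, minlength=n_classes)  # per-score counts
    weights = counts.sum() / np.maximum(counts, 1)         # rarer score -> larger weight
    return weights / weights.sum() * n_classes             # normalize to mean 1

# Hypothetical imbalanced score distribution.
labels = np.array([3, 3, 4, 4, 4, 2, 5, 3])
print(class_weights(labels))
```

The resulting per-class weights can be passed to the training loss (e.g. a weighted cross-entropy) so that rare scores contribute as much to the gradient as common ones.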
| Model | QWK Score |
|---|---|
| Baseline Linear Regression | 63 |
| LightGBM (Base) | 70 |
| LightGBM (Hypertuned) | 74 |
| Smollm2 (Fine-Tuned) | 79 |
The fine-tuned model is hosted on Hugging Face:
Smollm2-360M-Essay-Scoring (includes a guide on how to use the model)
- Explore additional datasets for improved generalization.
- Develop a user-friendly interface for educators and students.
- Experiment with larger transformer models for better performance.
This project is supported by the Kaggle competition dataset and resources. Special thanks to the Hugging Face team for their robust model and library support.
@misc{jatin_mehra_2024,
  author = {Jatin Mehra},
  title = {Smollm2-360M-Essay-Scoring (Revision 467ceb5)},
  year = {2024},
  url = {https://huggingface.co/jatinmehra/Smollm2-360M-Essay-Scoring},
  doi = {10.57967/hf/3924},
  publisher = {Hugging Face}
}
@misc{learning-agency-lab-automated-essay-scoring-2,
author = {Scott Crossley and Perpetual Baffour and Jules King and Lauryn Burleigh and Walter Reade and Maggie Demkin},
title = {Learning Agency Lab - Automated Essay Scoring 2.0},
year = {2024},
howpublished = {\url{https://kaggle.com/competitions/learning-agency-lab-automated-essay-scoring-2}},
note = {Kaggle}
}
@misc{allal2024SmolLM2,
title={SmolLM2 - with great data, comes great performance},
author={Loubna Ben Allal and Anton Lozhkov and Elie Bakouch and Gabriel Martín Blázquez and Lewis Tunstall and Agustín Piqueres and Andres Marafioti and Cyril Zakka and Leandro von Werra and Thomas Wolf},
year={2024},
}