Predicting the Popularity of Reddit Comments Using Linear Regression
This mini-project was completed as part of COMP-551 at McGill University.
This work used a dataset of 12,000 Reddit comments. A linear regression model was implemented from scratch and used to predict the popularity of each comment.
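For reference, the standard ordinary least-squares formulation is summarized below. This is an assumption: writeup.pdf remains the authoritative description of the exact model, features, and any regularization used. Here $X$ is the $n \times d$ feature matrix, with a constant column of ones for the intercept. $y$ is the vector of popularity targets, $w$ is the weight vector, $x_i$ is the $i$-th row of $X$, and $\alpha$ is a learning rate.

```latex
\hat{y} = X w
\qquad
\mathcal{L}(w) = \tfrac{1}{2}\,\lVert y - X w \rVert_2^2

% Closed form: set \nabla_w \mathcal{L} = X^\top (X w - y) = 0
w^{*} = \left(X^\top X\right)^{-1} X^\top y

% SGD: one sample i per step, on the loss \tfrac{1}{2}(x_i^\top w - y_i)^2
w \leftarrow w - \alpha \left(x_i^\top w - y_i\right) x_i
```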
The following files were used:
proj1_data_loading.py : loads the Reddit comment dataset
implement_linreg.py : implements a closed-form linear regression solution and one based on stochastic gradient descent (SGD)
linreg_pipeline.py : main script that calls the functions above and generates predictions
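The two approaches in implement_linreg.py can be sketched in NumPy as follows. This is an illustrative sketch, not the project's actual code. The function names `add_bias`, `fit_closed_form`, `fit_sgd`, and `predict` are hypothetical. The learning rate, epoch count, and synthetic demo data are also assumptions.

```python
import numpy as np


def add_bias(X: np.ndarray) -> np.ndarray:
    """Prepend a column of ones so the first weight acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_closed_form(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares weights from the normal equations (X^T X) w = X^T y."""
    Xb = add_bias(X)
    # Solving the linear system is more numerically stable than inverting X^T X.
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)


def fit_sgd(X: np.ndarray, y: np.ndarray, lr: float = 0.01,
            epochs: int = 100, seed: int = 0) -> np.ndarray:
    """Least-squares weights via stochastic gradient descent, one sample per step.

    lr and epochs are illustrative defaults (hypothetical), not tuned values.
    """
    rng = np.random.default_rng(seed)
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        # Visit the samples in a fresh random order each epoch.
        for i in rng.permutation(Xb.shape[0]):
            # Gradient of 0.5 * (x_i . w - y_i)^2 with respect to w.
            err = Xb[i] @ w - y[i]
            w -= lr * err * Xb[i]
    return w


def predict(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Predicted targets for the rows of X, given weights w (intercept first)."""
    return add_bias(X) @ w


if __name__ == "__main__":
    # Synthetic, noiseless data for illustration only (not the Reddit dataset).
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    true_w = np.array([0.5, 2.0, -1.0, 3.0])  # intercept first
    y = predict(X, true_w)
    print("closed form:", fit_closed_form(X, y))
    print("SGD:        ", fit_sgd(X, y))
```

On well-conditioned data, both methods should arrive at nearly the same weights. The closed form requires building and solving a $d \times d$ system. SGD only touches one row at a time, which can help when the number of features or samples grows large.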
See writeup.pdf for details on the methodology and results.