Collaborative Filtering on the MovieLens-100k dataset with the SVD algorithm from surprise

fverac/MovieRecommendations

Movie Recommendations with SVD

This repository contains code that runs collaborative filtering on the MovieLens-100k dataset to generate movie recommendations for users. It also runs a feature analysis to determine whether the user and movie matrices learned by SVD contain information about user gender and movie release year.

Installation

You will need to install numpy, scikit-learn, pandas, and the surprise package.

Python files

The folder recommendation_system contains the following files:

modelselectionsvd.py : runs GridSearchCV to determine the best regularization parameter for the SVD algorithm

evaluationbyMAE : takes all the user movie ratings generated by the model and compares them against their actual counterparts in the test set to get the Mean Absolute Error.

evalutationbytop5 : generates the top 5 movie recommendations for each user and averages all ratings for such recommendations found in the test set.
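The two evaluation strategies above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the function names and the dictionary layout (ratings keyed by `(user, item)` pairs) are assumptions, and whether items already seen in training are excluded from the candidates is left to the caller.

```python
from collections import defaultdict


def mean_absolute_error(predicted, actual):
    """Average |predicted - actual| over every (user, item) pair in `actual`.

    Both arguments are dicts keyed by (user, item); this layout is hypothetical.
    """
    errors = [abs(predicted[key] - rating) for key, rating in actual.items()]
    return sum(errors) / len(errors)


def mean_top_n_test_rating(predicted, actual, n=5):
    """Average the test-set ratings of each user's n highest-scored items.

    Recommendations that do not appear in the test set are skipped.
    Returns None if no recommendation has a test-set rating.
    """
    by_user = defaultdict(list)
    for (user, item), score in predicted.items():
        by_user[user].append((score, item))

    hits = []
    for user, scored in by_user.items():
        # Rank this user's candidate items by predicted rating, best first.
        top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:n]
        for _, item in top:
            if (user, item) in actual:
                hits.append(actual[(user, item)])
    return sum(hits) / len(hits) if hits else None
```

A lower MAE means the predicted ratings are closer to the held-out ones, while a higher top-5 average means the recommended movies were ones the users actually rated well.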

The folder feature_analysis contains the following files:

userfeatures.py : takes the user matrix learned by SVD and trains a Logistic Regression classifier on its latent features, using each user's actual gender as the label, to predict user gender solely from ratings.

moviefeatures.py : takes the movie matrix learned by SVD and trains a Kernel Ridge Regression model on its latent features, using each movie's actual release year as the target, to predict movie release year solely from ratings. The model is then compared with a naive model that always predicts the mean movie release year.
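The naive mean predictor used as the point of comparison can be sketched as follows. The function name is hypothetical, and the error metric (mean absolute error) is an assumption, since the metric used for the comparison is not stated here.

```python
from statistics import mean


def naive_baseline_mae(train_years, test_years):
    """Error of a model that always predicts the mean training release year.

    Mean absolute error is an assumed metric, chosen for illustration.
    """
    baseline = mean(train_years)
    return mean([abs(year - baseline) for year in test_years])
```

A regression model on the learned movie features is only informative about release year if its error is clearly below this baseline.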

Data files

trainset.csv: the training set of user ratings. There are three columns, (user-id, item-id, rating), as the headers indicate. There are 943 users, with ids ranging from 0 to 942, and 1681 items, with ids ranging from 0 to 1680.

testset.csv: the test set of user ratings. It has the same structure as the training set.

gender.csv: the genders of the 943 users. Female is 0, and male is 1.

release_year.csv: release years of 1681 movies
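As a rough sketch of how the rating files could be read without extra dependencies, the helper below parses a three-column file into a dictionary. The function name is hypothetical, and the code assumes a comma-delimited file with a single header row, which it skips by position rather than relying on exact header names.

```python
import csv


def read_ratings(handle):
    """Parse (user-id, item-id, rating) rows into {(user, item): rating}.

    Assumes a comma-delimited file with one header row; blank lines are ignored.
    """
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return {
        (int(row[0]), int(row[1])): float(row[2])
        for row in reader
        if row
    }
```

For example, `with open("trainset.csv", newline="") as f: ratings = read_ratings(f)` would load the training set, and the same call works for testset.csv since it has the same structure.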
