Skip to content

This project implements a movie recommendation service with Apache Spark using collaborative filtering.

Notifications You must be signed in to change notification settings

srimallipudi/Movie-Recommendation-Service-using-Apache-Spark

Repository files navigation

Business Problem

A new startup business, Ripe Pumpkins - a movie review-aggregation service, would like to implement Pumpkinmeter, a measurement of collaborative recommendation for millions of fans. The board of directors have been convinced by the recent success of recommendation model in streaming services and would like to know the potential in the Ripe Pumpkins' new initiative, Pumpkinmeter score.

Movie Recommendation Service

Project Description:

This project implements a movie recommendation service using Apache Spark, specifically focusing on collaborative filtering. Collaborative filtering is a technique used in recommendation systems where predictions about a user's preferences or interests are made by collecting information from other users with similar tastes.

The project involves the following key steps:

Setting up Spark Context:

The project starts with setting up a Spark Context configured for local mode.

Data Loading and Preprocessing:

The MovieLens dataset, including ratings and movie information, is loaded into Spark RDDs. Data preprocessing steps include parsing the CSV files and filtering out unnecessary information.

Collaborative Filtering:

Collaborative filtering is implemented using the Alternating Least Squares (ALS) algorithm provided by Spark's MLlib library. The model is trained on the ratings data to make predictions about user preferences.

Parameter Selection:

The ALS model is trained using the small dataset, and different parameters such as rank are experimented with to select the best-performing model.

Model Training and Testing:

The ALS model is trained using the selected parameters on the complete dataset. The training phase involves iterating over different ranks to find the model with the lowest Root Mean Square Error (RMSE). Tested to evaluate its performance in predicting movie ratings.

Making Recommendations:

Once the model is trained, Recommendations are generated for a new user by first adding their ratings to the dataset and then using the trained model to predict ratings for unrated movies.

Scenario Analysis:

Finally, it provides scenario-based analysis such as generating recommendations for users based on different rating count thresholds and lists the top recommended movies for the new user.

User Interface and Interaction:

The project discusses how user interfaces and interactions can be designed to gather customer input effectively, enhancing the recommendation engine's performance.

Project Contents:

  1. Data: Contains the MovieLens dataset files (ratings.csv, movies.csv) used for training the recommendation engine.
  2. Code: Includes Python scripts for data loading, preprocessing, model training, recommendation generation, and parameter selection.

Future Enhancements:

  1. Real-time Updates: Implement mechanisms to handle real-time user interactions and update recommendations dynamically.
  2. Advanced Algorithms: Explore advanced recommendation algorithms such as content-based filtering, matrix factorization techniques, or deep learning models for improved accuracy.
  3. User Feedback: Incorporate mechanisms for collecting user feedback on recommendations to continuously refine the recommendation engine.
  4. Personalization: Enhance personalization by considering additional user attributes such as demographics, viewing history, or genre preferences.
  5. A/B Testing: Conduct A/B testing to evaluate the effectiveness of different recommendation strategies and algorithms.

Conclusion:

The movie recommendation service project provides a scalable and efficient solution for generating personalized movie recommendations based on user preferences. By leveraging collaborative filtering techniques and Apache Spark's distributed computing capabilities, the system can handle large-scale datasets and deliver accurate recommendations to users, enhancing their movie-watching experience.

Case Report.pdf

Presentation.pptx

About

This project implements a movie recommendation service with Apache Spark using collaborative filtering.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published