⭐ GitHub Recommender System

Distributed Recommender System for GitHub repositories based on the implicit feedback captured when a user stars a repository.

The system relies on a dataset containing information about user interactions on the GitHub website. In order to collect it, we started from a well-known repository like React and we stored all the users that starred that repo. Then, for each user we stored what other repositories he has starred.

In particular, for each repository the following data is avalable:

Repository name
Repository owner
Number of stars
Number of forks
Main language
About (small description of the repo)
Sponsor (true/false - some repo have the possibility to open donations)
Last update time

Regarding users, instead, we collected data about which user starred which repository.

For the recommender system, we experimented with the Content-based filtering approach and we tried the Collaborative Filtering approaches, implementing both Neighbourhood CF and the Matrix Factorization approach.

Finally, in order to evaluate our work, we used the following metrics: MAP@K and Personalization.

Contributors

Notebook

In this repo is stored the notebook used for the creation and validation of the recommender system, named GitHub_Recommender_System.ipynb.

In order to run the notebook, Colab is suggested since the project is made using the pyspark library.

Results

In the following table the results obtained by the models are reported:

Model	MAP@1	MAP@2	MAP@3	MAP@4	MAP@5	Personalization
Content-based Filtering	0.067	0.051	0.043	0.037	0.033	0.676
User-based Collaborative Filtering	0.357	0.261	0.213	0.183	0.164	0.965
Item-Based Collaborative Filtering	0.355	0.300	0.264	0.239	0.216	0.874
Matrix Factorization	0.506	0.393	0.332	0.298	0.268	0.864

We can see that as expected the Matrix Factorization model is the best model, having a higher MAP value for all the ranks.

Demo

Using the notebook as backend we realized a small demo which gives the possibility to inspect the recommendations for all the users in the dataset. Moreover, it allows to define a fake user, selecting the repositories that he starred in order to test the system.

The frontend is available at the following link https://recommend-hub.netlify.app/, but unfortunately the backend is not hosted anymore. However, it is possible to host it yourself directly in Colab, following the instructions provided at the end of the notebook.

Home page

Prediction results

Technologies

The system is implemented using the Python language and the following libraries:

pyspark as the distributed framework
pandas to work with data
matplotlib and seaborn to perform the EDA

Moreover, for the Web server the following technologies have been used:

flask and ngrok Python libraries for the back-end
React.js for the front-end

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.vscode		.vscode
demo		demo
scripts		scripts
.gitignore		.gitignore
GitHub Recommender System Presentation.pdf		GitHub Recommender System Presentation.pdf
GitHub_Recommender_System.ipynb		GitHub_Recommender_System.ipynb
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

⭐ GitHub Recommender System

Contributors

Notebook

Results

Demo

Home page

Prediction results

Technologies

About

Releases

Packages

Contributors 2

Languages

License

dotmat3/github-recommender-system

Folders and files

Latest commit

History

Repository files navigation

⭐ GitHub Recommender System

Contributors

Notebook

Results

Demo

Home page

Prediction results

Technologies

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages