This is a web scraping project that scrapes the GitHub website to collect all issues from the top 8 repositories in each of the 3 topics listed on this page (the 3 topics are chosen at random each time the page is refreshed). Running the project does the following:
- A separate directory is created for each of these 3 topics.
- One PDF file is created per repository, inside its topic directory.
- Each PDF file lists the links to all issues of its repository.
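
The output layout described above can be sketched with Node's built-in `fs` and `path` modules. This is only a minimal sketch, not the project's actual code: the helper name `planOutput` and the sample topic and repository names are hypothetical, and writing the PDF content itself (which needs a PDF library) is left out.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: create one directory per topic and return the PDF
// path planned for each repository inside it. PDF writing is not shown.
function planOutput(baseDir, reposByTopic) {
  const pdfPaths = [];
  for (const [topic, repos] of Object.entries(reposByTopic)) {
    const topicDir = path.join(baseDir, topic);
    // recursive: true means no error is thrown if the directory already exists
    fs.mkdirSync(topicDir, { recursive: true });
    for (const repo of repos) {
      pdfPaths.push(path.join(topicDir, `${repo}.pdf`));
    }
  }
  return pdfPaths;
}

// Example run in a temporary directory with made-up names.
const baseDir = fs.mkdtempSync(path.join(os.tmpdir(), "issues-"));
const planned = planOutput(baseDir, {
  "topic-a": ["repo-one", "repo-two"],
  "topic-b": ["repo-three"],
});
console.log(planned);
```

Planning the paths separately from writing the files keeps the directory logic easy to check on its own.
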
- Clone this repository to your local environment.
- Run `npm install` to install all the required packages.
- Run `node main.js` to generate all the required directories and files.
- The cheerio module is used here for web scraping.
- Limitation: the cheerio module only parses and extracts the initially loaded HTML. Since not all repositories are loaded at once, issues are extracted only from the top 8 repositories of each topic.
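
To illustrate the top-8 cap, here is a minimal sketch that keeps only the first 8 repository links and turns them into issue-page URLs. It assumes the links have already been extracted from the initially loaded HTML as `/owner/repo` paths. The helper name `topIssuePages` and the sample paths are hypothetical, not taken from the project.

```javascript
const MAX_REPOS = 8; // the project only processes the top 8 repositories per topic

// Hypothetical helper: repoPaths are strings such as "/owner/repo".
function topIssuePages(repoPaths, limit = MAX_REPOS) {
  return repoPaths
    .slice(0, limit) // drop anything beyond the cap
    .map((repoPath) => new URL(`${repoPath}/issues`, "https://github.com").href);
}

// Ten made-up repository paths; only the first eight are kept.
const samplePaths = Array.from({ length: 10 }, (_, i) => `/owner/repo-${i + 1}`);
const issuePages = topIssuePages(samplePaths);
console.log(issuePages.length);
console.log(issuePages[0]);
```
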