Scraping programming book data and storing it in JSON files, separated and ordered by categories and subcategories.
The scripts and files in this repo:
main_script.js
: scrapes the data and stores it in sub-directories of the Assets directory
process_scraped_data.js
: processes the scraped data by reformatting it and adding new insights
AllCategoriesLinks.json
: this file contains all the URLs from which the data is scraped
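As a rough illustration of the layout described above, the sketch below walks a directory tree and lists every JSON file it finds. The function name `collectJsonFiles` is hypothetical and not part of this repo, and the sketch assumes the scraped files end in `.json` and sit somewhere under `Assets`.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper (not part of this repo): recursively collects
// the paths of all .json files below `dir`.
function collectJsonFiles(dir) {
  const found = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      // Category and sub-category folders are visited in turn.
      found.push(...collectJsonFiles(fullPath));
    } else if (entry.isFile() && entry.name.endsWith(".json")) {
      found.push(fullPath);
    }
  }
  return found;
}

// Usage: print each scraped file with its number of entries.
// Assumes each file holds a JSON array; skipped if Assets does not exist yet.
if (fs.existsSync("Assets")) {
  for (const file of collectJsonFiles("Assets")) {
    const books = JSON.parse(fs.readFileSync(file, "utf8"));
    console.log(file, Array.isArray(books) ? books.length : "not a list");
  }
}
```

Running this after the scraping step should give a quick overview of which categories were filled.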
Each JSON file contains a list of books, and each book has 5 keys:
"title"
: the title of the book
"picture"
: a small square image of the cover
"source"
: the source page (where the book's data was scraped)
"link"
: a direct link to download the book as a PDF
"large_pic"
: the cover image of the book (high quality)
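To make the record shape concrete, here is a minimal sketch of one book entry and a check that all five keys are present. The sample values are placeholders rather than real scraped data, `missingKeys` is a hypothetical helper, and the sketch assumes every value is stored as a non-empty string.

```javascript
// The five keys listed above.
const REQUIRED_KEYS = ["title", "picture", "source", "link", "large_pic"];

// Placeholder record for illustration only; not real scraped data.
const sampleBook = {
  title: "Example Book Title",
  picture: "https://example.com/covers/small.jpg",
  source: "https://example.com/books/example-book",
  link: "https://example.com/downloads/example-book.pdf",
  large_pic: "https://example.com/covers/large.jpg",
};

// Hypothetical helper: returns the required keys that are missing
// or whose value is not a non-empty string (an assumption about the data).
function missingKeys(book) {
  return REQUIRED_KEYS.filter(
    (key) => typeof book[key] !== "string" || book[key].length === 0
  );
}

console.log(missingKeys(sampleBook)); // prints []
console.log(missingKeys({ title: "Only a title" }));
```

A check like this could be run over each list after scraping to spot incomplete entries.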
- First make sure you have Node.js installed, then run these commands:
npm install
node main_scrap.js
node process_scraped_data.js
- Alternatively, just run this command:
npm run start