This repository is a training project for developing a web scraper that extracts data from https://books.toscrape.com.

ViniciusLucchesi/book-scraper

Book Scraper

Built with Rust using the scraper, reqwest, and csv crates.

This repository is a training project for developing a web scraper that extracts data from books.toscrape.com, a website created specifically for scraping practice.

  • Getting started
  • Scraper crate
  • Documentation Roadmap

Getting started

Clone the repository

git clone https://github.com/ViniciusLucchesi/book-scraper.git

Compile the project, including all dependencies declared in the Cargo.toml file, and run it:

cargo run

Scraper crate

The scraper crate parses HTML and lets us query it with CSS selectors, so we can extract specific pieces of information from the HTML we pass to it.

Scraper provides an interface to Servo’s html5ever and selectors crates, for browser-grade parsing and querying.

In this project, scraper is used in both models.rs and main.rs: models.rs defines a new type, ModelSelector, that groups the selectors, and main.rs creates the selectors the project needs.

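// src/models.rs
use scraper::Selector;

/// Groups the CSS selectors used to locate each piece of book data.
pub struct ModelSelector {
    pub book: Selector,       // one <article class="product_pod"> card per book
    pub book_name: Selector,  // the anchor inside the card's <h3>
    pub book_price: Selector, // the element holding the price text
    pub book_link: Selector,  // the same <h3> anchor, used for the link
}

// src/main.rs
mod models;

use models::ModelSelector;
use scraper::Selector;

/// Parses every selector the scraper needs; panics if any CSS string is invalid.
fn create_selectors() -> ModelSelector {
    let book_selector: Selector = Selector::parse("article.product_pod").unwrap();
    let book_name_selector: Selector = Selector::parse("h3 a").unwrap();
    let book_price_selector: Selector = Selector::parse(".price_color").unwrap();
    // Name and link are both read from the same <h3><a> element
    let book_link_selector: Selector = Selector::parse("h3 a").unwrap();

    ModelSelector {
        book: book_selector,
        book_name: book_name_selector,
        book_price: book_price_selector,
        book_link: book_link_selector,
    }
}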

Documentation Roadmap

  • Explain reqwest crate
  • Explain csv crate
