This repository is a training project for building a web scraper that extracts data from books.toscrape.com, a website created for scraping practice.
Clone the repository
git clone https://github.com/ViniciusLucchesi/book-scraper.git
Compile the project with all the dependencies defined in the Cargo.toml file, then run it
cargo run
The scraper crate parses HTML and queries it with CSS selectors, which lets us extract information from the HTML passed to it as a parameter.
Scraper provides an interface to Servo’s html5ever and selectors crates, for browser-grade parsing and querying.
Its features are used in both the models.rs and main.rs files: models.rs defines a new type called ModelSelector, and main.rs creates the selectors the project needs.
// src/models.rs
use scraper::Selector;
pub struct ModelSelector {
    pub book: Selector,
    pub book_name: Selector,
    pub book_price: Selector,
    pub book_link: Selector,
}
// src/main.rs
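// Assumption: models.rs is declared as a module of this binary.
mod models;

use models::ModelSelector;
use scraper::Selector;

fn create_selectors() -> ModelSelector {
    // Each book on the listing page is wrapped in an <article class="product_pod">.
    let book_selector: Selector = Selector::parse("article.product_pod").unwrap();
    // The name and the link are both selected through the same "h3 a" anchor.
    let book_name_selector: Selector = Selector::parse("h3 a").unwrap();
    let book_price_selector: Selector = Selector::parse(".price_color").unwrap();
    let book_link_selector: Selector = Selector::parse("h3 a").unwrap();

    ModelSelector {
        book: book_selector,
        book_name: book_name_selector,
        book_price: book_price_selector,
        book_link: book_link_selector,
    }
}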
- Explain reqwest crate
- Explain csv crate