This Python web crawler traverses the Supermicro website using breadth-first search (BFS), indexing and categorizing the PDF and text files it encounters. It uses Selenium to navigate the site and identify relevant documents such as case studies, datasheets, brochures, white papers, and guides, then organizes them into logical groups so every resource is cataloged for easy access and reference.
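The sketch below illustrates the general BFS approach, assuming Selenium 4 and a chromedriver binary in the project root; `START_URL`, `MAX_PAGES`, and the variable names are illustrative and not taken from the project's code.

```python
# Minimal BFS crawl sketch (not the project's exact code). Assumes
# chromedriver sits in the project root and selenium 4 is installed.
from collections import deque
from urllib.parse import urljoin, urlparse

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

START_URL = "https://www.supermicro.com/"  # assumed entry point
MAX_PAGES = 100                            # illustrative crawl limit

driver = webdriver.Chrome(service=Service("./chromedriver"))
queue, visited, documents = deque([START_URL]), {START_URL}, []

while queue and len(visited) <= MAX_PAGES:
    url = queue.popleft()
    driver.get(url)
    for anchor in driver.find_elements(By.TAG_NAME, "a"):
        href = anchor.get_attribute("href")
        if not href:
            continue
        href = urljoin(url, href)
        if urlparse(href).netloc.endswith("supermicro.com"):
            if href.lower().endswith((".pdf", ".txt")):
                documents.append(href)   # candidate file to index
            elif href not in visited:
                visited.add(href)
                queue.append(href)       # BFS: enqueue unseen pages

driver.quit()
print(f"Found {len(documents)} candidate documents")
```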
Please download ChromeDriver from the link below and place it under the project root folder: https://chromedriver.chromium.org/downloads
# create venv
$ python3 -m venv venv
# activate venv
$ source ./venv/bin/activate
# install dependencies
$ pip install -r requirements.txt
# run the crawler
$ python3 main.py
# categorize the collected files
$ python3 categorize.py
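As a rough idea of how the categorization step might group the collected URLs, the sketch below matches keywords in each URL against the categories named above; the keyword map and function name are assumptions, not the project's actual implementation.

```python
# Hypothetical keyword-based grouping; the categories mirror those listed in
# the description, but the matching rules are illustrative only.
CATEGORY_KEYWORDS = {
    "case_study": ["case-study", "casestudy", "success"],
    "datasheet": ["datasheet", "spec"],
    "brochure": ["brochure"],
    "white_paper": ["white-paper", "whitepaper"],
    "guide": ["guide", "manual"],
}

def categorize(url: str) -> str:
    """Return the first category whose keyword appears in the URL."""
    lowered = url.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "misc"  # anything unmatched falls into the miscellaneous bucket

# Example: categorize("https://www.supermicro.com/datasheets/X13.pdf") -> "datasheet"
```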
Some datasheet URLs do not end with a ".pdf" extension, as shown in "datasheet_tag_problem.png". These special cases need to be identified and handled so the files are downloaded and processed correctly.
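One possible way to handle such URLs is to fall back to the HTTP Content-Type header when the extension is missing; the `requests`-based helper below is a sketch under that assumption, not the project's current handling.

```python
# Sketch: decide whether an extensionless URL actually serves a PDF by
# checking the Content-Type header. requests is assumed to be available.
import requests

def is_pdf(url: str) -> bool:
    """Treat a URL as a PDF if its path ends in .pdf or the server says so."""
    if url.lower().split("?")[0].endswith(".pdf"):
        return True
    try:
        response = requests.head(url, allow_redirects=True, timeout=10)
        return "application/pdf" in response.headers.get("Content-Type", "")
    except requests.RequestException:
        return False
```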