This repository serves as a comprehensive introduction to web scraping techniques, suitable for beginners and intermediate Python users. It provides a hands-on learning experience through a series of Jupyter Notebooks, guiding you through various scraping methods.
- Clone the repository to your local machine by running this command in your terminal: `git clone https://github.com/tan-yong-sheng/scraping-quickstart`
- Run `cd scraping-quickstart` in your terminal.
- Rename the `.env.sample` file to `.env`.
- In the `.env` file, add the required credentials for the following services (see the loading sketch after this list):
  - 2Captcha API key: obtain your API key from https://2captcha.com/2captcha-api. This paid service helps you bypass reCAPTCHA challenges.
  - Brightdata proxy username, password, and hostname: obtain a proxy at https://brightdata.com/ to rotate your IP address and reduce the chance of being detected as a scraping bot.
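Once the credentials are in place, they can be read inside the notebooks with the `python-dotenv` package. This is a minimal sketch; the variable names below are assumptions for illustration, so match them to the keys actually defined in `.env.sample`:

```python
# Minimal sketch: loading scraping credentials from the .env file.
# The variable names are illustrative -- match them to the keys in .env.sample.
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads KEY=VALUE pairs from .env into the process environment

TWOCAPTCHA_API_KEY = os.getenv("TWOCAPTCHA_API_KEY")
BRIGHTDATA_USERNAME = os.getenv("BRIGHTDATA_USERNAME")
BRIGHTDATA_PASSWORD = os.getenv("BRIGHTDATA_PASSWORD")
BRIGHTDATA_HOSTNAME = os.getenv("BRIGHTDATA_HOSTNAME")
```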
This repository covers a wide range of topics, including:
- Scraping with APIs
- Scraping static websites (a minimal sketch follows this list)
- Handling JavaScript-rendered websites with a headless browser
- Bypassing anti-scraping measures with the 2Captcha service
- Using Brightdata proxies to rotate IP addresses and avoid rate limits
And much more
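As a taste of the static-scraping and proxy notebooks, here is a minimal sketch using `requests` and `BeautifulSoup`, optionally routed through a Brightdata proxy. The target URL and the proxy URL format are placeholders for illustration, not values taken from this repository:

```python
# Minimal sketch: fetching a static page, optionally through a proxy,
# and parsing it. The target URL and proxy URL format are placeholders.
import os

import requests
from bs4 import BeautifulSoup  # pip install requests beautifulsoup4

url = "https://example.com"

# Optional: route traffic through a rotating proxy using the credentials
# loaded from .env; if no hostname is configured, connect directly.
proxy_url = (
    f"http://{os.getenv('BRIGHTDATA_USERNAME')}:{os.getenv('BRIGHTDATA_PASSWORD')}"
    f"@{os.getenv('BRIGHTDATA_HOSTNAME')}"
)
proxies = (
    {"http": proxy_url, "https": proxy_url}
    if os.getenv("BRIGHTDATA_HOSTNAME")
    else None  # no proxy configured
)

response = requests.get(url, proxies=proxies, timeout=30)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
print(soup.title.get_text(strip=True))
```

The notebooks walk through these same ideas in more depth, plus headless-browser scraping for JavaScript-rendered pages and 2Captcha for CAPTCHA-protected ones.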
For a more in-depth understanding of web scraping techniques, refer to the accompanying blog post: https://tanyongsheng.com/blog/web-scraping-tutorial-with-python-2024-using-python-requests-beautifulsoup-selenium-anti-captcha-and-proxy/. It provides detailed explanations, code samples, and additional resources to complement the Jupyter Notebooks.
Contributions to this repository are welcome! If you have any suggestions, improvements, or additional examples, please feel free to open a pull request or submit an issue.
Happy scraping!