This Python-based email scraper extracts email addresses from websites listed in a CSV file. It uses the Playwright library to load each site and search for email addresses on both the main page and any contact pages that page links to.
- Scrapes email addresses from the main page of a website.
- Follows and scrapes contact pages linked from the main page.
- Converts relative URLs to absolute URLs.
- Logs errors to `error_log.txt` for troubleshooting.
- Saves results to a new CSV file with an additional `Email` column.
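The script's own extraction code is not reproduced here, so the following is only a minimal standard-library sketch of how the first three features could work: a regular expression finds addresses in a page's HTML, and `urllib.parse.urljoin` resolves relative contact-page links against the page URL. The names `EMAIL_RE`, `LinkCollector`, `extract_emails`, and `find_contact_links` are hypothetical, and treating any link whose URL or text contains "contact" as a contact page is an assumed heuristic, not necessarily the one the script uses.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical pattern: a common, deliberately simple email regex.
# It will miss some unusual valid addresses and may accept some invalid ones.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects [href, link text] pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []       # finished [href, text] pairs
        self._current = None  # the <a> tag currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._current = [href, ""]

    def handle_data(self, data):
        if self._current is not None:
            self._current[1] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


def extract_emails(html):
    """Return unique email addresses found in the HTML, in first-seen order."""
    return list(dict.fromkeys(EMAIL_RE.findall(html)))


def find_contact_links(html, base_url):
    """Return absolute URLs of links whose href or text mentions 'contact'.

    The 'contact' keyword match is an assumed heuristic.
    """
    parser = LinkCollector()
    parser.feed(html)
    urls = []
    for href, text in parser.links:
        if "contact" in href.lower() or "contact" in text.lower():
            # urljoin turns relative hrefs such as "/contact" into absolute URLs.
            urls.append(urljoin(base_url, href))
    return list(dict.fromkeys(urls))
```

For example, on a page at `https://example.com/team/`, a link with `href="/contact-us"` resolves to `https://example.com/contact-us`, while `href="get-in-touch.html"` with the text "Contact us" resolves to `https://example.com/team/get-in-touch.html`.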
- Python 3.10
- Google Chrome or Chromium browser
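Because the scraper drives Chrome or Chromium through Playwright, here is a hedged sketch of loading a page and reading its rendered HTML with Playwright's synchronous API. It assumes that selecting a local Chrome means passing its path to `chromium.launch` as `executable_path`; the helper names `launch_options` and `fetch_html` and the `chrome_path` parameter are hypothetical, not taken from the script.

```python
def launch_options(chrome_path=None, headless=True):
    """Hypothetical helper: build keyword arguments for chromium.launch().

    When chrome_path is given, Playwright drives that Chrome/Chromium
    executable instead of its own bundled Chromium build.
    """
    options = {"headless": headless}
    if chrome_path:
        options["executable_path"] = chrome_path
    return options


def fetch_html(url, chrome_path=None, timeout_ms=30000):
    """Hypothetical helper: load a page in Chromium and return its rendered HTML."""
    # Imported here so launch_options() stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**launch_options(chrome_path))
        try:
            page = browser.new_page()
            page.goto(url, timeout=timeout_ms)  # timeout is in milliseconds
            return page.content()
        finally:
            browser.close()
```

For example, `fetch_html("https://example.com", chrome_path="/usr/bin/google-chrome")` would use an installed Chrome (the path varies by system). Without `chrome_path`, Playwright uses its own Chromium build, which is normally downloaded with `playwright install chromium`.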
- Clone the repository and change into its directory:
  `git clone https://github.com/r7avi/Scrape-Emails-from-Websites.git`, then `cd Scrape-Emails-from-Websites`
- Set up a virtual environment (optional but recommended):
  `python -m venv venv`, then `source venv/bin/activate` (on Windows, use `venv\Scripts\activate`)
- Install the required packages:
  `pip install -r requirements.txt`
- Download and install Google Chrome or Chromium:
  Make sure Google Chrome or Chromium is installed. By default the script uses Playwright's Chromium driver, but you can specify the path to Chrome if needed.
- Run the script:
  `python email-scrapper.py`
- Select a CSV file:
  A file dialog prompts you to select a CSV file containing website URLs. The CSV file must have a column named `Website`.
- Review the results:
  The script processes each website, extracts email addresses, and saves the results to a new CSV file with an added `Email` column.
Errors encountered during scraping are logged to `error_log.txt`; check this file when troubleshooting.
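The CSV round trip and error logging described above might look roughly like the following sketch. It assumes the output keeps every input column, that multiple addresses are joined with ", " in the `Email` column, and that failures are recorded through Python's `logging` module. The names `setup_error_log`, `add_email_column`, `process_csv`, and the `scrape` callback (any function that takes a URL and returns a list of addresses) are hypothetical, and the output path is left to the caller because the script's own naming scheme is not documented here.

```python
import csv
import logging

logger = logging.getLogger("email_scraper")  # hypothetical logger name


def setup_error_log(path="error_log.txt"):
    """Hypothetical helper: send ERROR-level messages to the error log file."""
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.ERROR)


def add_email_column(rows, scrape):
    """Yield each row with an 'Email' field filled in by scrape(url).

    A failing site is logged and gets an empty Email field, so one bad
    website does not stop the whole run. The ", " separator is an assumption.
    """
    for row in rows:
        url = (row.get("Website") or "").strip()
        emails = []
        if url:
            try:
                emails = scrape(url)
            except Exception:
                logger.exception("Failed to scrape %s", url)
        yield {**row, "Email": ", ".join(emails)}


def process_csv(in_path, out_path, scrape):
    """Read in_path (which needs a 'Website' column) and write out_path with 'Email' added."""
    with open(in_path, newline="", encoding="utf-8") as f_in:
        reader = csv.DictReader(f_in)
        if "Website" not in (reader.fieldnames or []):
            raise ValueError("input CSV needs a 'Website' column")
        fieldnames = list(reader.fieldnames)
        if "Email" not in fieldnames:
            fieldnames.append("Email")
        with open(out_path, "w", newline="", encoding="utf-8") as f_out:
            writer = csv.DictWriter(f_out, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(add_email_column(reader, scrape))
```

Passing `scrape` in as a parameter keeps the CSV handling testable without a browser: a stub function can stand in for the real Playwright-based scraper.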