A powerful tool combining a Chrome Extension frontend with a Flask server backend to perform web crawling and summarization tasks efficiently. It uses technologies like BeautifulSoup, KeyBERT, Google Search API, and Hugging Face Transformers for web crawling and summarization.
- Acts as the frontend.
- Allows users to interact with the crawler and summarizer.
- Usage: Add this extension in Chrome by enabling Developer Mode in the Chrome extensions page and loading the unpacked folder.
- Backend server built with Flask.
- Handles web crawling and summarization requests through defined API endpoints.
- Core files:
server.py
: The main Flask application, which processes requests and orchestrates tasks like crawling and summarization.webCrawler.py
: Implements an asynchronous, relevance-based web crawler.
- Utilizes
BeautifulSoup
andaiohttp
for efficient web crawling. - Filters irrelevant content using cosine similarity via
TfidfVectorizer
. - Extracts meaningful content from visited web pages.
- Uses Hugging Face's Long T5 model for text summarization.
- Processes large content efficiently by dividing it into manageable chunks and implementing multiprocessing.
- Extracts top keywords using KeyBERT, aiding in refining search queries.
git clone https://github.com/<your-username>/smart-crawler-summarizer.git
cd smart-crawler-summarizer
cd python-server
pip install -r requirements.txt
cd python-server
python server.py
The server will run on http://127.0.0.1:5000
.
-
Set up the Chrome Extension:
- Go to Chrome settings → Extensions → Enable Developer Mode.
- Click "Load unpacked" and select the
chrome-extension
folder.
-
Run the Server:
- Start the Flask server as explained in the Installation section.
-
Use Extension:
- Open any article link, click on the extension, and click on 'Summarize Article' button or 'Get Relevant Links' or 'Generate Both'.
- Backend: Flask, aiohttp, BeautifulSoup
- Frontend: Chrome Extension
- Libraries: Hugging Face Transformers, KeyBERT, NLTK
- Crawler: Asynchronous web crawling with relevance filtering