Company URL Finder

Overview

Company URL Finder is a Python application that searches for and extracts company website URLs from a list of company names. It provides two main search approaches:

Selenium Web Scraping: Uses Selenium WebDriver to perform direct Google searches
Google Custom Search API: Leverages Google's official search API for precise URL retrieval

Key Features

Parallel processing of company searches
Multiple search strategies
Adaptive URL ranking algorithm
Error handling and logging
Flexible configuration options

Prerequisites

System Requirements

Python 3.8+
Chrome Browser (for Selenium)
ChromeDriver

Dependencies

Install the required dependencies using pip:

pip install -r requirements.txt

Environment Setup

Create a .env file in the project root

Add the following environment variables:

GOOGLE_CUSTOM_SEARCH_API_KEY=your_google_api_key
CUSTOM_SEARCH_ENGINE_ID=your_custom_search_engine_id
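
To illustrate how these two variables might be read at startup, here is a minimal standard-library sketch. It is not taken from the project, which may well use a package such as python-dotenv instead. The helper names `load_env_file` and `get_api_credentials` are hypothetical; only the two variable names come from this README.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_credentials(path=".env"):
    """Return (api_key, engine_id); real environment variables take priority."""
    file_values = load_env_file(path)
    names = ("GOOGLE_CUSTOM_SEARCH_API_KEY", "CUSTOM_SEARCH_ENGINE_ID")
    resolved = [os.environ.get(name) or file_values.get(name) for name in names]
    missing = [name for name, value in zip(names, resolved) if not value]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return tuple(resolved)
```

Failing early with a clear message when either value is missing tends to be easier to debug than a rejected API request later on.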

Installation

Clone the repository:

git clone https://github.com/XenosWarlocks/company-url-finder.git
cd company-url-finder

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

Install dependencies:
```
pip install -r requirements.txt
```
Install ChromeDriver:
- Download the ChromeDriver release that matches your Chrome browser version
- Add it to your system PATH or specify its location in the script

Usage

Input File Preparation

Prepare an Excel file (companies.xlsx) with a column named "Company Name" containing the list of companies you want to search.

Running the Application

python main.py

Search Strategy Options

Selenium Google Search (Option 1):
- Faster, web-scraping approach
- Parallel processing
- Suitable for smaller lists

Google Custom Search API (Option 2):
- More precise results
- Limited by API quota
- Better for comprehensive searches

Combined Strategy (Option 3):
- First uses Selenium
- Then validates/processes with API
- Most thorough but slower
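
As a rough illustration of the combined strategy, the sketch below runs a first-pass search and sends only unresolved names to a second search function. This is one plausible reading of "validates/processes with API", not the project's actual logic. `combined_search` and both callables are hypothetical stand-ins for the Selenium and API searchers.

```python
from typing import Callable, Dict, Iterable, Optional

# A search function takes a company name and returns a URL, or None if not found.
SearchFn = Callable[[str], Optional[str]]


def combined_search(
    companies: Iterable[str], scrape: SearchFn, api: SearchFn
) -> Dict[str, Optional[str]]:
    """Try the scraping search first, then fall back to the API search."""
    results: Dict[str, Optional[str]] = {}
    for company in companies:
        url = scrape(company)
        if url is None:
            # Only unresolved names reach the API, which conserves quota.
            url = api(company)
        results[company] = url
    return results


if __name__ == "__main__":
    # Stub lookups stand in for real searchers in this demo.
    scrape_stub = {"Acme": "https://acme.example"}.get
    api_stub = {"Globex": "https://globex.example"}.get
    print(combined_search(["Acme", "Globex", "Initech"], scrape_stub, api_stub))
```

Calling the API only for names the first pass missed is one way to stay within the quota mentioned under Option 2.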

Output Files

google_results.csv: Successful company URL matches
cant_find_urls.csv: Companies without URL matches
api_results.csv: Custom Search API results
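
The split between matched and unmatched companies could be produced along these lines. This is a sketch using only the `csv` module. The column headers and the `write_results` helper are assumptions, and `api_results.csv` is not covered here.

```python
import csv
from pathlib import Path


def write_results(results, out_dir="."):
    """Write matched and unmatched companies to separate CSV files.

    `results` maps a company name to its URL, or to None when nothing was found.
    Returns (number_found, number_missing).
    """
    out = Path(out_dir)
    found = [(name, url) for name, url in results.items() if url]
    missing = [name for name, url in results.items() if not url]

    with open(out / "google_results.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Company Name", "URL"])  # assumed header names
        writer.writerows(found)

    with open(out / "cant_find_urls.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Company Name"])
        writer.writerows([name] for name in missing)

    return len(found), len(missing)
```

Keeping the unmatched names in their own file makes it straightforward to feed them back in as a new input list for another pass.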

Advanced Configuration

Selenium Searcher

Customize in selenium_searcher.py:

headless: Run the browser without a visible window
max_workers: Set the number of parallel search threads
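
To show what a `max_workers` setting typically controls, here is a small thread-pool sketch. It assumes the searcher can be reduced to a function taking one company name. `search_in_parallel` and `search_one` are hypothetical names, and the browser handling (including `headless`) is left out.

```python
from concurrent.futures import ThreadPoolExecutor


def search_in_parallel(companies, search_one, max_workers=4):
    """Run search_one(company) across a thread pool, keeping input order."""
    companies = list(companies)

    def safe_search(company):
        # One failed lookup should not abort the whole batch.
        try:
            return search_one(company)
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        urls = list(pool.map(safe_search, companies))
    return dict(zip(companies, urls))
```

Higher values of `max_workers` finish sooner but open more browser sessions at once, so memory use and the chance of being rate-limited both grow with it.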

URL Ranking Parameters

Adjust in google_algo.py:

URL_COUNT_WEIGHT
URL_ORDER_WEIGHT
URL_LEN_WEIGHT
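
This README does not describe the scoring formula in google_algo.py. The sketch below shows one way three such weights could combine: how often a domain appears, how early it first appears, and how short it is. The weight values, the formula and the `rank_domains` function are illustrative assumptions, not the project's algorithm.

```python
from urllib.parse import urlparse

# Hypothetical values; the real defaults live in google_algo.py.
URL_COUNT_WEIGHT = 0.5
URL_ORDER_WEIGHT = 0.3
URL_LEN_WEIGHT = 0.2


def rank_domains(urls):
    """Score each domain in an ordered list of result URLs, best first."""
    stats = {}  # domain -> [occurrence count, position of first occurrence]
    for position, url in enumerate(urls):
        domain = urlparse(url).netloc.lower()
        if not domain:
            continue
        if domain.startswith("www."):
            domain = domain[4:]
        entry = stats.setdefault(domain, [0, position])
        entry[0] += 1

    total = len(urls)
    longest = max((len(d) for d in stats), default=1)
    scored = []
    for domain, (count, first) in stats.items():
        score = (
            URL_COUNT_WEIGHT * count / total              # frequent domains score higher
            + URL_ORDER_WEIGHT * (1 - first / total)      # earlier results score higher
            + URL_LEN_WEIGHT * (1 - len(domain) / (longest + 1))  # shorter domains score higher
        )
        scored.append((domain, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Under this scheme, raising `URL_COUNT_WEIGHT` favours domains that recur across results, while raising `URL_ORDER_WEIGHT` trusts the search engine's own ordering more.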

Extending the Project

Module Extensions

You can extend functionality by:

Creating custom URL matching algorithms
Adding more web scraping strategies
Implementing additional ranking methods

Example extension structure:

class CustomURLFinder:
    def __init__(self, parent_finder):
        self.parent = parent_finder
    
    def custom_url_matching_method(self, company, urls):
        # Implement custom logic
        pass
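
As a concrete but hypothetical instance of this structure, the class below returns the first URL whose domain contains the company name. `NameInDomainFinder` and its matching rule are illustrative only, and the sketch assumes nothing about the parent finder's interface.

```python
import re
from urllib.parse import urlparse


class NameInDomainFinder:
    """Hypothetical extension: prefer URLs whose domain contains the company name."""

    def __init__(self, parent_finder=None):
        self.parent = parent_finder

    @staticmethod
    def _normalize(text):
        # Keep only lowercase letters and digits so "Acme Corp" becomes "acmecorp".
        return re.sub(r"[^a-z0-9]", "", text.lower())

    def custom_url_matching_method(self, company, urls):
        """Return the first URL whose domain contains the normalized name, else None."""
        key = self._normalize(company)
        if not key:
            return None
        for url in urls:
            domain = urlparse(url).netloc
            if key in self._normalize(domain):
                return url
        return None
```

A rule this simple will miss companies whose domain is an abbreviation of their name, so it probably works best as one signal among several, not as the sole matcher.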

Troubleshooting

Ensure ChromeDriver matches your Chrome version
Check API key and Search Engine ID
Verify input file format
Monitor API usage quotas

Contributing

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Create a Pull Request
