Scraper for business leads from Google knowledge panel in Python3

jojorb/google-knowledge-scraper

code.png

Google Knowledge Panel Scraper

Retrieve available business leads from Google knowledge panel in Python3

gkps is heavily inspired by knowledge-panel-scraper, a CLI scraper for Google's Knowledge Panels.

Highlights

  • scrape with fewer false negatives
  • segment results
  • fancy prompt

Install

Use git to clone the repository, then install required libraries with the package manager pip.

The requirements.txt file was generated by pipreqs.

git clone https://github.com/RobyRemzy/google-knowledge-scraper.git
cd google-knowledge-scraper
pip install -r requirements.txt

Usage

python gkps.py inputfile.csv

inputfile.csv should be a plain text CSV file with each row containing data to generate a search query for a specific business. For example:

"Bobcat of Monroe,Monroe,NC",1711 MORGAN MILL ROAD,MONROE,NC,28110,(704) 289-2200
"Kelly's Garage,Perry,NY",2868 STATE ROUTE 246,PERRY,NY,14530,(585) 237-2504
"Hoxie Implement Co,Hoxie,KS",933 OAK AVENUE,HOXIE,KS,67740-0587,(785) 675-3201
"Duhon Machinery,St. Rose,LA",10460 WEST AIRLINE HIGHWAY,ST. ROSE,LA,70087,(504) 466-5495
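How gkps.py turns a row into a search query is not described here, so the following is only a minimal sketch of one plausible approach: read each row with the standard csv module and join its non-empty fields into a single string. The names build_query and read_queries are hypothetical and not part of gkps.

```python
import csv
import io

# Two rows copied from the example above; a real run would read inputfile.csv.
SAMPLE = '''"Bobcat of Monroe,Monroe,NC",1711 MORGAN MILL ROAD,MONROE,NC,28110,(704) 289-2200
"Kelly's Garage,Perry,NY",2868 STATE ROUTE 246,PERRY,NY,14530,(585) 237-2504
'''


def build_query(row):
    """Join the non-empty fields of one CSV row into a single search string.

    Assumption: every column contributes to the query. gkps may use fewer.
    """
    return " ".join(field.strip() for field in row if field.strip())


def read_queries(handle):
    """Return one query string per non-empty CSV row."""
    return [build_query(row) for row in csv.reader(handle) if row]


queries = read_queries(io.StringIO(SAMPLE))
for query in queries:
    print(query)
```

Note that the quoted first column keeps its inner commas, so "Bobcat of Monroe,Monroe,NC" is read as one field rather than three.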

demo.png

The script tries to fetch data from the Google knowledge panel. If the first attempt fails, it tries once more, since the second attempt can succeed. If it fails a second time, it moves on to the next row.

  • Green => data has been saved
  • Cyan => data has been fetched again
  • Red => data has been fetched again, but not successfully
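The retry-once behaviour and its colour coding can be sketched as follows. This is an illustration under assumptions, not the code in gkps.py: fetch_with_retry and make_flaky_fetch are hypothetical names, the fetch callable stands in for the real scraping step, and the reading that cyan marks a successful second attempt is inferred from the list above.

```python
from typing import Callable, Optional, Tuple

# ANSI escape codes for the three outcomes.
GREEN, CYAN, RED, RESET = "\033[32m", "\033[36m", "\033[31m", "\033[0m"


def fetch_with_retry(
    query: str, fetch: Callable[[str], Optional[dict]]
) -> Tuple[Optional[dict], str]:
    """Call `fetch` at most twice and return (data, colour) for the outcome."""
    data = fetch(query)
    if data is not None:
        return data, GREEN  # saved on the first attempt
    data = fetch(query)
    if data is not None:
        return data, CYAN  # saved on the second attempt
    return None, RED  # both attempts failed; caller moves to the next row


def make_flaky_fetch(failures_before_success: int):
    """Build a fake fetcher that fails a fixed number of times (demo only)."""
    calls = {"n": 0}

    def fetch(query: str) -> Optional[dict]:
        calls["n"] += 1
        if calls["n"] <= failures_before_success:
            return None
        return {"query": query}

    return fetch


for failures in (0, 1, 2):
    data, colour = fetch_with_retry("Kelly's Garage", make_flaky_fetch(failures))
    print(f"{colour}{'saved' if data else 'failed'}{RESET}")
```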

When finished, it prompts you to edit the failed queries by hand in your default editor.

If gkps.py finishes with successful responses, the files are copied to a timestamped folder:

  • results.csv contains all existing results
  • results_true.csv contains only successful responses
  • results_false.csv contains only failed responses
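As a rough sketch of how the three files relate, the snippet below writes them from a list of rows and copies them into a folder named after the current time. The function names, the (fields, ok) row shape, and the folder name format are all assumptions made for illustration; the actual column layout and folder naming used by gkps.py are not documented here.

```python
import csv
import shutil
from datetime import datetime
from pathlib import Path


def write_results(rows, root):
    """Write the three result files into `root`.

    rows: iterable of (fields, ok) pairs, where ok=True marks a success.
    """
    root = Path(root)
    rows = list(rows)
    filters = {
        "results.csv": lambda ok: True,         # everything
        "results_true.csv": lambda ok: ok,      # successes only
        "results_false.csv": lambda ok: not ok  # failures only
    }
    for name, keep in filters.items():
        with open(root / name, "w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(f for f, ok in rows if keep(ok))
    return [root / name for name in filters]


def archive(paths, root):
    """Copy the files into a folder named after the current date and time."""
    folder = Path(root) / datetime.now().strftime("%Y%m%d-%H%M%S")
    folder.mkdir(parents=True, exist_ok=True)
    for path in paths:
        shutil.copy(path, folder / path.name)
    return folder
```

Because results_false.csv holds only the rows that failed, it can be passed straight back to the script as the next input file.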

The files generated by the last command are also kept in the root directory and are overwritten on the next run.

After some tweaks (or none), you can rerun the script with this command until no more usable data can be retrieved.

python gkps.py results_false.csv

Contributing

Pull requests are welcome. Let's do this in Rust lang?

Maintainers
