
Python 3 + Scrapy project that crawls a target site and saves it as a static HTML and resource set.


EaseCloud/xmirror


DOMAIN=example.com
SITE_ROOT=http://$DOMAIN

scrapy crawl site \
  -s DOMAIN=$DOMAIN \
  -s START_URLS=$SITE_ROOT,$SITE_ROOT/robots.txt,$SITE_ROOT/sitemap.xml \
  -s DIR_ROOT=/var/www/static/$SITE_ROOT \
  -s USER_AGENT='Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/51.0.2704.79 Chrome/51.0.2704.79 Safari/537.36'
# USER_AGENT='Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.23 Mobile Safari/537.36'
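
The `-s NAME=VALUE` options above override Scrapy settings for a single run. The spider's own code is not shown here, so the following is only a minimal sketch of how values such as `START_URLS` and `DIR_ROOT` could be interpreted. The helper names `parse_start_urls` and `local_path`, and the `index.html` convention for directory-style URLs, are assumptions for illustration and not taken from the project.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def parse_start_urls(value):
    """Split a comma-separated START_URLS string into a list of URLs.

    Hypothetical helper: the real spider may parse this setting differently.
    """
    return [url.strip() for url in value.split(",") if url.strip()]


def local_path(url, dir_root):
    """Map a crawled URL to a file path under dir_root.

    Assumed layout: the URL path is mirrored below dir_root, and URLs that
    end in "/" (or have no path at all) are stored as index.html.
    """
    path = urlsplit(url).path
    if not path or path.endswith("/"):
        path += "index.html"
    return str(PurePosixPath(dir_root) / path.lstrip("/"))


if __name__ == "__main__":
    root = "/var/www/static/example.com"  # example value, not from the project
    urls = parse_start_urls("http://example.com,http://example.com/robots.txt")
    for url in urls:
        print(url, "->", local_path(url, root))
```

Mirroring the URL path under a single root keeps relative links between saved pages intact, which is why a layout like this is a common choice for static mirrors.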
