[UNMAINTAINED] A script (Scrapy spider) to check a list of URLs.


linkrot

A script (Scrapy spider) to check a list of URLs. It requires Scrapy 1.1+.

To run it periodically (e.g. every 2 hours), add something like this to crontab:

0 */2 * * * /usr/bin/python3 /home/ubuntu/linkrot.py /home/ubuntu/urls.txt /home/ubuntu/status.jl

To analyze the results, check the Link Status notebook.

If the soft404 package is installed, the results will also contain the probability of each page being a soft 404, in addition to the returned status code and other info.
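For a quick look at the output without the notebook, the status file can be parsed with the standard library, since the .jl extension suggests JSON lines (one JSON object per line). This is only a sketch: the field names used below (url, status, soft404_probability) are assumptions about the spider's output schema, not confirmed by this README, so adjust them to match the actual records.

```python
import json

def find_dead_links(path, soft404_threshold=0.5):
    """Collect URLs that look dead in a linkrot status file (JSON lines).

    A link is flagged if its HTTP status is an error (>= 400) or, when the
    soft404 package produced a probability, that probability exceeds the
    threshold. Field names here are assumed, not taken from the spider source.
    """
    dead = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines between records
            rec = json.loads(line)
            status = rec.get("status")
            prob = rec.get("soft404_probability", 0.0)
            if (status is not None and status >= 400) or prob > soft404_threshold:
                dead.append(rec.get("url"))
    return dead
```

With the crontab example above, this would be called as `find_dead_links("/home/ubuntu/status.jl")`.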
