WebArchiver

Downloads the HTML of a URL provided at runtime, then parses that HTML to extract embedded links and downloads the HTML of each linked page.

  • The requests library is used to GET the HTML and parse embedded links.
  • A URL is normalised by removing the scheme (http:// or https://) and replacing every non-alphanumeric character with _, e.g. https://python.org/ => python_org_
  • The main URL and embedded URLs are saved as .html files, and a lookup.json is produced so the original URLs can be reconstructed from the normalised filenames.
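
The normalisation and lookup steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `normalise_url` and `build_lookup` are hypothetical, and the shape of lookup.json (normalised filename mapped to original URL) is an assumption based on the description.

```python
import json
import re


def normalise_url(url: str) -> str:
    """Drop the http:// or https:// scheme, then replace every
    non-alphanumeric character with an underscore."""
    without_scheme = re.sub(r"^https?://", "", url)
    return re.sub(r"[^A-Za-z0-9]", "_", without_scheme)


def build_lookup(urls):
    """Map each normalised .html filename back to its original URL.
    (Assumed layout for lookup.json; the real file may differ.)"""
    return {normalise_url(u) + ".html": u for u in urls}


# Example URLs, chosen only for illustration.
urls = ["https://python.org/", "https://python.org/about/"]
lookup = build_lookup(urls)

# Serialised form that could be written to lookup.json.
lookup_json = json.dumps(lookup, indent=2)
```

Note that two different URLs can normalise to the same filename (for example, URLs differing only in punctuation), so a real implementation may want to detect such collisions before writing files.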