This package extracts the difference between two HTML pages:
given pages A and B, it tries to extract the parts of A that are changed in B.
It uses lxml.html.diff under the hood, but returns only the changed parts as HTML.
It currently requires Python 3.
License is MIT.
You can install the package from PyPI:
pip install extract-html-diff
You can extract the diff as a string:
import extract_html_diff

html = '<div> <h1>My site</h1> <div>My content</div> </div>'
other_html = '<div> <h1>My site</h1> <div>Other content</div> </div>'
extract_html_diff.as_string(html, other_html)
This will give you:
'<div><div>My content</div> </div>'
You can also get the diff as a tree (an lxml.html.HtmlElement) if you plan to do additional transformations or change the serialization:
extract_html_diff.as_tree(html, other_html)
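Because the result is a regular lxml.html.HtmlElement, the usual lxml serialization calls should apply to it. The sketch below is illustrative only: to stay runnable without the package installed, it uses a stand-in tree parsed from the string output shown earlier rather than a real as_tree result.

```python
import lxml.html

# Stand-in for the value returned by extract_html_diff.as_tree(html, other_html);
# this is an assumption for illustration, any lxml.html.HtmlElement works the same way.
tree = lxml.html.fromstring('<div><div>My content</div> </div>')

# Serialize back to an HTML string (str rather than bytes).
as_html = lxml.html.tostring(tree, encoding='unicode')

# Or drop the markup entirely and keep only the text of the changed parts.
as_text = tree.text_content()

print(as_html)
print(as_text)
```

Working on the tree is mostly useful when you want to post-process the diff (for example, strip attributes or pull out text) before turning it into a string.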
You can pass the input HTML as str or bytes (it will be parsed with lxml.html.fromstring in this case), or as an already parsed lxml.html.HtmlElement.
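The input handling described above can be pictured roughly as follows. This is a minimal sketch, not the package's actual code: to_tree is a hypothetical helper name introduced only for illustration.

```python
import lxml.html


def to_tree(html):
    """Hypothetical helper: return an HtmlElement for str, bytes, or parsed input."""
    if isinstance(html, (str, bytes)):
        # Raw markup is parsed with lxml.html.fromstring.
        return lxml.html.fromstring(html)
    # Already parsed input is passed through unchanged.
    return html


from_str = to_tree('<div>My content</div>')
from_bytes = to_tree(b'<div>My content</div>')
parsed = lxml.html.fromstring('<p>Already parsed</p>')
from_tree = to_tree(parsed)
```

Passing an already parsed element lets you control parsing yourself, for instance when the page needs a specific encoding or cleanup step first.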