Releases: lorey/mlscraper
1.0.0rc3
- improved training performance by 10x (again) by trying to generate scrapers for highly similar matches first
- added first pseudo CSS selectors by implementing nth-child, e.g. div a:nth-child(1)
- added child selector generation, e.g. .user-box > a
- added attribute-based CSS selectors, e.g. a[itemprop="user"]
- added automated tests for GitHub profile pages
- added lazy hashing for node elements
- extended text matching to also include parent elements that contain the same text
- fixed a bug where searching for values resulted in image dimensions being matched
- fixed a bug where text did not exactly match the sample provided but was selected anyway
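
To make the three new selector kinds from this release more concrete, here is a minimal sketch of how candidate selectors for a node could be enumerated. It is not mlscraper's actual implementation: the `Node` class and the functions `simple_selectors` and `candidate_selectors` are hypothetical names invented for this illustration, and attribute values are not escaped.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    """Hypothetical, simplified stand-in for a parsed HTML element."""
    tag: str
    attrs: dict = field(default_factory=dict)
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def simple_selectors(node: Node) -> list:
    """Selectors that describe the node on its own (assumed helper)."""
    selectors = [node.tag]
    # class-based selectors, e.g. .user-box
    for cls in node.attrs.get("class", "").split():
        selectors.append(f".{cls}")
    # attribute-based selectors, e.g. a[itemprop="user"]
    for name, value in node.attrs.items():
        if name != "class":
            selectors.append(f'{node.tag}[{name}="{value}"]')
    # nth-child pseudo selector; CSS counts siblings starting at 1
    if node.parent is not None:
        position = next(
            i for i, c in enumerate(node.parent.children) if c is node
        ) + 1
        selectors.append(f"{node.tag}:nth-child({position})")
    return selectors


def candidate_selectors(node: Node) -> list:
    """Combine the node's own selectors with those of its parent."""
    candidates = list(simple_selectors(node))
    if node.parent is not None:
        for parent_sel in simple_selectors(node.parent):
            for own_sel in simple_selectors(node):
                candidates.append(f"{parent_sel} > {own_sel}")  # child combinator
                candidates.append(f"{parent_sel} {own_sel}")    # descendant combinator
    return candidates


# Example: <div class="user-box"><a itemprop="user" href="/lorey">...</a></div>
box = Node("div", {"class": "user-box"})
link = box.add(Node("a", {"itemprop": "user", "href": "/lorey"}))
candidates = candidate_selectors(link)
```

For this example, `candidates` contains all three forms mentioned in the list above: `div a:nth-child(1)`, `.user-box > a`, and `a[itemprop="user"]`. A real generator would additionally have to check which candidates select exactly the desired nodes on every sample page.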
1.0.0rc2
1.0.0rc1
mlscraper has been rewritten from the core and is now easier to use, more flexible, and faster than ever. This is the first release candidate for the upcoming 1.0 version. Feel free to try it out with pip install --pre mlscraper.
- scrapers can extract arbitrary data structures (lists, dicts, lists of dicts and even lists of lists)
- depending on the page, one example might be enough to train a scraper
- the generation of CSS selectors has been overhauled and is now more efficient