John Snow Labs Spark-NLP 2.1.0-rc1: Tokenizer revamped, NerDLApproach metrics and eval module
Pre-release
This is a pre-release for 2.1.0. The tokenizer has been revamped, and some of the DocumentAssembler defaults have changed.
As a result, many pipelines and models may show changed accuracy and performance. The old tokenizer default rules
will be translated into a new English-specific pretrained Tokenizer.
NerDLApproach will now report metrics if setTrainValidationProp has been set, and spell checkers now report confidence scores.
DependencyParser output has been reviewed and fixed, along with a number of other bugs in the embeddings scope.
Please send feedback and report bugs, and remember: this is a pre-release, not yet intended for production use.
Join Slack!
Enhancements
- Norvig and Symmetric spell checkers now report confidence scores in metadata
- Tokenizer has been significantly enhanced to allow easier and faster customization
- NerDLApproach now reports metrics and F1 scores, with automated dataset splitting through setTrainValidationProp
- Made progress towards OCR reporting more meaningful metadata (noise levels, confidence scores, etc.), laying the groundwork for further development
- Added spark-nlp-eval, an evaluation module with multiple scripts that help users evaluate their models and pipelines. To be improved.
Bugfixes
- Fixed Dependency Parser not reporting offsets correctly
- Dependency Parser now only shows head token as part of the result, instead of pairs
- Fixed NerDLModel not allowing non-contrib versions to be picked on Linux
- Fixed a bug in embeddingsRef validation that allowed users to override the reference when that is not possible
Framework
- ResourceDownloader is now capable of utilizing credentials from standard AWS sources (environment variables, credentials file)
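For reference, the standard AWS sources are the usual ones, not Spark NLP-specific; for example, a credentials file in your home directory:

```ini
; ~/.aws/credentials — picked up automatically, as are the
; AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables
[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
```

No Spark NLP-specific configuration should be needed once either source is in place.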
Documentation
- Added a Google Colab walkthrough guide
- Added Approach and Model class names in reference documentation
- Fixed various typos and outdated pieces in documentation