John Snow Labs Spark-NLP 2.5.0: ALBERT & XLNet transformers, state-of-the-art spell checker, multi-class sentiment detector, 80+ new models & pipelines in 14 new languages & more
Overview
When we started planning the Spark NLP 2.5.0 release a few months ago, the world was a different place!
We have been blown away by the use of Natural Language Processing for early outbreak detection, question-answering chatbot services, text analysis of medical records, monitoring efforts to minimize the spread of the virus, and much more.
In that spirit, we are honored to announce the Spark NLP 2.5.0 release! Witnessing the world come together to fight coronavirus has driven us to deliver perhaps one of the biggest releases we have ever made.
As always, we thank our community for their feedback, bug reports, and contributions that made this release possible.
Major features and improvements
- NEW: A new AlbertEmbeddings annotator with 4 available pre-trained models
- NEW: A new XlnetEmbeddings annotator with 2 available pre-trained models
- NEW: A new ContextSpellChecker annotator, a state-of-the-art annotator for spell checking
- NEW: A new SentimentDL annotator for multi-class sentiment analysis. This annotator comes with 2 available pre-trained models trained on the IMDB and Twitter datasets
- NEW: Support for 14 new languages with 80+ pretrained models and pipelines!
- Add a new PubTator reader to convert automatic annotations of biomedical datasets into a DataFrame
- Introduce a new outputLogsPath param for the NerDLApproach, ClassifierDLApproach and SentimentDLApproach annotators
- Refactor CoNLLGenerator to actually use NER labels from the DataFrame
- Unify params in NerDLModel in both Scala and Python
- Extend and complete Scaladoc APIs for all the annotators
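
As a rough illustration of how one of the new annotators might slot into a pipeline, the sketch below wires AlbertEmbeddings after a sentence detector and a tokenizer. It assumes the Python package from this release is installed and that calling `pretrained()` with no arguments downloads a default English ALBERT model. The function name `build_albert_pipeline` and the column names (`text`, `document`, `sentence`, `token`, `embeddings`) are choices made for this example, not part of the release.

```python
def build_albert_pipeline():
    """Return an unfitted Spark ML Pipeline that ends in ALBERT embeddings.

    Imports are lazy so this definition loads without Spark NLP installed;
    calling the function needs spark-nlp, pyspark, a running Spark session,
    and network access for the model download.
    """
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import SentenceDetector, Tokenizer, AlbertEmbeddings

    # Turn the raw "text" column into document annotations.
    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    # Split documents into sentences, then sentences into tokens.
    sentence = SentenceDetector().setInputCols(["document"]).setOutputCol("sentence")
    token = Tokenizer().setInputCols(["sentence"]).setOutputCol("token")
    # Assumption: the no-argument pretrained() call picks a default model.
    embeddings = (
        AlbertEmbeddings.pretrained()
        .setInputCols(["sentence", "token"])
        .setOutputCol("embeddings")
    )
    return Pipeline(stages=[document, sentence, token, embeddings])
```

Fitting the returned pipeline on a DataFrame with a `text` column and calling `transform` should add an `embeddings` column. Swapping in XlnetEmbeddings would presumably follow the same pattern.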
Bugfixes
- Fix position of tokens in Normalizer
- Fix Lemmatizer exception on bad input
- Fix annotator logs failing on object storage file systems like DBFS
Models and Pipelines
Spark NLP 2.5.0 comes with 87 new pretrained models and pipelines in 14 new languages, available to all Windows, Linux, and macOS users. The new languages are Dutch, Norwegian, Polish, Portuguese, Bulgarian, Czech, Greek, Finnish, Hungarian, Romanian, Slovak, Swedish, Turkish, and Ukrainian.
The complete list of 160+ models & pipelines in 22+ languages is available here.
Featured Pretrained Pipelines
Dutch - Pipelines
Pipeline | Name | Build | lang | Description | Offline |
---|---|---|---|---|---|
Explain Document Small | explain_document_sm | 2.5.0 | nl | | Download |
Explain Document Medium | explain_document_md | 2.5.0 | nl | | Download |
Explain Document Large | explain_document_lg | 2.5.0 | nl | | Download |
Entity Recognizer Small | entity_recognizer_sm | 2.5.0 | nl | | Download |
Entity Recognizer Medium | entity_recognizer_md | 2.5.0 | nl | | Download |
Entity Recognizer Large | entity_recognizer_lg | 2.5.0 | nl | | Download |
Norwegian - Pipelines
Pipeline | Name | Build | lang | Description | Offline |
---|---|---|---|---|---|
Explain Document Small | explain_document_sm | 2.5.0 | no | | Download |
Explain Document Medium | explain_document_md | 2.5.0 | no | | Download |
Explain Document Large | explain_document_lg | 2.5.0 | no | | Download |
Entity Recognizer Small | entity_recognizer_sm | 2.5.0 | no | | Download |
Entity Recognizer Medium | entity_recognizer_md | 2.5.0 | no | | Download |
Entity Recognizer Large | entity_recognizer_lg | 2.5.0 | no | | Download |
Polish - Pipelines
Pipeline | Name | Build | lang | Description | Offline |
---|---|---|---|---|---|
Explain Document Small | explain_document_sm | 2.5.0 | pl | | Download |
Explain Document Medium | explain_document_md | 2.5.0 | pl | | Download |
Explain Document Large | explain_document_lg | 2.5.0 | pl | | Download |
Entity Recognizer Small | entity_recognizer_sm | 2.5.0 | pl | | Download |
Entity Recognizer Medium | entity_recognizer_md | 2.5.0 | pl | | Download |
Entity Recognizer Large | entity_recognizer_lg | 2.5.0 | pl | | Download |
Portuguese - Pipelines
Pipeline | Name | Build | lang | Description | Offline |
---|---|---|---|---|---|
Explain Document Small | explain_document_sm | 2.5.0 | pt | | Download |
Explain Document Medium | explain_document_md | 2.5.0 | pt | | Download |
Explain Document Large | explain_document_lg | 2.5.0 | pt | | Download |
Entity Recognizer Small | entity_recognizer_sm | 2.5.0 | pt | | Download |
Entity Recognizer Medium | entity_recognizer_md | 2.5.0 | pt | | Download |
Entity Recognizer Large | entity_recognizer_lg | 2.5.0 | pt | | Download |
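
The pipelines in the tables above are addressed by a name plus a language code. The following is a minimal sketch of how they might be enumerated and loaded from Python, assuming the `PretrainedPipeline` class from the `sparknlp.pretrained` module and a machine with network access. The helper names `featured_pipelines` and `load_pipeline` are hypothetical and exist only for this example.

```python
# Pipeline names and language codes copied from the tables above.
PIPELINE_NAMES = [
    "explain_document_sm",
    "explain_document_md",
    "explain_document_lg",
    "entity_recognizer_sm",
    "entity_recognizer_md",
    "entity_recognizer_lg",
]
LANGUAGES = ["nl", "no", "pl", "pt"]


def featured_pipelines():
    """Return every (name, lang) pair listed in the featured tables."""
    return [(name, lang) for lang in LANGUAGES for name in PIPELINE_NAMES]


def load_pipeline(name, lang):
    """Download and return a pretrained pipeline (needs spark-nlp and network)."""
    # Lazy imports keep the listing above usable without Spark NLP installed.
    import sparknlp
    from sparknlp.pretrained import PretrainedPipeline

    sparknlp.start()  # starts or reuses a Spark session with Spark NLP loaded
    return PretrainedPipeline(name, lang=lang)
```

With Spark NLP installed, something like `load_pipeline("explain_document_sm", "nl").annotate("Dit is een zin.")` would be expected to return the annotations for that sentence.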
Documentation
- Update documentation for release of Spark NLP 2.5.0
- Update all spark-nlp-workshop notebooks for Spark NLP 2.5.0
- Update the entire spark-nlp-models repository with new pre-trained models and pipelines
Installation
Python
# PyPI
pip install spark-nlp==2.5.0
# Conda
conda install -c johnsnowlabs spark-nlp==2.5.0
Spark
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.0
PySpark
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.0
Maven
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.11</artifactId>
    <version>2.5.0</version>
</dependency>
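
The `--packages` argument used for spark-shell and pyspark is the Maven dependency written as `groupId:artifactId:version`. The sketch below shows how that string relates to the Maven fields and how it could be passed to a session created in code through Spark's `spark.jars.packages` property. The helper names and the default application name are hypothetical, and the `_2.11` suffix is taken to be the Scala version of the build.

```python
def maven_coordinate(group_id, artifact_id, version):
    """Join the three Maven fields into the group:artifact:version form."""
    return f"{group_id}:{artifact_id}:{version}"


def spark_nlp_coordinate(version="2.5.0", scala_version="2.11"):
    """Coordinate for Spark NLP; the artifact id carries the Scala suffix."""
    return maven_coordinate(
        "com.johnsnowlabs.nlp", f"spark-nlp_{scala_version}", version
    )


def build_session(app_name="spark-nlp-example"):
    """Create a SparkSession that resolves Spark NLP at startup (needs pyspark)."""
    from pyspark.sql import SparkSession  # lazy import, only needed when called

    return (
        SparkSession.builder.appName(app_name)
        .config("spark.jars.packages", spark_nlp_coordinate())
        .getOrCreate()
    )
```

Here `spark_nlp_coordinate()` yields the same string that appears after `--packages` in the commands above.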
FAT JARs