
John Snow Labs Spark-NLP 2.5.0: ALBERT & XLNet transformers, state-of-the-art spell checker, multi-class sentiment detector, 80+ new models & pipelines in 14 new languages & more

@maziyarpanahi maziyarpanahi released this 10 May 22:15

Overview

When we started planning the Spark NLP 2.5.0 release a few months ago, the world was a different place!

We have been blown away by the use of Natural Language Processing for early outbreak detection, question-answering chatbot services, text analysis of medical records, monitoring efforts to minimize the spread of the virus, and much more.

In that spirit, we are honored to announce the Spark NLP 2.5.0 release! Witnessing the world coming together to fight coronavirus has driven us to deliver perhaps one of the biggest releases we have ever made.

As always, we thank our community for their feedback, bug reports, and contributions that made this release possible.


Major features and improvements

  • NEW: A new AlbertEmbeddings annotator with 4 available pre-trained models
  • NEW: A new XlnetEmbeddings annotator with 2 available pre-trained models
  • NEW: A new ContextSpellChecker annotator, the state-of-the-art annotator for spell checking
  • NEW: A new SentimentDL annotator for multi-class sentiment analysis. This annotator comes with 2 available pre-trained models trained on IMDB and Twitter datasets
  • NEW: Support for 14 new languages with 80+ pretrained models and pipelines!
  • Add a new PubTator reader to convert automatic annotations of biomedical datasets into a DataFrame
  • Introducing a new outputLogsPath param for NerDLApproach, ClassifierDLApproach and SentimentDLApproach annotators
  • Refactored CoNLLGenerator to actually use NER labels from the DataFrame
  • Unified params in NerDLModel in both Scala and Python
  • Extend and complete Scaladoc APIs for all the annotators
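
As a rough illustration of how one of the new annotators might slot into a pipeline, the sketch below wires AlbertEmbeddings after a DocumentAssembler and a Tokenizer in Python. It assumes spark-nlp 2.5.0 and pyspark are installed and that the default pre-trained model can be downloaded. The column names and the function names `build_albert_pipeline` and `embed` are illustrative choices, not part of this release.

```python
def build_albert_pipeline():
    """Return an unfitted Spark ML Pipeline ending in AlbertEmbeddings.

    A Spark session must already be active when this is called.
    """
    # Imports live inside the function so that defining it does not
    # require Spark or Spark NLP to be installed.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import Tokenizer, AlbertEmbeddings

    # Turn the raw "text" column into document annotations.
    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    # Split each document into tokens.
    tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
    # pretrained() with no arguments fetches the library's default model.
    embeddings = (AlbertEmbeddings.pretrained()
                  .setInputCols(["document", "token"])
                  .setOutputCol("embeddings"))
    return Pipeline(stages=[document, tokenizer, embeddings])


def embed(texts):
    """Start Spark NLP and return a DataFrame with an "embeddings" column."""
    import sparknlp

    spark = sparknlp.start()
    df = spark.createDataFrame([(t,) for t in texts], ["text"])
    return build_albert_pipeline().fit(df).transform(df)
```

Calling `embed(["Spark NLP 2.5.0 adds ALBERT embeddings."])` would return a DataFrame whose `embeddings` column holds one annotation per token. XlnetEmbeddings could presumably be swapped in the same way.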

Bugfixes

  • Fix position of tokens in Normalizer
  • Fix Lemmatizer exception on a bad input
  • Fix annotator logs failing on object storage file systems like DBFS

Models and Pipelines

Spark NLP 2.5.0 comes with 87 new pretrained models and pipelines in 14 new languages, available to all Windows, Linux, and macOS users. The new languages are Dutch, Norwegian, Polish, Portuguese, Bulgarian, Czech, Greek, Finnish, Hungarian, Romanian, Slovak, Swedish, Turkish, and Ukrainian.

The complete list of 160+ models & pipelines in 22+ languages is available here.

Featured Pretrained Pipelines

Dutch - Pipelines

| Pipeline | Name | Build | lang | Description | Offline |
|---|---|---|---|---|---|
| Explain Document Small | explain_document_sm | 2.5.0 | nl | | Download |
| Explain Document Medium | explain_document_md | 2.5.0 | nl | | Download |
| Explain Document Large | explain_document_lg | 2.5.0 | nl | | Download |
| Entity Recognizer Small | entity_recognizer_sm | 2.5.0 | nl | | Download |
| Entity Recognizer Medium | entity_recognizer_md | 2.5.0 | nl | | Download |
| Entity Recognizer Large | entity_recognizer_lg | 2.5.0 | nl | | Download |

Norwegian - Pipelines

| Pipeline | Name | Build | lang | Description | Offline |
|---|---|---|---|---|---|
| Explain Document Small | explain_document_sm | 2.5.0 | no | | Download |
| Explain Document Medium | explain_document_md | 2.5.0 | no | | Download |
| Explain Document Large | explain_document_lg | 2.5.0 | no | | Download |
| Entity Recognizer Small | entity_recognizer_sm | 2.5.0 | no | | Download |
| Entity Recognizer Medium | entity_recognizer_md | 2.5.0 | no | | Download |
| Entity Recognizer Large | entity_recognizer_lg | 2.5.0 | no | | Download |

Polish - Pipelines

| Pipeline | Name | Build | lang | Description | Offline |
|---|---|---|---|---|---|
| Explain Document Small | explain_document_sm | 2.5.0 | pl | | Download |
| Explain Document Medium | explain_document_md | 2.5.0 | pl | | Download |
| Explain Document Large | explain_document_lg | 2.5.0 | pl | | Download |
| Entity Recognizer Small | entity_recognizer_sm | 2.5.0 | pl | | Download |
| Entity Recognizer Medium | entity_recognizer_md | 2.5.0 | pl | | Download |
| Entity Recognizer Large | entity_recognizer_lg | 2.5.0 | pl | | Download |

Portuguese - Pipelines

| Pipeline | Name | Build | lang | Description | Offline |
|---|---|---|---|---|---|
| Explain Document Small | explain_document_sm | 2.5.0 | pt | | Download |
| Explain Document Medium | explain_document_md | 2.5.0 | pt | | Download |
| Explain Document Large | explain_document_lg | 2.5.0 | pt | | Download |
| Entity Recognizer Small | entity_recognizer_sm | 2.5.0 | pt | | Download |
| Entity Recognizer Medium | entity_recognizer_md | 2.5.0 | pt | | Download |
| Entity Recognizer Large | entity_recognizer_lg | 2.5.0 | pt | | Download |
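
The featured pipelines above share the same six names across languages and differ only in the language code. The Python sketch below enumerates those name and language pairs and shows one possible way to load a pipeline with `PretrainedPipeline`. It assumes spark-nlp 2.5.0 and pyspark are installed and that the pipeline can be downloaded. The helper names `featured_pipelines` and `load_pipeline` are hypothetical.

```python
from itertools import product

# Pipeline names and language codes copied from the tables above.
PIPELINE_NAMES = [
    "explain_document_sm",
    "explain_document_md",
    "explain_document_lg",
    "entity_recognizer_sm",
    "entity_recognizer_md",
    "entity_recognizer_lg",
]
LANGUAGES = {"nl": "Dutch", "no": "Norwegian", "pl": "Polish", "pt": "Portuguese"}


def featured_pipelines():
    """Return every (name, lang) pair listed in the featured tables."""
    return [(name, lang) for lang, name in product(LANGUAGES, PIPELINE_NAMES)]


def load_pipeline(name, lang):
    """Download (or load from cache) a pretrained pipeline for one language."""
    # Imported lazily so the listing helper works without Spark installed.
    import sparknlp
    from sparknlp.pretrained import PretrainedPipeline

    sparknlp.start()  # a Spark session must exist before loading
    return PretrainedPipeline(name, lang=lang)
```

For example, `load_pipeline("explain_document_sm", "nl").annotate("...")` would return a dictionary of annotations for a Dutch sentence, keyed by output column.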

Documentation

  • Update documentation for release of Spark NLP 2.5.0
  • Update all spark-nlp-workshop notebooks for Spark NLP 2.5.0
  • Update the entire spark-nlp-models repository with new pre-trained models and pipelines

Installation

Python

#PyPI

pip install spark-nlp==2.5.0

#Conda

conda install -c johnsnowlabs spark-nlp==2.5.0

Spark

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.0

PySpark

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.0
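
The `--packages` value follows the usual Maven coordinate pattern of group, artifact with Scala version suffix, and release version. As a hedged sketch, the same coordinate could also be set from Python through Spark's `spark.jars.packages` configuration key instead of the command line. The helper names `spark_nlp_coordinate` and `start_session`, the app name, and the local master are assumptions made for illustration.

```python
def spark_nlp_coordinate(version="2.5.0", scala_version="2.11"):
    """Build the Maven coordinate used in the commands above."""
    return f"com.johnsnowlabs.nlp:spark-nlp_{scala_version}:{version}"


def start_session(app_name="spark-nlp-example"):
    """Create a local SparkSession that pulls in the Spark NLP package."""
    # Imported lazily so the coordinate helper works without pyspark.
    from pyspark.sql import SparkSession

    return (SparkSession.builder
            .appName(app_name)
            .master("local[*]")
            # Equivalent to passing --packages on the command line.
            .config("spark.jars.packages", spark_nlp_coordinate())
            .getOrCreate())
```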

Maven

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.11</artifactId>
    <version>2.5.0</version>
</dependency>

FAT JARs