
John Snow Labs Spark-NLP 3.4.2: DeBERTa embeddings, new caching in Word2Vec and Doc2Vec, new state-of-the-art models, and bug fixes!

@maziyarpanahi released this 10 Mar 15:33

Overview

We are pleased to release Spark NLP 🚀 3.4.2! This release comes with a new DeBERTa transformer for word embeddings, a new caching option to speed up Word2Vec and Doc2Vec training, new state-of-the-art English and multilingual models, and bug fixes!

As always, we would like to thank our community for their feedback, questions, and feature requests.


New Features

  • Introducing the DeBertaEmbeddings annotator. DeBERTa (Decoding-enhanced BERT with disentangled attention) improves on the BERT and RoBERTa models with two novel techniques: a disentangled attention mechanism and an enhanced mask decoder. Compared to RoBERTa-Large, a DeBERTa model trained on half of the training data performs consistently better on a wide range of NLP tasks, improving MNLI by +0.9 points (90.2% vs. 91.1%), SQuAD v2.0 by +2.3 points (88.4% vs. 90.7%), and RACE by +3.6 points (83.2% vs. 86.8%). This annotator is compatible with all models trained or fine-tuned with DebertaV2Model (PyTorch) or TFDebertaV2Model (TensorFlow) on Hugging Face, covering both DeBERTa-v2 and DeBERTa-v3.
  • Introducing a new enableCaching param in Doc2VecApproach to speed up training
  • Introducing a new enableCaching param in Word2VecApproach to speed up training
  • Support Databricks Runtime 10.3, 10.3 ML, and 10.3 ML & GPU
  • Support Amazon EMR releases emr-5.34.0 and emr-6.5.0
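The DeBERTa gains quoted in the first bullet are absolute differences between the two scores in each pair (percentage points, not relative increases). As a minimal sketch, the Python below recomputes them from the quoted scores only; the dictionary and helper names are illustrative and not part of Spark NLP.

```python
# Scores quoted in the release notes, as (RoBERTa-Large, DeBERTa) pairs.
scores = {
    "MNLI": (90.2, 91.1),
    "SQuAD v2.0": (88.4, 90.7),
    "RACE": (83.2, 86.8),
}


def gain_in_points(baseline: float, improved: float) -> float:
    """Absolute improvement in percentage points, rounded to one decimal
    to hide floating-point noise from the subtraction."""
    return round(improved - baseline, 1)


# Recompute each task's gain from its pair of scores.
gains = {task: gain_in_points(base, deb) for task, (base, deb) in scores.items()}

for task, gain in gains.items():
    print(f"{task}: +{gain}")
```

A relative increase would be a different number (for MNLI, 0.9 / 90.2 is roughly 1.0%), which is why the gains read more precisely as points.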

Bug Fixes

  • Fix the bestModelMetric param ignoring the value it was set to (#6978)

New Notebooks

Import DeBERTa models into Spark NLP 🚀

Spark NLP | HuggingFace Notebooks | Colab
DeBertaEmbeddings | HuggingFace in Spark NLP - DeBERTa | Open In Colab

For more information, see Import Transformers in Spark NLP.


Models

New state-of-the-art DeBERTa models:

Model | Name | Lang
DeBertaEmbeddings | deberta_v3_xsmall | en
DeBertaEmbeddings | deberta_v3_small | en
DeBertaEmbeddings | deberta_v3_base | en
DeBertaEmbeddings | deberta_v3_large | en
DeBertaEmbeddings | mdeberta_v3_base | xx (multilingual)

Documentation


Installation

Python

PyPI

pip install spark-nlp==3.4.2

Spark Packages

spark-nlp on Apache Spark 3.0.x and 3.1.x (Scala 2.12 only):

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.4.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.4.2

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:3.4.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:3.4.2

spark-nlp on Apache Spark 3.2.x (Scala 2.12 only):

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark32_2.12:3.4.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark32_2.12:3.4.2

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark32_2.12:3.4.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark32_2.12:3.4.2

spark-nlp on Apache Spark 2.4.x (Scala 2.11 only):

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark24_2.11:3.4.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark24_2.11:3.4.2

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark24_2.11:3.4.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark24_2.11:3.4.2

spark-nlp on Apache Spark 2.3.x (Scala 2.11 only):

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:3.4.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:3.4.2

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark23_2.11:3.4.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark23_2.11:3.4.2
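The artifact names above follow a regular pattern: an optional -gpu part, a Spark-line suffix (-spark32, -spark24, -spark23, or none for Spark 3.0.x/3.1.x), and the Scala binary version. As a small sketch, the hypothetical helper below (not part of Spark NLP) rebuilds the --packages coordinate for a given Spark version. The mapping table is transcribed from this page and is only an assumption for other releases.

```python
GROUP_ID = "com.johnsnowlabs.nlp"
VERSION = "3.4.2"

# Spark major.minor -> (artifact suffix, Scala binary version),
# transcribed from the Spark Packages lists for 3.4.2.
_SPARK_VARIANTS = {
    "2.3": ("-spark23", "2.11"),
    "2.4": ("-spark24", "2.11"),
    "3.0": ("", "2.12"),
    "3.1": ("", "2.12"),
    "3.2": ("-spark32", "2.12"),
}


def artifact_id(spark_version: str, gpu: bool = False) -> str:
    """Return the spark-nlp artifactId for a Spark version such as '3.2.1'."""
    minor = ".".join(spark_version.split(".")[:2])
    if minor not in _SPARK_VARIANTS:
        raise ValueError(f"Spark {spark_version} is not listed for Spark NLP {VERSION}")
    suffix, scala = _SPARK_VARIANTS[minor]
    gpu_part = "-gpu" if gpu else ""
    return f"spark-nlp{gpu_part}{suffix}_{scala}"


def packages_coordinate(spark_version: str, gpu: bool = False) -> str:
    """Return the groupId:artifactId:version string for --packages."""
    return f"{GROUP_ID}:{artifact_id(spark_version, gpu)}:{VERSION}"


print(packages_coordinate("3.2.1", gpu=True))
```

For example, packages_coordinate("2.4.8") yields the same coordinate as the Spark 2.4.x spark-shell line above, and artifact_id() gives the artifactId used in the Maven dependencies below.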

Maven

spark-nlp on Apache Spark 3.0.x and 3.1.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
    <version>3.4.2</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
    <version>3.4.2</version>
</dependency>

spark-nlp on Apache Spark 3.2.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-spark32_2.12</artifactId>
    <version>3.4.2</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu-spark32_2.12</artifactId>
    <version>3.4.2</version>
</dependency>

spark-nlp on Apache Spark 2.4.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-spark24_2.11</artifactId>
    <version>3.4.2</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu-spark24_2.11</artifactId>
    <version>3.4.2</version>
</dependency>

spark-nlp on Apache Spark 2.3.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-spark23_2.11</artifactId>
    <version>3.4.2</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu-spark23_2.11</artifactId>
    <version>3.4.2</version>
</dependency>

FAT JARs

What's Changed

Full Changelog: 3.4.1...3.4.2

New Contributors

@agsfer @KshitizGIT @gadde5300 @kolia1985 @jsl-models @rpranab @josejuanmartinez @bunyamin-polat @maziyarpanahi @jsl-builder @Damla-Gurbaz @xusliebana @mahmoodbayeshi @luca-martial @dependabot @muhammetsnts @albertoandreottiATgmai