John Snow Labs Spark-NLP 3.4.2: DeBERTa embeddings, new caching in Word2Vec and Doc2Vec, new state-of-the-art models, and bug fixes!
Overview
We are pleased to release Spark NLP π 3.4.2! This release comes with a new DeBERTa transformer for word embeddings, new caching to speed up training Word2Vec and Doc2Vec, new English and multi-lingual state-of-the-art models, and bug fixes!
As always, we would like to thank our community for their feedback, questions, and feature requests.
New Features
- Introducing DeBertaEmbeddings annotator. DeBERTa (Decoding-enhanced BERT with disentangled attention) improves the BERT and RoBERTa models using two novel techniques. Compared to RoBERTa-Large, a DeBERTa model trained on half of the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9% (90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). This annotator is compatible with all the models trained/fine-tuned by using
DebertaV2Model
for PyTorch orTFDebertaV2Model
for TensorFlow models (DeBERTa-v2 & DeBERTa-v3) in HuggingFace - Introducing a new param
enableCaching
in Doc2VecApproach to speed up the training - Introducing a new param
enableCaching
in Word2VecApproach to speed up the training - Support Databricks runtime 10.3, 10.3 ML, and 10.3 ML & GPU
- Support EMR emr-5.34.0 and emr-6.5.0
Bug Fixes
- Fix bestModelMetric param when the set value was ignored #6978
New Notebooks
Import DeBERTa models to Spark NLP π
Spark NLP | HuggingFace Notebooks | Colab |
---|---|---|
DeBertaEmbeddings | HuggingFace in Spark NLP - DeBERTa |
You can visit Import Transformers in Spark NLP for more info
Models
New state-of-the-art DeBERTa models:
Model | Name | Lang |
---|---|---|
DeBertaEmbeddings | deberta_v3_xsmall | en |
DeBertaEmbeddings | deberta_v3_small | en |
DeBertaEmbeddings | deberta_v3_base | en |
DeBertaEmbeddings | deberta_v3_large | en |
DeBertaEmbeddings | mdeberta_v3_base | xx |
Documentation
- TF Hub & HuggingFace to Spark NLP
- Models Hub with new models
- Spark NLP documentation
- Spark NLP Scala APIs
- Spark NLP Python APIs
- Spark NLP Workshop notebooks
- Spark NLP publications
- Spark NLP in Action
- Spark NLP training certification notebooks for Google Colab and Databricks
- Spark NLP Display for visualization of different types of annotations
- Discussions Engage with other community members, share ideas, and show off how you use Spark NLP!
Installation
Python
#PyPI
pip install spark-nlp==3.4.2
Spark Packages
spark-nlp on Apache Spark 3.0.x and 3.1.x (Scala 2.12 only):
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.4.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.4.2
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:3.4.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:3.4.2
spark-nlp on Apache Spark 3.2.x (Scala 2.12 only):
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark32_2.12:3.4.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark32_2.12:3.4.2
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark32_2.12:3.4.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark32_2.12:3.4.2
spark-nlp on Apache Spark 2.4.x (Scala 2.11 only):
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark24_2.11:3.4.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark24_2.11:3.4.2
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark24_2.11:3.4.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark24_2.11:3.4.2
spark-nlp on Apache Spark 2.3.x (Scala 2.11 only):
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:3.4.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:3.4.2
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark23-gpu_2.11:3.4.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark23_2.11:3.4.2
Maven
spark-nlp on Apache Spark 3.0.x and 3.1.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.12</artifactId>
<version>3.4.2</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu_2.12</artifactId>
<version>3.4.2</version>
</dependency>
spark-nlp on Apache Spark 3.2.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-spark32_2.12</artifactId>
<version>3.4.2</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu-spark32_2.12</artifactId>
<version>3.4.2</version>
</dependency>
spark-nlp on Apache Spark 2.4.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-spark24_2.11</artifactId>
<version>3.4.2</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu-spark24_2.11</artifactId>
<version>3.4.2</version>
</dependency>
spark-nlp on Apache Spark 2.3.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-spark23_2.11</artifactId>
<version>3.4.2</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu-spark23_2.11</artifactId>
<version>3.4.2</version>
</dependency>
FAT JARs
-
CPU on Apache Spark 3.0.x/3.1.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-3.4.2.jar
-
GPU on Apache Spark 3.0.x/3.1.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-3.4.2.jar
-
CPU on Apache Spark 3.2.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark32-assembly-3.4.2.jar
-
GPU on Apache Spark 3.2.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-spark32-assembly-3.4.2.jar
-
CPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark24-assembly-3.4.2.jar
-
GPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-spark24-assembly-3.4.2.jar
-
CPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark23-assembly-3.4.2.jar
-
GPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-spark23-assembly-3.4.2.jar
What's Changed
Full Changelog: 3.4.1...3.4.2
New Contributors
- @mahmoodbayeshi made their first contribution in #6835
- @bunyamin-polat made their first contribution in #6969
@agsfer @KshitizGIT @gadde5300 @kolia1985 @jsl-models @rpranab @josejuanmartinez @bunyamin-polat @maziyarpanahi @jsl-builder @Damla-Gurbaz @xusliebana @mahmoodbayeshi @luca-martial @dependabot @muhammetsnts @albertoandreottiATgmai