Spark NLP 5.0.2: Introducing ONNX Support for ALBERT, CamemBERT, and XLM-RoBERTa, a new Zero-Shot Classifier for XLM-RoBERTa transformer, 200+ new ONNX models, and bug fixes!
📢 Overview
Spark NLP 5.0.2 comes with new ONNX support for the ALBERT, CamemBERT, and XLM-RoBERTa annotators, a new Zero-Shot Classifier for the XLM-RoBERTa transformer, 200+ new ONNX models, and bug fixes! We want to thank our community for their valuable feedback, feature requests, and contributions. Our Models Hub now contains over 18,000 free and truly open-source models & pipelines.
🔥 New Features
- NEW: Introducing support for ONNX Runtime in ALBERT, CamemBERT, and XLM-RoBERTa annotators. We have already converted 200+ models to ONNX format for these annotators for our community
- NEW: Implement XlmRoBertaForZeroShotClassification annotator for Zero-Shot multi-class & multi-label text classification based on XLM-RoBERTa transformer
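As a minimal sketch of how the new zero-shot annotator might be wired into a pipeline from Python: the column names, the candidate labels, and the helper names `build_zero_shot_pipeline` and `classify` are illustrative assumptions, and `pretrained()` is called without arguments on the assumption that a default multilingual model is available for this annotator. Check the Models Hub for the exact model you want.

```python
# Hypothetical label set, chosen only for illustration.
CANDIDATE_LABELS = ["urgent", "travel", "music", "sport", "weather", "technology"]


def build_zero_shot_pipeline(candidate_labels=CANDIDATE_LABELS):
    """Assemble document -> token -> zero-shot classifier stages."""
    # Imports are local so this module can be loaded without Spark NLP installed.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import Tokenizer, XlmRoBertaForZeroShotClassification

    document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
    tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
    classifier = (
        XlmRoBertaForZeroShotClassification.pretrained()  # assumed default model
        .setInputCols(["document", "token"])
        .setOutputCol("class")
        .setCandidateLabels(list(candidate_labels))
    )
    return Pipeline(stages=[document_assembler, tokenizer, classifier])


def classify(texts):
    """Run the pipeline on a list of strings and return the resulting DataFrame."""
    import sparknlp

    spark = sparknlp.start()
    data = spark.createDataFrame([[t] for t in texts]).toDF("text")
    return build_zero_shot_pipeline().fit(data).transform(data)
```

Calling `classify(["I want to book a flight to Paris."]).select("class.result").show(truncate=False)` would then print the predicted label for each row, provided the session can download the model.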
🐛 Bug Fixes & Enhancements
- Fix MarianTransformers annotator breaking with java.lang.ClassCastException in Python
- Fix accuracy values falling outside the 0.0 to 1.0 range in SentenceDetectorDL and MultiClassifierDL annotators
- Fix BART issue with a low temperature value that only occurred when no non-infinite logits satisfied the low temperature and top_k values
- Add missing E5Embeddings and InstructorEmbeddings annotators to annotators in Scala for easy all-in-one import
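To make the BART edge case above more concrete, here is a toy illustration in plain Python. It is not Spark NLP's implementation or its actual fix: the function names are hypothetical, sampling is replaced by a deterministic argmax to keep the sketch short, and `top_k` is assumed to be at least 1. It only shows one way a very low temperature can leave no finite logits after top_k filtering, and one possible guard (falling back to a greedy pick on the raw logits).

```python
import math


def warp_logits(logits, temperature, top_k):
    """Apply temperature scaling, then mask everything outside the top_k with -inf."""
    # Dividing by a tiny temperature can overflow to +/-inf.
    scaled = [x / temperature for x in logits]
    # Value of the k-th largest scaled logit (top_k is assumed >= 1).
    kth = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
    return [x if x >= kth else -math.inf for x in scaled]


def pick_token(logits, temperature, top_k):
    """Return the index of the chosen token, guarding against all-infinite logits."""
    warped = warp_logits(logits, temperature, top_k)
    finite = [i for i, x in enumerate(warped) if math.isfinite(x)]
    if not finite:
        # Nothing finite survived: fall back to the greedy choice on the raw logits.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Deterministic stand-in for sampling among the surviving candidates.
    return max(finite, key=lambda i: warped[i])
```

With ordinary settings such as `pick_token([0.1, 2.0, 1.0], 0.5, 2)` the guard is never triggered; with negative logits and a subnormal temperature such as `1e-320`, every scaled value overflows to negative infinity and the fallback branch is taken.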
📖 Documentation
- Import models from TF Hub & HuggingFace
- Spark NLP Notebooks
- Models Hub with new models
- Spark NLP Articles
- Spark NLP in Action
- Spark NLP Documentation
- Spark NLP Scala APIs
- Spark NLP Python APIs
❤️ Community support
- Slack: live discussion with the Spark NLP community and the team
- GitHub: bug reports, feature requests, and contributions
- Discussions: engage with other community members, share ideas, and show off how you use Spark NLP!
- Medium: Spark NLP articles
- JohnSnowLabs: official Medium
- YouTube: Spark NLP video tutorials
Installation
Python
#PyPI
pip install spark-nlp==5.0.2
Spark Packages
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x (Scala 2.12):
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.2
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.0.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.0.2
Apple Silicon (M1 & M2)
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.0.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.0.2
AArch64
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.0.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.0.2
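The same package coordinates can also be supplied when creating a session from Python code instead of on the command line. The sketch below assumes PySpark is installed; the helper names `maven_coordinate` and `build_session` and the variant keys are hypothetical, while `spark.jars.packages` is the standard Spark setting for resolving Maven packages at startup.

```python
SPARK_NLP_VERSION = "5.0.2"
SCALA_VERSION = "2.12"

# Artifact names taken from the commands listed above.
ARTIFACTS = {
    "cpu": "spark-nlp",
    "gpu": "spark-nlp-gpu",
    "silicon": "spark-nlp-silicon",
    "aarch64": "spark-nlp-aarch64",
}


def maven_coordinate(variant="cpu"):
    """Build the group:artifact:version string for the chosen build variant."""
    artifact = ARTIFACTS[variant]
    return f"com.johnsnowlabs.nlp:{artifact}_{SCALA_VERSION}:{SPARK_NLP_VERSION}"


def build_session(variant="cpu"):
    """Create a SparkSession that pulls the matching Spark NLP package."""
    # Imported lazily so maven_coordinate() works without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("Spark NLP")
        .config("spark.jars.packages", maven_coordinate(variant))
        .getOrCreate()
    )
```

For example, `build_session("gpu")` would request `com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.0.2`, matching the GPU command above.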
Maven
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.12</artifactId>
<version>5.0.2</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu_2.12</artifactId>
<version>5.0.2</version>
</dependency>
spark-nlp-silicon:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-silicon_2.12</artifactId>
<version>5.0.2</version>
</dependency>
spark-nlp-aarch64:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-aarch64_2.12</artifactId>
<version>5.0.2</version>
</dependency>
FAT JARs
- CPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-5.0.2.jar
- GPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-5.0.2.jar
- M1 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-silicon-assembly-5.0.2.jar
- AArch64 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-aarch64-assembly-5.0.2.jar
What's Changed
- SPARKNLP-738 Enforcing accuracy to 0 and 1 in classifiers by @danilojsl in #13901
- Introducing a new Zero-Shot Classifier for XLM-RoBERTa transformer by @ahmedlone127 in #13902
- Add support for ONNX to ALBERT, CamemBERT, and XLM-RoBERTa by @maziyarpanahi in #13907
- SPARKNLP-873 Issue with MarianTransformers models by @danilojsl in #13908
- BART Bug fix #13898 by @prabod in #13911
- Models hub by @maziyarpanahi in #13913
- release/502-release-candidate by @maziyarpanahi in #13912
Full Changelog: 5.0.1...5.0.2