Spark NLP 5.1.1: Introducing ONNX Support for MPNet, AlbertForTokenClassification, AlbertForSequenceClassification, AlbertForQuestionAnswering transformers, access to full vectors in Word2VecModel, Doc2VecModel, WordEmbeddingsModel annotators, 460+ new ONNX models, and bug fixes!
Overview
Spark NLP 5.1.1 comes with new ONNX support for MPNet, AlbertForTokenClassification, AlbertForSequenceClassification, and AlbertForQuestionAnswering annotators, a new getVectors feature in Word2VecModel, Doc2VecModel, and WordEmbeddingsModel annotators, 460+ new ONNX models for MPNet and BERT transformers, and bug fixes!
We want to thank our community for their valuable feedback, feature requests, and contributions. Our Models Hub now contains more than 18,800 free and truly open-source models & pipelines.
New Features & Enhancements
- NEW: Introducing support for ONNX Runtime in MPNet embedding annotator
- NEW: Introducing support for ONNX Runtime in AlbertForTokenClassification annotator
- NEW: Introducing support for ONNX Runtime in AlbertForSequenceClassification annotator
- NEW: Introducing support for ONNX Runtime in AlbertForQuestionAnswering annotator
- Implement getVectors feature in Word2VecModel, Doc2VecModel, and WordEmbeddingsModel annotators. This new feature gives access to all the tokens and their vectors in the loaded models.
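As a rough illustration of the getVectors feature listed above, here is a minimal Python sketch that previews a few rows of the token/vector table of a loaded model. It assumes the Python API exposes a no-argument getVectors() method that returns a Spark DataFrame, and that the default pretrained Word2VecModel can be downloaded. The helper names preview_vectors and load_default_word2vec are hypothetical, and the column layout of the result is not assumed here, so inspect it with printSchema() before relying on it.

```python
def preview_vectors(model, limit=5):
    """Return up to `limit` rows from the DataFrame produced by model.getVectors().

    `model` is assumed to be a loaded Word2VecModel, Doc2VecModel, or
    WordEmbeddingsModel; any object with a getVectors() method works.
    """
    vectors_df = model.getVectors()           # full token/vector table of the model
    return vectors_df.limit(limit).collect()  # bring only a small sample to the driver


def load_default_word2vec():
    """Start a Spark NLP session and load the default pretrained Word2VecModel.

    The imports are local so that this file can be imported even when
    Spark NLP is not installed. Assumption: the default pretrained model
    is reachable from this machine.
    """
    import sparknlp
    from sparknlp.annotator import Word2VecModel

    sparknlp.start()
    return Word2VecModel.pretrained()
```

With Spark NLP 5.1.1 installed, preview_vectors(load_default_word2vec()) should return a short list of rows. Limiting before collecting keeps the driver from pulling the whole vocabulary into memory.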
Bug Fixes
- Fix saving and loading of Whisper models
- Fix saving ONNX models on the Windows operating system
Documentation
- Import models from TF Hub & HuggingFace
- Spark NLP Notebooks
- Models Hub with new models
- Spark NLP Articles
- Spark NLP in Action
- Spark NLP Documentation
- Spark NLP Scala APIs
- Spark NLP Python APIs
Community support
- Slack: for live discussion with the Spark NLP community and the team
- GitHub: bug reports, feature requests, and contributions
- Discussions: engage with other community members, share ideas, and show off how you use Spark NLP!
- Medium: Spark NLP articles
- JohnSnowLabs official Medium
- YouTube: Spark NLP video tutorials
Installation
Python
#PyPI
pip install spark-nlp==5.1.1
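After installing from PyPI, it can help to confirm that the environment actually picked up the pinned release. The sketch below uses only the Python standard library (Python 3.8+). The function names are hypothetical, and it assumes the distribution is registered under the name spark-nlp, as in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version

EXPECTED_VERSION = "5.1.1"  # the version pinned in the pip command above


def installed_spark_nlp_version():
    """Return the installed spark-nlp version string, or None if it is not installed."""
    try:
        return version("spark-nlp")
    except PackageNotFoundError:
        return None


def matches_expected(found, expected=EXPECTED_VERSION):
    """True only when a version was found and it equals the expected release."""
    return found is not None and found == expected
```

For example, matches_expected(installed_spark_nlp_version()) is False both when the package is missing and when a different release is installed.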
Spark Packages
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x (Scala 2.12):
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.1
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.1.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.1.1
Apple Silicon (M1 & M2)
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.1.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.1.1
AArch64
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.1.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.1.1
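The commands above differ only in the launcher and the artifact name, so they can be generated rather than copied. This is a small sketch of that idea; the function names and the flavor keys (cpu, gpu, silicon, aarch64) are hypothetical labels, while the group ID, artifact names, and Scala suffix are taken from the commands listed above.

```python
GROUP_ID = "com.johnsnowlabs.nlp"
SCALA_VERSION = "2.12"

# Hypothetical flavor keys mapped to the artifact names used above.
ARTIFACTS = {
    "cpu": "spark-nlp",
    "gpu": "spark-nlp-gpu",
    "silicon": "spark-nlp-silicon",
    "aarch64": "spark-nlp-aarch64",
}


def package_coordinate(flavor="cpu", version="5.1.1"):
    """Build the Maven coordinate passed to --packages for a given flavor."""
    if flavor not in ARTIFACTS:
        raise ValueError(f"unknown flavor: {flavor!r}")
    return f"{GROUP_ID}:{ARTIFACTS[flavor]}_{SCALA_VERSION}:{version}"


def launch_command(shell="pyspark", flavor="cpu", version="5.1.1"):
    """Build the full spark-shell or pyspark command line as a string."""
    if shell not in ("pyspark", "spark-shell"):
        raise ValueError(f"unknown shell: {shell!r}")
    return f"{shell} --packages {package_coordinate(flavor, version)}"
```

Calling launch_command("spark-shell", "gpu") reproduces the GPU spark-shell command shown above.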
Maven
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x:
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
    <version>5.1.1</version>
</dependency>
spark-nlp-gpu:
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
    <version>5.1.1</version>
</dependency>
spark-nlp-silicon:
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-silicon_2.12</artifactId>
    <version>5.1.1</version>
</dependency>
spark-nlp-aarch64:
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-aarch64_2.12</artifactId>
    <version>5.1.1</version>
</dependency>
FAT JARs
- CPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-5.1.1.jar
- GPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-5.1.1.jar
- M1 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-silicon-assembly-5.1.1.jar
- AArch64 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-aarch64-assembly-5.1.1.jar
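The four FAT JAR links share one naming pattern, which the following sketch captures for scripting downloads. The function name and flavor keys are hypothetical, and the pattern is inferred only from the 5.1.1 URLs listed above, so it is an assumption that other releases follow it.

```python
BASE_URL = "https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars"

# Hypothetical flavor keys mapped to the assembly names seen in the URLs above.
ASSEMBLIES = {
    "cpu": "spark-nlp-assembly",
    "gpu": "spark-nlp-gpu-assembly",
    "silicon": "spark-nlp-silicon-assembly",
    "aarch64": "spark-nlp-aarch64-assembly",
}


def fat_jar_url(flavor="cpu", version="5.1.1"):
    """Build the FAT JAR download URL for a flavor, following the 5.1.1 pattern."""
    if flavor not in ASSEMBLIES:
        raise ValueError(f"unknown flavor: {flavor!r}")
    return f"{BASE_URL}/{ASSEMBLIES[flavor]}-{version}.jar"
```

For instance, fat_jar_url("silicon") yields the M1 link from the list above.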
What's Changed
- fixed e5 modelhub card code sections by @ahmedlone127 in #13950
- fixing modelhub cards by @ahmedlone127 in #13952
- Models hub by @maziyarpanahi in #13943
- [SPARKNLP-906] Fix reading suffix by @DevinTDHa in #13945
- Sparknlp 888 Add ONNX support to MPNet embeddings by @ahmedlone127 in #13955
- Adding ONNX Support to ALBERT Token and Sequence Classification and Question Answering annotators by @danilojsl in #13956
- SPARKNLP-884 Enabling getVectors method by @danilojsl in #13957
- [SPARKNLP-890] ONNX E5 MPnet example by @DevinTDHa in #13958
- Models hub by @maziyarpanahi in #13972
- Fixing onnx saving path bug by @ahmedlone127 in #13959
- release/511-release-candidate by @maziyarpanahi in #13961
Full Changelog: 5.1.0...5.1.1