Spark NLP 5.1.3: New ONNX Configs, ONNX support for BERT Token and Sequence Classifications, DistilBERT token and sequence classifications, BERT and DistilBERT Question Answering, and bug fixes!
Overview
Spark NLP 5.1.3 comes with new ONNX support for the BertForTokenClassification, BertForSequenceClassification, BertForQuestionAnswering, DistilBertForTokenClassification, DistilBertForSequenceClassification, and DistilBertForQuestionAnswering annotators, a new way to configure ONNX Runtime via Spark NLP Config, and bug fixes!
We want to thank our community for their valuable feedback, feature requests, and contributions. Our Models Hub now contains over 21,000 free and truly open-source models & pipelines.
New Features & Enhancements
- NEW: Introducing support for ONNX Runtime in BertForTokenClassification annotator
- NEW: Introducing support for ONNX Runtime in BertForSequenceClassification annotator
- NEW: Introducing support for ONNX Runtime in BertForQuestionAnswering annotator
- NEW: Introducing support for ONNX Runtime in DistilBertForTokenClassification annotator
- NEW: Introducing support for ONNX Runtime in DistilBertForSequenceClassification annotator
- NEW: Introducing support for ONNX Runtime in DistilBertForQuestionAnswering annotator
- NEW: Setting ONNX configuration such as GPU device id, execution mode, etc. via Spark NLP configs
import sparknlp
# ONNX Runtime settings are passed as string key/value pairs
onnx_params = {
    "spark.jsl.settings.onnx.gpuDeviceId": "0",
    "spark.jsl.settings.onnx.intraOpNumThreads": "5",
    "spark.jsl.settings.onnx.optimizationLevel": "BASIC_OPT",
    "spark.jsl.settings.onnx.executionMode": "SEQUENTIAL"
}
# let's start Spark with Spark NLP
spark = sparknlp.start(params=onnx_params)
- Update Whisper documentation with minimum required version of Spark/PySpark (3.4)
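The ONNX settings shown above are plain string key/value pairs, so a typo in a key or value is easy to miss. As a minimal sketch, the hypothetical helper `build_onnx_params` below assembles the same dictionary and checks the two enum-like values before the session starts. The allowed values are taken from ONNX Runtime's `OptLevel` and `ExecutionMode` enums; that Spark NLP accepts every one of them is an assumption, not something these notes state.

```python
# Allowed values mirror ONNX Runtime's enums (assumption: Spark NLP parses these names).
OPT_LEVELS = {"NO_OPT", "BASIC_OPT", "EXTENDED_OPT", "ALL_OPT"}
EXECUTION_MODES = {"SEQUENTIAL", "PARALLEL"}

# Key prefix used by the ONNX settings in these release notes.
PREFIX = "spark.jsl.settings.onnx."


def build_onnx_params(gpu_device_id=0, intra_op_num_threads=5,
                      optimization_level="BASIC_OPT",
                      execution_mode="SEQUENTIAL"):
    """Hypothetical helper: validate the inputs and return the settings as strings."""
    if gpu_device_id < 0:
        raise ValueError("gpu_device_id must be >= 0")
    if intra_op_num_threads < 1:
        raise ValueError("intra_op_num_threads must be >= 1")
    if optimization_level not in OPT_LEVELS:
        raise ValueError(f"unknown optimization level: {optimization_level}")
    if execution_mode not in EXECUTION_MODES:
        raise ValueError(f"unknown execution mode: {execution_mode}")
    return {
        PREFIX + "gpuDeviceId": str(gpu_device_id),
        PREFIX + "intraOpNumThreads": str(intra_op_num_threads),
        PREFIX + "optimizationLevel": optimization_level,
        PREFIX + "executionMode": execution_mode,
    }


def start_with_onnx(**kwargs):
    """Start Spark NLP with the generated settings (requires spark-nlp installed)."""
    import sparknlp  # imported lazily so build_onnx_params works without Spark
    return sparknlp.start(params=build_onnx_params(**kwargs))
```

Calling `start_with_onnx(execution_mode="PARALLEL")` would then start the session with the validated settings; keeping the validation separate from the session start makes the dictionary easy to inspect or log first.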
Bug Fixes
- Fix the "module 'sparknlp.annotator' has no attribute 'Token2Chunk'" error in Python when using the Token2Chunk annotator inside a loaded PipelineModel
New Notebooks
Documentation
- Import models from TF Hub & HuggingFace
- Spark NLP Notebooks
- Models Hub with new models
- Spark NLP Articles
- Spark NLP in Action
- Spark NLP Documentation
- Spark NLP Scala APIs
- Spark NLP Python APIs
Community support
- Slack: for live discussion with the Spark NLP community and the team
- GitHub: bug reports, feature requests, and contributions
- Discussions: engage with other community members, share ideas, and show off how you use Spark NLP!
- Medium: Spark NLP articles
- JohnSnowLabs official Medium
- YouTube: Spark NLP video tutorials
Installation
Python
#PyPI
pip install spark-nlp==5.1.3
Spark Packages
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x (Scala 2.12):
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.3
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.1.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.1.3
Apple Silicon (M1 & M2)
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.1.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.1.3
AArch64
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.1.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.1.3
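When the session is created from Python code rather than through `spark-shell` or `pyspark`, the same coordinates can be passed via Spark's `spark.jars.packages` setting. The sketch below uses a hypothetical helper, `spark_nlp_package`, to pick the artifact for each build listed above. The app name, master, and serializer settings are illustrative choices, not requirements stated in these notes.

```python
SPARK_NLP_VERSION = "5.1.3"

# Artifact names as they appear in the package coordinates above.
ARTIFACTS = {
    "cpu": "spark-nlp",
    "gpu": "spark-nlp-gpu",
    "silicon": "spark-nlp-silicon",
    "aarch64": "spark-nlp-aarch64",
}


def spark_nlp_package(flavor="cpu", scala="2.12", version=SPARK_NLP_VERSION):
    """Hypothetical helper: build the Maven coordinate for one Spark NLP build."""
    if flavor not in ARTIFACTS:
        raise ValueError(f"unknown flavor: {flavor}")
    return f"com.johnsnowlabs.nlp:{ARTIFACTS[flavor]}_{scala}:{version}"


def start_session(flavor="cpu"):
    """Create a local SparkSession that pulls the chosen package (requires pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily so the helper works without Spark
    return (
        SparkSession.builder
        .appName("Spark NLP")
        .master("local[*]")
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .config("spark.kryoserializer.buffer.max", "2000M")
        .config("spark.jars.packages", spark_nlp_package(flavor))
        .getOrCreate()
    )
```

For example, `spark_nlp_package("gpu")` yields the same coordinate used in the GPU commands above, and `start_session("gpu")` would hand it to Spark when the session is built.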
Maven
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.12</artifactId>
<version>5.1.3</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu_2.12</artifactId>
<version>5.1.3</version>
</dependency>
spark-nlp-silicon:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-silicon_2.12</artifactId>
<version>5.1.3</version>
</dependency>
spark-nlp-aarch64:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-aarch64_2.12</artifactId>
<version>5.1.3</version>
</dependency>
FAT JARs
- CPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-5.1.3.jar
- GPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-5.1.3.jar
- M1 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-silicon-assembly-5.1.3.jar
- AArch64 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-aarch64-assembly-5.1.3.jar
What's Changed
- Fixing some 404 errors by @agsfer in #14012
- SPARKNLP-907 Allows setting up ONNX configs through spark session by @danilojsl in #14009
- Adding ONNX support for BertClassification by @danilojsl in #14013
- Adding ONNX support for DistilBertClassification by @danilojsl in #14014
- SPARKNLP-919: Add note for Spark Version support by @DevinTDHa in #14015
- SPARKNLP-927: Token2Chunk is not in the right Python package and fails in a loaded PipelineModel by @maziyarpanahi in #14018
- release/513-release-candidate by @maziyarpanahi in #14020
Full Changelog: 5.1.2...5.1.3