
Spark NLP 5.1.3: New ONNX Configs, ONNX support for BERT Token and Sequence Classifications, DistilBERT token and sequence classifications, BERT and DistilBERT Question Answering, and bug fixes!

@maziyarpanahi released this on 10 Oct, 20:26
Commit: 1fa94e9

📢 Overview

Spark NLP 5.1.3 🚀 comes with new ONNX support for the BertForTokenClassification, BertForSequenceClassification, BertForQuestionAnswering, DistilBertForTokenClassification, DistilBertForSequenceClassification, and DistilBertForQuestionAnswering annotators, a new way to configure ONNX Runtime via Spark NLP configs, and bug fixes!

We want to thank our community for their valuable feedback, feature requests, and contributions. Our Models Hub now contains over 21,000 free and truly open-source models & pipelines. 🎉


🔥 New Features & Enhancements

  • NEW: Introducing ONNX Runtime support for the BertForTokenClassification annotator
  • NEW: Introducing ONNX Runtime support for the BertForSequenceClassification annotator
  • NEW: Introducing ONNX Runtime support for the BertForQuestionAnswering annotator
  • NEW: Introducing ONNX Runtime support for the DistilBertForTokenClassification annotator
  • NEW: Introducing ONNX Runtime support for the DistilBertForSequenceClassification annotator
  • NEW: Introducing ONNX Runtime support for the DistilBertForQuestionAnswering annotator
  • NEW: Configure ONNX Runtime options such as the GPU device ID, intra-op thread count, optimization level, and execution mode via Spark NLP configs
import sparknlp

# ONNX Runtime settings, passed to Spark as Spark NLP config keys (values are strings)
onnx_params = {
    "spark.jsl.settings.onnx.gpuDeviceId": "0",
    "spark.jsl.settings.onnx.intraOpNumThreads": "5",
    "spark.jsl.settings.onnx.optimizationLevel": "BASIC_OPT",
    "spark.jsl.settings.onnx.executionMode": "SEQUENTIAL"
}

# start Spark with Spark NLP and the ONNX settings above
spark = sparknlp.start(params=onnx_params)
  • Update the Whisper documentation with the minimum required Spark/PySpark version (3.4)
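The ONNX settings above are plain Spark configuration strings, so a typo in a value only surfaces once a model is loaded. As a minimal sketch, the hypothetical helper below (build_onnx_params is not part of Spark NLP) builds the same dictionary and rejects obvious mistakes before Spark starts. The accepted values for optimizationLevel and executionMode are an assumption taken from the OptLevel and ExecutionMode enums in ONNX Runtime's Java API; check the Spark NLP documentation for the authoritative list.

```python
# Hypothetical helper, not a Spark NLP API: builds the ONNX settings dict shown above.
# ASSUMPTION: allowed values mirror ONNX Runtime's Java enums
# (OrtSession.SessionOptions.OptLevel and ExecutionMode).
OPTIMIZATION_LEVELS = {"NO_OPT", "BASIC_OPT", "EXTENDED_OPT", "ALL_OPT"}
EXECUTION_MODES = {"SEQUENTIAL", "PARALLEL"}
PREFIX = "spark.jsl.settings.onnx."


def build_onnx_params(gpu_device_id: int = 0,
                      intra_op_num_threads: int = 5,
                      optimization_level: str = "BASIC_OPT",
                      execution_mode: str = "SEQUENTIAL") -> dict:
    """Return ONNX Runtime settings as Spark config key/value strings."""
    if gpu_device_id < 0:
        raise ValueError("gpu_device_id must be >= 0")
    if intra_op_num_threads < 1:
        raise ValueError("intra_op_num_threads must be >= 1")
    if optimization_level not in OPTIMIZATION_LEVELS:
        raise ValueError(f"unknown optimization level: {optimization_level}")
    if execution_mode not in EXECUTION_MODES:
        raise ValueError(f"unknown execution mode: {execution_mode}")
    # Spark configuration values are passed as strings
    return {
        PREFIX + "gpuDeviceId": str(gpu_device_id),
        PREFIX + "intraOpNumThreads": str(intra_op_num_threads),
        PREFIX + "optimizationLevel": optimization_level,
        PREFIX + "executionMode": execution_mode,
    }


if __name__ == "__main__":
    params = build_onnx_params(intra_op_num_threads=4)
    print(params)
    # With Spark NLP installed, these settings could then be used as:
    # import sparknlp
    # spark = sparknlp.start(params=params)
```

With its default arguments the helper returns exactly the onnx_params dictionary from the example above, so its result can be passed to sparknlp.start(params=...) unchanged.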

🐛 Bug Fixes

  • Fix the "module 'sparknlp.annotator' has no attribute 'Token2Chunk'" error in Python when using the Token2Chunk annotator inside a loaded PipelineModel

📓 New Notebooks

Each notebook below can be opened in Colab:

  • HuggingFace ONNX in Spark NLP BertForQuestionAnswering
  • HuggingFace ONNX in Spark NLP BertForSequenceClassification
  • HuggingFace ONNX in Spark NLP BertForTokenClassification
  • HuggingFace ONNX in Spark NLP DistilBertForQuestionAnswering
  • HuggingFace ONNX in Spark NLP DistilBertForSequenceClassification
  • HuggingFace ONNX in Spark NLP DistilBertForTokenClassification

📖 Documentation


❤️ Community support

  • Slack: For live discussion with the Spark NLP community and the team
  • GitHub: Bug reports, feature requests, and contributions
  • Discussions: Engage with other community members, share ideas, and show off how you use Spark NLP!
  • Medium: Spark NLP articles
  • JohnSnowLabs: Official Medium
  • YouTube: Spark NLP video tutorials

Installation

Python

#PyPI

pip install spark-nlp==5.1.3

Spark Packages

spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x (Scala 2.12):

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.3

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.3

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.1.3

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.1.3

Apple Silicon (M1 & M2)

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.1.3

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.1.3

AArch64

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.1.3

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.1.3

Maven

spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
    <version>5.1.3</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
    <version>5.1.3</version>
</dependency>

spark-nlp-silicon:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-silicon_2.12</artifactId>
    <version>5.1.3</version>
</dependency>

spark-nlp-aarch64:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-aarch64_2.12</artifactId>
    <version>5.1.3</version>
</dependency>

FAT JARs

What's Changed

Full Changelog: 5.1.2...5.1.3