Release Spark NLP 4.4.1: Patch release · JohnSnowLabs/spark-nlp

📢 Overview

Spark NLP 4.4.1 🚀 comes with a new DistilBertForZeroShotClassification annotator for Zero-Shot text classification (both multi-class and multi-label), a new threshold parameter in all XXXForSequenceClassification annotators to filter out classes based on their scores, and new notebooks to import models for Image Classification with Swin and ConvNext architectures.

We want to thank our community for their valuable feedback, feature requests, and contributions. Our Models Hub now contains over 17,000 free and truly open-source models & pipelines. 🎉

Spark NLP has a new home! https://sparknlp.org is where you can find all the documentation, models, and demos for Spark NLP. It aims to provide valuable resources to anyone interested in 100% open-source NLP solutions by using Spark NLP 🚀.

⭐ New Features & Enhancements

NEW: Introducing DistilBertForZeroShotClassification annotator for Zero-Shot Text Classification in Spark NLP 🚀. You can use the DistilBertForZeroShotClassification annotator for text classification with your labels! 💯

Zero-Shot Learning (ZSL): Traditionally, ZSL most often referred to a fairly specific type of task: learning a classifier on one set of labels and then evaluating it on a different set of labels that the classifier has never seen before. Recently, especially in NLP, the term has been used much more broadly to mean getting a model to do something it wasn't explicitly trained to do. A well-known example of this is in the GPT-2 paper, where the authors evaluate a language model on downstream tasks like machine translation without fine-tuning on these tasks directly.

Let's see how easy it is to just use any set of labels our trained model has never seen via the setCandidateLabels() param:

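from sparknlp.annotator import DistilBertForZeroShotClassification

# Load the default pretrained model and pass labels it was never trained on
zero_shot_classifier = DistilBertForZeroShotClassification \
    .pretrained() \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class") \
    .setCandidateLabels(["urgent", "mobile", "travel", "movie", "music", "sport", "weather", "technology"])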

For Zero-Shot Multi-class Text Classification:

+----------------------------------------------------------------------------------------------------------------+--------+
|result                                                                                                          |result  |
+----------------------------------------------------------------------------------------------------------------+--------+
|[I have a problem with my iPhone that needs to be resolved asap!!]                                              |[mobile]|
|[Last week I upgraded my iOS version and ever since then my phone has been overheating whenever I use your app.]|[mobile]|
|[I have a phone and I love it!]                                                                                 |[mobile]|
|[I want to visit Germany and I am planning to go there next year.]                                              |[travel]|
|[Let's watch some movies tonight! I am in the mood for a horror movie.]                                         |[movie] |
|[Have you watched the match yesterday? It was a great game!]                                                    |[sport] |
|[We need to hurry up and get to the airport. We are going to miss our flight!]                                  |[urgent]|
+----------------------------------------------------------------------------------------------------------------+--------+

For Zero-Shot Multi-label Text Classification:

+----------------------------------------------------------------------------------------------------------------+-----------------------------------+
|result                                                                                                          |result                             |
+----------------------------------------------------------------------------------------------------------------+-----------------------------------+
|[I have a problem with my iPhone that needs to be resolved asap!!]                                              |[urgent, mobile, movie, technology]|
|[Last week I upgraded my iOS version and ever since then my phone has been overheating whenever I use your app.]|[urgent, technology]               |
|[I have a phone and I love it!]                                                                                 |[mobile]                           |
|[I want to visit Germany and I am planning to go there next year.]                                              |[travel]                           |
|[Let's watch some movies tonight! I am in the mood for a horror movie.]                                         |[movie]                            |
|[Have you watched the match yesterday? It was a great game!]                                                    |[sport]                            |
|[We need to hurry up and get to the airport. We are going to miss our flight!]                                  |[urgent, travel]                   |
+----------------------------------------------------------------------------------------------------------------+-----------------------------------+
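The difference between the two result tables above can be sketched in plain Python. This is only an illustration of one common way to turn per-label scores into predictions, not Spark NLP's actual implementation: the function names, the logits, and the fixed 0.5 cutoff are all assumptions made up for this example. In multi-class mode the labels compete and exactly one wins; in multi-label mode each label is judged independently, so zero, one, or several can be returned.

```python
import math


def softmax(logits):
    """Normalize logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Map a single logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pick_multi_class(label_logits):
    """Labels compete with each other: return the single most probable one."""
    labels = list(label_logits)
    probs = softmax([label_logits[label] for label in labels])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return [labels[best]]


def pick_multi_label(label_logits):
    """Labels are scored independently: keep each one at or above 0.5."""
    return [label for label, logit in label_logits.items() if sigmoid(logit) >= 0.5]


# Hypothetical per-label logits for one sentence (invented for illustration).
logits = {"urgent": 1.2, "mobile": 2.5, "travel": -3.0, "movie": -2.0}

print(pick_multi_class(logits))  # ['mobile']
print(pick_multi_label(logits))  # ['urgent', 'mobile']
```

With these made-up logits, the multi-class path returns a single label while the multi-label path returns every label that clears the cutoff on its own, which mirrors the shape of the two tables.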

Added a threshold param to the AlbertForSequenceClassification, BertForSequenceClassification, BertForZeroShotClassification, DistilBertForSequenceClassification, CamemBertForSequenceClassification, DeBertaForSequenceClassification, LongformerForSequenceClassification, RoBertaForSequenceClassification, XlmRoBertaForSequenceClassification, and XlnetForSequenceClassification annotators
Added new notebooks to import models for the SwinForImageClassification and ConvNextForImageClassification annotators for Image Classification
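To make the effect of a score threshold concrete, here is a minimal plain-Python sketch. It does not call Spark NLP: the helper name `filter_by_threshold`, the scores, the 0.5 default, and the choice of an inclusive comparison (`>=`) are assumptions for illustration only, so check the annotator documentation for the exact behavior.

```python
def filter_by_threshold(scores, threshold=0.5):
    """Keep only the classes whose score is at least `threshold`."""
    return {label: score for label, score in scores.items() if score >= threshold}


# Hypothetical class scores for one document (invented for illustration).
scores = {"urgent": 0.91, "mobile": 0.62, "travel": 0.08}

print(filter_by_threshold(scores, threshold=0.5))  # {'urgent': 0.91, 'mobile': 0.62}
print(filter_by_threshold(scores, threshold=0.9))  # {'urgent': 0.91}
```

Raising the threshold trades recall for precision: fewer classes survive, but those that do carry higher scores.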

📓 New Notebooks

Notebooks	Colab
Zero-Shot Text Classification
ConvNextForImageClassification
SwinForImageClassification

You can visit Import Transformers in Spark NLP
You can visit Spark NLP Examples for 100+ examples

📖 Documentation

❤️ Community support

Slack For live discussion with the Spark NLP community and the team
GitHub Bug reports, feature requests, and contributions
Discussions Engage with other community members, share ideas, and show off how you use Spark NLP!
Medium Spark NLP articles
YouTube Spark NLP video tutorials

Installation

Python

#PyPI

pip install spark-nlp==4.4.1

Spark Packages

spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x (Scala 2.12):

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.1

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.1

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.4.1

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.4.1

Apple Silicon (M1 & M2)

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:4.4.1

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:4.4.1

AArch64

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.4.1

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.4.1

Maven

spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
    <version>4.4.1</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
    <version>4.4.1</version>
</dependency>

spark-nlp-silicon:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-silicon_2.12</artifactId>
    <version>4.4.1</version>
</dependency>

spark-nlp-aarch64:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-aarch64_2.12</artifactId>
    <version>4.4.1</version>
</dependency>

FAT JARs

CPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-4.4.1.jar
GPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-4.4.1.jar
M1 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-silicon-assembly-4.4.1.jar
AArch64 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-aarch64-assembly-4.4.1.jar

What's Changed

Full Changelog: 4.4.0...4.4.1
