Spark NLP 4.4.1: Patch release
Overview
Spark NLP 4.4.1 comes with a new DistilBertForZeroShotClassification annotator for Zero-Shot text classification (both multi-class and multi-label), a new threshold parameter in all XXXForSequenceClassification annotators to filter out classes based on their scores, and new notebooks to import models for Image Classification with Swin and ConvNext architectures.
We want to thank our community for their valuable feedback, feature requests, and contributions. Our Models Hub now contains over 17,000 free and truly open-source models & pipelines.
Spark NLP has a new home! https://sparknlp.org is where you can find all the documentation, models, and demos for Spark NLP. It aims to provide valuable resources to anyone interested in 100% open-source NLP solutions by using Spark NLP.
New Features & Enhancements
- NEW: Introducing DistilBertForZeroShotClassification annotator for Zero-Shot Text Classification in Spark NLP. You can use the DistilBertForZeroShotClassification annotator for text classification with your own labels!
Zero-Shot Learning (ZSL): Traditionally, ZSL most often referred to a fairly specific type of task: learning a classifier on one set of labels and then evaluating on a different set of labels that the classifier has never seen before. Recently, especially in NLP, it's been used much more broadly to get a model to do something it wasn't explicitly trained to do. A well-known example of this is in the GPT-2 paper where the authors evaluate a language model on downstream tasks like machine translation without fine-tuning on these tasks directly.
Let's see how easy it is to use any set of labels our trained model has never seen via the setCandidateLabels() param:
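from sparknlp.annotator import DistilBertForZeroShotClassification

# Load the default pretrained model and supply labels it was never trained on
zero_shot_classifier = DistilBertForZeroShotClassification \
    .pretrained() \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class") \
    .setCandidateLabels(["urgent", "mobile", "travel", "movie", "music", "sport", "weather", "technology"])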
For Zero-Shot Multi-class Text Classification:
+----------------------------------------------------------------------------------------------------------------+--------+
|result |result |
+----------------------------------------------------------------------------------------------------------------+--------+
|[I have a problem with my iPhone that needs to be resolved asap!!] |[mobile]|
|[Last week I upgraded my iOS version and ever since then my phone has been overheating whenever I use your app.]|[mobile]|
|[I have a phone and I love it!] |[mobile]|
|[I want to visit Germany and I am planning to go there next year.] |[travel]|
|[Let's watch some movies tonight! I am in the mood for a horror movie.] |[movie] |
|[Have you watched the match yesterday? It was a great game!] |[sport] |
|[We need to hurry up and get to the airport. We are going to miss our flight!] |[urgent]|
+----------------------------------------------------------------------------------------------------------------+--------+
For Zero-Shot Multi-label Text Classification:
+----------------------------------------------------------------------------------------------------------------+-----------------------------------+
|result |result |
+----------------------------------------------------------------------------------------------------------------+-----------------------------------+
|[I have a problem with my iPhone that needs to be resolved asap!!] |[urgent, mobile, movie, technology]|
|[Last week I upgraded my iOS version and ever since then my phone has been overheating whenever I use your app.]|[urgent, technology] |
|[I have a phone and I love it!] |[mobile] |
|[I want to visit Germany and I am planning to go there next year.] |[travel] |
|[Let's watch some movies tonight! I am in the mood for a horror movie.] |[movie] |
|[Have you watched the match yesterday? It was a great game!] |[sport] |
|[We need to hurry up and get to the airport. We are going to miss our flight!] |[urgent, travel] |
+----------------------------------------------------------------------------------------------------------------+-----------------------------------+
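The difference between the two result tables above can be illustrated with a small, library-free sketch. It assumes an NLI-style scorer that produces an entailment and a contradiction logit for each candidate label, which is a common way to do zero-shot classification; it is not Spark NLP's internal code. The logits, the function names, and the 0.5 cutoff are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Tuple


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def multi_class(logits: Dict[str, Tuple[float, float]]) -> List[str]:
    """Multi-class: labels compete with each other, so exactly one label wins.

    Softmax is taken over the entailment logits of all candidate labels.
    """
    labels = list(logits)
    probs = softmax([logits[label][0] for label in labels])
    best = max(range(len(labels)), key=probs.__getitem__)
    return [labels[best]]


def multi_label(logits: Dict[str, Tuple[float, float]], cutoff: float = 0.5) -> List[str]:
    """Multi-label: each label is judged on its own, so several can be kept.

    Softmax is taken over (entailment, contradiction) for one label at a time.
    """
    kept = []
    for label, (entailment, contradiction) in logits.items():
        p_entailment = softmax([entailment, contradiction])[0]
        if p_entailment >= cutoff:
            kept.append(label)
    return kept


# Hypothetical (entailment, contradiction) logits for one sentence.
example_logits = {
    "urgent": (3.0, -1.0),
    "travel": (2.0, -0.5),
    "movie": (-2.0, 2.5),
    "sport": (-1.5, 2.0),
}

print(multi_class(example_logits))  # ['urgent']
print(multi_label(example_logits))  # ['urgent', 'travel']
```

With these made-up logits the multi-class path returns a single label, while the multi-label path keeps every label that clears the cutoff, mirroring the shape of the two tables.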
- Adding threshold param to AlbertForSequenceClassification, BertForSequenceClassification, BertForZeroShotClassification, DistilBertForSequenceClassification, CamemBertForSequenceClassification, DeBertaForSequenceClassification, LongformerForSequenceClassification, RoBertaForQuestionAnswering, XlmRoBertaForSequenceClassification, and XlnetForSequenceClassification annotators
- Add new notebooks to import models for SwinForImageClassification and ConvNextForImageClassification annotators for Image Classification
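To make the effect of a score threshold concrete, here is a minimal plain-Python sketch of filtering classes by score. It only illustrates the idea behind the threshold param; the helper name, the scores, and the choice of an inclusive comparison are assumptions, not the annotators' actual implementation.

```python
from typing import Dict, List


def filter_by_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep classes scoring at or above `threshold`, highest score first."""
    kept = [(label, score) for label, score in scores.items() if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in kept]


# Hypothetical per-class scores for a single document.
class_scores = {"technology": 0.62, "mobile": 0.88, "urgent": 0.71, "movie": 0.12}

print(filter_by_threshold(class_scores, 0.5))  # ['mobile', 'urgent', 'technology']
print(filter_by_threshold(class_scores, 0.8))  # ['mobile']
```

Raising the threshold trades recall for precision: fewer classes survive, but the ones that do have higher scores.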
New Notebooks
Notebooks | Colab |
---|---|
Zero-Shot Text Classification | |
ConvNextForImageClassification | |
SwinForImageClassification | |
- You can visit Import Transformers in Spark NLP
- You can visit Spark NLP Examples for 100+ examples
Documentation
- Import models from TF Hub & HuggingFace
- Spark NLP Notebooks
- Models Hub with new models
- Spark NLP Articles
- Spark NLP in Action
- Spark NLP Documentation
- Spark NLP Scala APIs
- Spark NLP Python APIs
Community support
- Slack For live discussion with the Spark NLP community and the team
- GitHub Bug reports, feature requests, and contributions
- Discussions Engage with other community members, share ideas, and show off how you use Spark NLP!
- Medium Spark NLP articles
- YouTube Spark NLP video tutorials
Installation
Python
#PyPI
pip install spark-nlp==4.4.1
Spark Packages
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x (Scala 2.12):
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.1
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.4.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.4.1
Apple Silicon (M1 & M2)
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:4.4.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:4.4.1
AArch64
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.4.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.4.1
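The package coordinates above all follow the same pattern: group ID, artifact name with a Scala suffix, then version. As a convenience, the sketch below rebuilds them from a variant name. The helper names and the variant keys ("cpu", "gpu", "silicon", "aarch64") are hypothetical and not part of Spark NLP; only the resulting strings come from the commands listed above.

```python
GROUP_ID = "com.johnsnowlabs.nlp"
SCALA_VERSION = "2.12"

# Artifact names as they appear in the commands above.
ARTIFACTS = {
    "cpu": "spark-nlp",
    "gpu": "spark-nlp-gpu",
    "silicon": "spark-nlp-silicon",
    "aarch64": "spark-nlp-aarch64",
}


def maven_coordinate(variant: str = "cpu", version: str = "4.4.1") -> str:
    """Build the group:artifact_scala:version coordinate for a variant."""
    if variant not in ARTIFACTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return f"{GROUP_ID}:{ARTIFACTS[variant]}_{SCALA_VERSION}:{version}"


def packages_command(tool: str, variant: str = "cpu", version: str = "4.4.1") -> str:
    """Build the full spark-shell or pyspark command line for a variant."""
    if tool not in ("spark-shell", "pyspark"):
        raise ValueError(f"unsupported tool: {tool!r}")
    return f"{tool} --packages {maven_coordinate(variant, version)}"


print(packages_command("pyspark", "gpu"))
# pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.4.1
```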
Maven
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.12</artifactId>
<version>4.4.1</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu_2.12</artifactId>
<version>4.4.1</version>
</dependency>
spark-nlp-silicon:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-silicon_2.12</artifactId>
<version>4.4.1</version>
</dependency>
spark-nlp-aarch64:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-aarch64_2.12</artifactId>
<version>4.4.1</version>
</dependency>
FAT JARs
- CPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-4.4.1.jar
- GPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-4.4.1.jar
- M1 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-silicon-assembly-4.4.1.jar
- AArch64 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-aarch64-assembly-4.4.1.jar
What's Changed
Full Changelog: 4.4.0...4.4.1