Spark NLP 4.4.2: Patch release
Overview
Spark NLP 4.4.2 comes with a new RoBertaForZeroShotClassification annotator for Zero-Shot text classification (both multi-class and multi-label), full support for Apache Spark 3.4, faster and more memory-efficient BART models, a new cache feature for BartTransformer, new Databricks runtimes, and many more!
We want to thank our community for their valuable feedback, feature requests, and contributions. Our Models Hub now contains over 17,000 free and truly open-source models & pipelines.
Spark NLP has a new home! https://sparknlp.org is where you can find all the documentation, models, and demos for Spark NLP. It aims to provide valuable resources to anyone interested in 100% open-source NLP solutions by using Spark NLP.
New Features & Enhancements
- NEW: Introducing the **RoBertaForZeroShotClassification** annotator for Zero-Shot Text Classification in Spark NLP. You can use the RoBertaForZeroShotClassification annotator for text classification with your own labels!
Zero-Shot Learning (ZSL): Traditionally, ZSL most often referred to a fairly specific type of task: learning a classifier on one set of labels and then evaluating it on a different set of labels that the classifier has never seen before. Recently, especially in NLP, the term has been used much more broadly to mean getting a model to do something it wasn't explicitly trained to do. A well-known example is the GPT-2 paper, where the authors evaluate a language model on downstream tasks like machine translation without fine-tuning on these tasks directly.
Let's see how easy it is to use any set of labels our trained model has never seen via the setCandidateLabels() param:
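from sparknlp.annotator import RoBertaForZeroShotClassification

# Load the default pretrained model and supply labels it was never trained on
zero_shot_classifier = RoBertaForZeroShotClassification \
    .pretrained() \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class") \
    .setCandidateLabels(["urgent", "mobile", "travel", "movie", "music", "sport", "weather", "technology"])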
For Zero-Shot Multi-class Text Classification:
+----------------------------------------------------------------------------------------------------------------+--------+
|result |result |
+----------------------------------------------------------------------------------------------------------------+--------+
|[I have a problem with my iPhone that needs to be resolved asap!!] |[mobile]|
|[Last week I upgraded my iOS version and ever since then my phone has been overheating whenever I use your app.]|[mobile]|
|[I have a phone and I love it!] |[mobile]|
|[I want to visit Germany and I am planning to go there next year.] |[travel]|
|[Let's watch some movies tonight! I am in the mood for a horror movie.] |[movie] |
|[Have you watched the match yesterday? It was a great game!] |[sport] |
|[We need to hurry up and get to the airport. We are going to miss our flight!] |[urgent]|
+----------------------------------------------------------------------------------------------------------------+--------+
For Zero-Shot Multi-label Text Classification:
+----------------------------------------------------------------------------------------------------------------+-----------------------------------+
|result |result |
+----------------------------------------------------------------------------------------------------------------+-----------------------------------+
|[I have a problem with my iPhone that needs to be resolved asap!!] |[urgent, mobile, movie, technology]|
|[Last week I upgraded my iOS version and ever since then my phone has been overheating whenever I use your app.]|[urgent, technology] |
|[I have a phone and I love it!] |[mobile] |
|[I want to visit Germany and I am planning to go there next year.] |[travel] |
|[Let's watch some movies tonight! I am in the mood for a horror movie.] |[movie] |
|[Have you watched the match yesterday? It was a great game!] |[sport] |
|[We need to hurry up and get to the airport. We are going to miss our flight!] |[urgent, travel] |
+----------------------------------------------------------------------------------------------------------------+-----------------------------------+
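The difference between the two result tables above can be sketched in plain Python. This is only an illustration of the general idea, not Spark NLP's internal implementation: the per-label scores are made up, and the helper names (`pick_multi_class`, `pick_multi_label`) and the 0.5 threshold are hypothetical. In multi-class mode the labels compete with each other and only the top one is returned, while in multi-label mode each label is judged independently.

```python
import math


def softmax(scores):
    """Normalize scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(score):
    """Squash a single score into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-score))


def pick_multi_class(label_scores):
    """Labels compete through a softmax; return only the most likely one."""
    labels = list(label_scores)
    probs = softmax([label_scores[label] for label in labels])
    best = max(range(len(labels)), key=probs.__getitem__)
    return [labels[best]]


def pick_multi_label(label_scores, threshold=0.5):
    """Score each label on its own; keep every label that clears the threshold."""
    return [label for label, s in label_scores.items() if sigmoid(s) >= threshold]


# Hypothetical raw scores for:
# "We need to hurry up and get to the airport. We are going to miss our flight!"
scores = {"urgent": 2.1, "travel": 1.4, "movie": -3.0, "weather": -1.2}

print(pick_multi_class(scores))  # ['urgent']
print(pick_multi_label(scores))  # ['urgent', 'travel']
```

With these assumed scores, the multi-class result keeps only `urgent`, whereas the multi-label result keeps both `urgent` and `travel`, mirroring the last row of each table.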
- Offer full support for Apache Spark 3.4 #13773
- New BART models with memory efficiency and higher speed (it is not possible to use BART models in Colab) #13787
- Introducing the cache feature in BartTransformer #13787
- Welcoming 3 new Databricks runtimes to our Spark NLP family:
  - Databricks 13.0 LTS
  - Databricks 13.0 LTS ML
  - Databricks 13.0 LTS ML GPU
- Improve error handling for max sequence length for transformers in Python #13774
- Improve the MultiDateMatcher annotator to return multiple dates #13783
Bug Fixes
- Fix a bug in Tapas due to exceeding the maximum rank value #13772
- Fix loading Transformer models via loadSavedModel() method from DBFS on Databricks #13784
Documentation
- Import models from TF Hub & HuggingFace
- Spark NLP Notebooks
- Models Hub with new models
- Spark NLP Articles
- Spark NLP in Action
- Spark NLP Documentation
- Spark NLP Scala APIs
- Spark NLP Python APIs
Community support
- Slack: For live discussion with the Spark NLP community and the team
- GitHub: Bug reports, feature requests, and contributions
- Discussions: Engage with other community members, share ideas, and show off how you use Spark NLP!
- Medium: Spark NLP articles
- YouTube: Spark NLP video tutorials
Installation
Python
#PyPI
pip install spark-nlp==4.4.2
Spark Packages
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x (Scala 2.12):
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.2
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.4.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.4.2
Apple Silicon (M1 & M2)
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:4.4.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:4.4.2
AArch64
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.4.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.4.2
Maven
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.12</artifactId>
<version>4.4.2</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu_2.12</artifactId>
<version>4.4.2</version>
</dependency>
spark-nlp-silicon:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-silicon_2.12</artifactId>
<version>4.4.2</version>
</dependency>
spark-nlp-aarch64:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-aarch64_2.12</artifactId>
<version>4.4.2</version>
</dependency>
FAT JARs
- CPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-4.4.2.jar
- GPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-4.4.2.jar
- M1 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-silicon-assembly-4.4.2.jar
- AArch64 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-aarch64-assembly-4.4.2.jar
What's Changed
- Models hub by @maziyarpanahi in #13770
- BUGFIX NMH-171: Fix multiselect for existing opensource docs [skip-test] by @KshitizGIT in #13771
- Fix Tapas bug due to exceeding the maximum rank value by @vankov in #13772
- SPARKNLP-819 Adding changes to make spark-nlp 3.4.0 default version by @danilojsl in #13773
- SPARKNLP-828: Raise error when exceeding max input length by @DevinTDHa in #13774
- SPARKNLP-797: Introduce Protected Features by @DevinTDHa in #13777
- add nlu spells to modelhub cards by @ahmedlone127 in #13778
- Sparknlp 811 implement RobertaForZeroShotClassification annotator by @ahmedlone127 in #13782
- SPARKNLP-832-MultiDateMatcher-doesn-t-return-multiple-dates by @danilojsl in #13783
- Fix loadSavedModel for DBFS by @DevinTDHa in #13784
- Sparknlp 826 upload new optimized models for bart and Generate function by @prabod in #13787
- Update XXXForSequence with multilabel and activation function by @josejuanmartinez in #13779
- Updated Tensorflow model input and output signature to use ModelSignatureConstants by @prabod in #13790
- reverted changes by @ahmedlone127 in #13791
- Models hub by @maziyarpanahi in #13793
- Release/442 release candidate by @maziyarpanahi in #13789
Full Changelog: 4.4.1...4.4.2