Spark NLP 4.4.3: Patch release
Overview
Spark NLP 4.4.3 comes with a new param to switch from multi-class to multi-label in all of our classifiers including ZeroShot, extended support for downloading models directly with an S3 path in ResourceDownloader, bug fixes, and improvements!
We want to thank our community for their valuable feedback, feature requests, and contributions. Our Models Hub now contains over 18,000 free and truly open-source models & pipelines.
Spark NLP has a new home! https://sparknlp.org is where you can find all the documentation, models, and demos for Spark NLP. It aims to provide valuable resources to anyone interested in 100% open-source NLP solutions by using Spark NLP.
New Features & Enhancements
- New multilabel parameter to switch from multi-class to multi-label on all Classifiers in Spark NLP: AlbertForSequenceClassification, BertForSequenceClassification, DeBertaForSequenceClassification, DistilBertForSequenceClassification, LongformerForSequenceClassification, RoBertaForSequenceClassification, XlmRoBertaForSequenceClassification, XlnetForSequenceClassification, BertForZeroShotClassification, DistilBertForZeroShotClassification, and RobertaForZeroShotClassification
- Refactor protected Params and Features to avoid unwanted exceptions during runtime #13797
- Add proper documentation and instructions for ZeroShot classifiers: BertForZeroShotClassification, DistilBertForZeroShotClassification, and RobertaForZeroShotClassification #13798
- Extend support for downloading models/pipelines directly by given name or S3 path in ResourceDownloader #13796
from sparknlp.pretrained import ResourceDownloader
# partial S3 path
ResourceDownloader.downloadModelDirectly("public/models/albert_base_sequence_classifier_ag_news_en_3.4.0_3.0_1639648298937.zip", remote_loc = "public/models")
# full S3 path
ResourceDownloader.downloadModelDirectly("s3://auxdata.johnsnowlabs.com/public/models/albert_base_sequence_classifier_ag_news_en_3.4.0_3.0_1639648298937.zip", remote_loc = "public/models", unzip = False)
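To make the multi-class versus multi-label switch listed above more concrete, here is a minimal plain-Python sketch of the difference in how scores become labels. It is not Spark NLP's implementation: the `predict` function, its `threshold` argument, the labels, and the logits are all hypothetical, and the sketch assumes the common formulation in which multi-class uses a softmax over all labels and multi-label uses an independent sigmoid per label.

```python
import math


def softmax(logits):
    """Normalize logits into probabilities that sum to 1 (multi-class)."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Map one logit to an independent probability (multi-label)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # avoids overflow for large negative logits
    return e / (1.0 + e)


def predict(logits, labels, multilabel=False, threshold=0.5):
    """Return the predicted label(s) for one input.

    multilabel=False: softmax over all labels, keep the single best one.
    multilabel=True:  sigmoid per label, keep every label at or above threshold.
    """
    if multilabel:
        scores = [sigmoid(x) for x in logits]
        return [label for label, s in zip(labels, scores) if s >= threshold]
    scores = softmax(logits)
    best = max(range(len(labels)), key=lambda i: scores[i])
    return [labels[best]]


# Hypothetical labels and logits, for illustration only.
labels = ["sports", "politics", "technology"]
logits = [2.0, 1.5, -3.0]

print(predict(logits, labels))                   # exactly one label
print(predict(logits, labels, multilabel=True))  # zero or more labels
```

In multi-class mode the labels compete, so exactly one is returned. In multi-label mode each label is judged on its own, so an input can receive several labels or none.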
Bug Fixes
- Fix pretrained pipelines that stopped working since the 4.4.2 release on PySpark 3.0 and 3.1 versions (123 new pipelines were added) #13805
- Fix pretrained pipelines that stopped working since the 4.4.2 release on PySpark 3.4 versions (120 new pipelines were added) #13828
- Fix Java compatibility issue caused by SystemUtils dependency #13806
Known issue:
Current pre-trained pipelines don't work on PySpark 3.2 and 3.3. They will all be fixed in the next few days.
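If you want to guard a job against the known issue above, a small version check along these lines may help. This is only a sketch: `has_known_pipeline_issue` and `AFFECTED_PYSPARK` are hypothetical names, and the affected versions are taken solely from the note above (PySpark 3.2 and 3.3).

```python
# PySpark (major, minor) versions named in the known issue above.
AFFECTED_PYSPARK = {(3, 2), (3, 3)}


def has_known_pipeline_issue(pyspark_version):
    """Return True if this PySpark version is listed in the 4.4.3 known issue.

    Only the major and minor components are compared, so "3.3.2" and "3.3"
    are treated the same way.
    """
    major, minor = (int(part) for part in pyspark_version.split(".")[:2])
    return (major, minor) in AFFECTED_PYSPARK


for version in ("3.1.3", "3.2.4", "3.3.2", "3.4.0"):
    status = "affected" if has_known_pipeline_issue(version) else "not listed"
    print(f"PySpark {version}: {status}")
```

In a real environment the version string could come from `pyspark.__version__`.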
Documentation
- Import models from TF Hub & HuggingFace
- Spark NLP Notebooks
- Models Hub with new models
- Spark NLP Articles
- Spark NLP in Action
- Spark NLP Documentation
- Spark NLP Scala APIs
- Spark NLP Python APIs
Community support
- Slack: live discussion with the Spark NLP community and the team
- GitHub: bug reports, feature requests, and contributions
- Discussions: engage with other community members, share ideas, and show off how you use Spark NLP!
- Medium: Spark NLP articles
- YouTube: Spark NLP video tutorials
Installation
Python
#PyPI
pip install spark-nlp==4.4.3
Spark Packages
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x (Scala 2.12):
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.3
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.4.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.4.3
Apple Silicon (M1 & M2)
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:4.4.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:4.4.3
AArch64
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.4.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.4.3
Maven
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.12</artifactId>
<version>4.4.3</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu_2.12</artifactId>
<version>4.4.3</version>
</dependency>
spark-nlp-silicon:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-silicon_2.12</artifactId>
<version>4.4.3</version>
</dependency>
spark-nlp-aarch64:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-aarch64_2.12</artifactId>
<version>4.4.3</version>
</dependency>
FAT JARs
- CPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-4.4.3.jar
- GPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-4.4.3.jar
- M1 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-silicon-assembly-4.4.3.jar
- AArch64 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-aarch64-assembly-4.4.3.jar
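The package coordinates and FAT JAR URLs above follow one naming pattern per build variant. The sketch below derives both from a variant name, which may be handy in provisioning scripts. The helper names (`artifact_name`, `package_coordinate`, `fat_jar_url`) and the `"cpu"` variant label are hypothetical; the strings they produce are the ones listed in this section.

```python
VERSION = "4.4.3"
GROUP_ID = "com.johnsnowlabs.nlp"
SCALA_VERSION = "2.12"
JAR_BASE_URL = "https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars"
VARIANTS = ("cpu", "gpu", "silicon", "aarch64")


def artifact_name(variant="cpu"):
    """Return the base artifact name; the CPU build carries no suffix."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return "spark-nlp" if variant == "cpu" else f"spark-nlp-{variant}"


def package_coordinate(variant="cpu"):
    """Build the value passed to spark-shell/pyspark via --packages."""
    return f"{GROUP_ID}:{artifact_name(variant)}_{SCALA_VERSION}:{VERSION}"


def fat_jar_url(variant="cpu"):
    """Build the download URL of the FAT JAR for the given variant."""
    return f"{JAR_BASE_URL}/{artifact_name(variant)}-assembly-{VERSION}.jar"


for variant in VARIANTS:
    print(variant, package_coordinate(variant), fat_jar_url(variant))
```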
What's Changed
- Models hub by @maziyarpanahi in #13807
- Update 2022-07-11-pipeline_md_ca_3_0.md by @maziyarpanahi in #13808
- SPARKNLP-825 Adding multilabel param by @danilojsl in #13792
- SPARKNLP-835: ProtectedParam and ProtectedFeature by @DevinTDHa in #13797
- SPARKNLP-809: Add warning to ForZeroShot annotators by @DevinTDHa in #13798
- SPARKNLP-839 Fix Java Compatibility Issue by @danilojsl in #13806
- Models hub by @maziyarpanahi in #13823
- Add unzip param to downloadModelDirectly in ResourceDownloader by @mehmetbutgul in #13796
- release/443-release-candidate by @maziyarpanahi in #13822
- Models hub by @maziyarpanahi in #13830
- Models hub by @maziyarpanahi in #13832
New Contributors
- @mehmetbutgul made their first contribution in #13796
Full Changelog: 4.4.2...4.4.3