diff --git a/CHANGELOG b/CHANGELOG
index 39e902cfcf7b85..6f4a245d27b415 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,3 +1,20 @@
+========
+5.0.2
+========
+----------------
+New Features & Enhancements
+----------------
+* **NEW:** Introducing support for ONNX Runtime in ALBERT, CamemBERT, and XLM-RoBERTa annotators
+* **NEW:** Implement ZeroShotNerModel annotator for zero-shot NER based on the XLM-RoBERTa architecture
+
+----------------
+Bug Fixes
+----------------
+* Fix MarianTransformer annotator breaking with `java.lang.ClassCastException` in Python
+* Fix accuracy values reported outside the 0.0–1.0 range in SentenceDetectorDL and MultiClassifierDL annotators
+* Fix BART failure with low temperature values that occurred only when no non-infinite logits satisfied the given temperature and top_k values
+* Add missing E5Embeddings and InstructorEmbeddings annotators to `annotators` in Scala for easy all-in-one import
+
 ========
 5.0.1
 ========
@@ -39,7 +56,7 @@ New Features & Enhancements
 ----------------
 Bug Fixes
 ----------------
-* Fix not being able to save models from XXXForSequenceClassitication and XXXForZeroShotClassification annotoators https://github.com/JohnSnowLabs/spark-nlp/pull/13842
+* Fix not being able to save models from XXXForSequenceClassification and XXXForZeroShotClassification annotators https://github.com/JohnSnowLabs/spark-nlp/pull/13842
 ========
@@ -48,7 +65,7 @@ Bug Fixes
 ----------------
 New Features & Enhancements
 ----------------
-* New `multilabel` parameter to swtich from multi-class to multi-label on all Classifiers in Spark NLP: AlbertForSequenceClassification, BertForSequenceClassification, DeBertaForSequenceClassification, DistilBertForSequenceClassification, LongformerForSequenceClassification, RoBertaForSequenceClassification, XlmRoBertaForSequenceClassification, XlnetForSequenceClassification, BertForZeroShotClassification, DistilBertForZeroShotClassification, and RobertaForZeroShotClassification
+* New `multilabel` parameter to switch from multi-class to multi-label on all Classifiers in Spark NLP: AlbertForSequenceClassification, BertForSequenceClassification, DeBertaForSequenceClassification, DistilBertForSequenceClassification, LongformerForSequenceClassification, RoBertaForSequenceClassification, XlmRoBertaForSequenceClassification, XlnetForSequenceClassification, BertForZeroShotClassification, DistilBertForZeroShotClassification, and RobertaForZeroShotClassification
 * Refactor protected Params and Features to avoid unwanted exceptions during runtime https://github.com/JohnSnowLabs/spark-nlp/pull/13797
 * Add proper documentation and instructions for ZeroShot classifiers: BertForZeroShotClassification, DistilBertForZeroShotClassification, and RobertaForZeroShotClassification https://github.com/JohnSnowLabs/spark-nlp/pull/13798
 * Extend support for downloading models/pipelines directly by given name or S3 path in ResourceDownloader https://github.com/JohnSnowLabs/spark-nlp/pull/13796
@@ -58,7 +75,7 @@ Bug Fixes
 ----------------
 * Fix pretrained pipelines that stopped working since 4.4.2 release on PySpark 3.0 and 3.1 versions (adding 123 new pipelines were added) https://github.com/JohnSnowLabs/spark-nlp/pull/13805
 * Fix pretrained pipelines that stopped working since 4.4.2 release on PySpark 3.2 and 3.3 versions (adding 120 new pipelines) https://github.com/JohnSnowLabs/spark-nlp/pull/13811
-* Fix Java compatibility issue caused by SystemUtils dependecy https://github.com/JohnSnowLabs/spark-nlp/pull/13806
+* Fix
Java compatibility issue caused by SystemUtils dependency https://github.com/JohnSnowLabs/spark-nlp/pull/13806 ======== @@ -157,7 +174,7 @@ New Features * Implement HubertForCTC annotator for automatic speech recognition * Implement SwinForImageClassification annotator for Image Classification * Introducing CamemBERT for Question Answering annotator -* Implement ZeroShotNerModel annotator for zero-shot NER baed on RoBERTa architecture +* Implement ZeroShotNerModel annotator for zero-shot NER based on RoBERTa architecture * Implement Date2Chunk annotator * Enable params argument in spark_nlp start() function * Allow doc_id reading CoNLL file datasets @@ -198,7 +215,7 @@ Bug Fixes & Enhancements * Fix missing to output embeddings in `.fullAnnotate()` method when `parseEmbeddings` param was set to `True/true` * Fix broken links to the Python API pages, as the generation of the PyDocs was slightly changed in a previous release. This makes the Python APIs accessible from the Annotators and Transformers pages like before * Change default values of `explodeEntities` and `mergeEntities` parameters to `true` -* Better error handling when there are empty paths/relations in `GraphExctraction`annotator. New message will better guide the user on how to configure `GraphExtraction` to output meaningful relationships +* Better error handling when there are empty paths/relations in `GraphExtraction`annotator. New message will better guide the user on how to configure `GraphExtraction` to output meaningful relationships * Removed the duplicated definition of method `setWeightedDistPath` from `ContextSpellCheckerApproach` @@ -367,7 +384,7 @@ Bug Fixes ---------------- * Fix a bug in generating the NerDL graph by using TF v2. The previous graph generated by the `TFGraphBuilder` annotator resulted in an exception when the length of the sequence was 1. This issue has been resolved and the new graphs created by `TFGraphBuilder` won't have this issue anymore (https://github.com/JohnSnowLabs/spark-nlp/pull/12636) * Fix a bug introduced in the 4.0.0 release between Transformer-based Word Embeddings annotators. In the 4.0.0 release, the following annotators were migrated to BatchAnnotate to improve their performance, especially on GPU. However, a bug was introduced in sentence indices which when it is combined with SentenceEmbeddings for Text Classifications tasks (ClassifierDLApproach, SentimentDLApproach, and ClassifierDLApproach) resulted in low accuracy: AlbertEmbeddings, CamemBertEmbeddings, DeBertaEmbeddings, DistilBertEmbeddings, LongformerEmbeddings, RoBertaEmbeddings, XlmRoBertaEmbeddings, and XlnetEmbeddings (https://github.com/JohnSnowLabs/spark-nlp/pull/12641) -* Add support for a list of questions and context in LightPipline. Previously, only one context and question at a time were supported in LightPipeline for Question Answering annotators. We have added support to `fullAnnotate` and `annotate` to receive two lists of questions and contexts (https://github.com/JohnSnowLabs/spark-nlp/pull/12653) +* Add support for a list of questions and context in LightPipeline. Previously, only one context and question at a time were supported in LightPipeline for Question Answering annotators. 
We have added support to `fullAnnotate` and `annotate` to receive two lists of questions and contexts (https://github.com/JohnSnowLabs/spark-nlp/pull/12653) * Fix division by zero exception in the `GPT2Transformer` annotator when the `setDoSample` param was set to true (https://github.com/JohnSnowLabs/spark-nlp/pull/12661) ======== @@ -437,7 +454,7 @@ New Features & Enhancements * Migrate T5Transformer to TensorFlow v2 architecture with re-uploading all the existing models * Official support for Apple silicon M1 on macOS devices. From Spark NLP 4.0.0 you can use `spark-nlp-m1` package that supports Apple silicon M1 on your macOS machine * Official support for Apache Spark and PySpark 3.2.x on Scala 2.12. Spark NLP by default is shipped for Spark 3.2.x and supports Spark/PySpark 3.0.x and 3.1.x in additions -* Unifying all supported Apache Spark pacakges on Maven into `spark-nlp` for CPU, `spark-nlp-gpu` for GPU, and `spark-nlp-m1` for new Apple silicon M1 on macOS. The need for Apache Spark specific package like `spark-nlp-spark32` has been removed. +* Unifying all supported Apache Spark packages on Maven into `spark-nlp` for CPU, `spark-nlp-gpu` for GPU, and `spark-nlp-m1` for new Apple silicon M1 on macOS. The need for Apache Spark specific package like `spark-nlp-spark32` has been removed. * Adding a new param to sparknlp.start() function in Python and Scala for Apple silicon M1 on macOS (`m1=True`) * Update Colab, Kaggle, and SageMaker scripts * Add new default NerDL graph for xsmall DeBERTa embeddings model (384 dimensions) @@ -467,7 +484,7 @@ Bug Fixes ---------------- * Fix the default pre-trained model for DeBertaForTokenClassification in Scala and Python * Remove a requirement in DocumentNormalizer that consecutive stage processing can produce empty text annotations without breaking the pipeline -* Fix WordSegmenterModel outputing wrong order of tokens. The regex that groups the tagging format was refactored to preserve the order of segmented outputs (tokens) +* Fix WordSegmenterModel outputting wrong order of tokens. The regex that groups the tagging format was refactored to preserve the order of segmented outputs (tokens) * Fix encoding sentences not respecting the max sequence length given by a user in XlmRobertaSentenceEmbeddings * Fix encoding sentences by using SentencePiece to calculate the correct tokens indexing * Fix SentencePiece serialization issue when XlmRoBertaEmbeddings and XlmRoBertaSentenceEmbeddings annotators are used from a Fat JAR on GPU diff --git a/README.md b/README.md index 57c4ce0793a35a..559c7344328303 100644 --- a/README.md +++ b/README.md @@ -167,7 +167,7 @@ To use Spark NLP you need the following requirements: **GPU (optional):** -Spark NLP 5.0.1 is built with ONNX 1.15.1 and TensorFlow 2.7.1 deep learning engines. The minimum following NVIDIA® software are only required for GPU support: +Spark NLP 5.0.2 is built with ONNX 1.15.1 and TensorFlow 2.7.1 deep learning engines. 
The minimum following NVIDIA® software are only required for GPU support: - NVIDIA® GPU drivers version 450.80.02 or higher - CUDA® Toolkit 11.2 @@ -183,7 +183,7 @@ $ java -version $ conda create -n sparknlp python=3.7 -y $ conda activate sparknlp # spark-nlp by default is based on pyspark 3.x -$ pip install spark-nlp==5.0.1 pyspark==3.3.1 +$ pip install spark-nlp==5.0.2 pyspark==3.3.1 ``` In Python console or Jupyter `Python3` kernel: @@ -228,7 +228,7 @@ For more examples, you can visit our dedicated [examples](https://github.com/Joh ## Apache Spark Support -Spark NLP *5.0.1* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x +Spark NLP *5.0.2* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x | Spark NLP | Apache Spark 2.3.x | Apache Spark 2.4.x | Apache Spark 3.0.x | Apache Spark 3.1.x | Apache Spark 3.2.x | Apache Spark 3.3.x | Apache Spark 3.4.x | |-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------| @@ -267,7 +267,7 @@ Find out more about `Spark NLP` versions from our [release notes](https://github ## Databricks Support -Spark NLP 5.0.1 has been tested and is compatible with the following runtimes: +Spark NLP 5.0.2 has been tested and is compatible with the following runtimes: **CPU:** @@ -325,7 +325,7 @@ Spark NLP 5.0.1 has been tested and is compatible with the following runtimes: ## EMR Support -Spark NLP 5.0.1 has been tested and is compatible with the following EMR releases: +Spark NLP 5.0.2 has been tested and is compatible with the following EMR releases: - emr-6.2.0 - emr-6.3.0 @@ -369,11 +369,11 @@ Spark NLP supports all major releases of Apache Spark 3.0.x, Apache Spark 3.1.x, ```sh # CPU -spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.1 +spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.2 -pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.1 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.2 -spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.1 +spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.2 ``` The `spark-nlp` has been published to @@ -382,11 +382,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s ```sh # GPU -spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.0.1 +spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.0.2 -pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.0.1 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.0.2 -spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.0.1 +spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.0.2 ``` @@ -396,11 +396,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s ```sh # AArch64 -spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.0.1 +spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.0.2 -pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.0.1 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.0.2 -spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.0.1 +spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.0.2 ``` @@ -410,11 +410,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s ```sh # M1/M2 (Apple Silicon) -spark-shell 
--packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.0.1 +spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.0.2 -pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.0.1 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.0.2 -spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.0.1 +spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.0.2 ``` @@ -428,7 +428,7 @@ set in your SparkSession: spark-shell \ --driver-memory 16g \ --conf spark.kryoserializer.buffer.max=2000M \ - --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.1 + --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.2 ``` ## Scala @@ -446,7 +446,7 @@ coordinates: com.johnsnowlabs.nlp spark-nlp_2.12 - 5.0.1 + 5.0.2 ``` @@ -457,7 +457,7 @@ coordinates: com.johnsnowlabs.nlp spark-nlp-gpu_2.12 - 5.0.1 + 5.0.2 ``` @@ -468,7 +468,7 @@ coordinates: com.johnsnowlabs.nlp spark-nlp-aarch64_2.12 - 5.0.1 + 5.0.2 ``` @@ -479,7 +479,7 @@ coordinates: com.johnsnowlabs.nlp spark-nlp-silicon_2.12 - 5.0.1 + 5.0.2 ``` @@ -489,28 +489,28 @@ coordinates: ```sbtshell // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.0.1" +libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.0.2" ``` **spark-nlp-gpu:** ```sbtshell // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.0.1" +libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.0.2" ``` **spark-nlp-aarch64:** ```sbtshell // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-aarch64 -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.0.1" +libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.0.2" ``` **spark-nlp-silicon:** ```sbtshell // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.0.1" +libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.0.2" ``` Maven @@ -532,7 +532,7 @@ If you installed pyspark through pip/conda, you can install `spark-nlp` through Pip: ```bash -pip install spark-nlp==5.0.1 +pip install spark-nlp==5.0.2 ``` Conda: @@ -561,7 +561,7 @@ spark = SparkSession.builder .config("spark.driver.memory", "16G") .config("spark.driver.maxResultSize", "0") .config("spark.kryoserializer.buffer.max", "2000M") - .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.1") + .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.2") .getOrCreate() ``` @@ -632,7 +632,7 @@ Use either one of the following options - Add the following Maven Coordinates to the interpreter's library list ```bash -com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.1 +com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.2 ``` - Add a path to pre-built jar from [here](#compiled-jars) in the interpreter's library list making sure the jar is @@ -643,7 +643,7 @@ com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.1 Apart from the previous step, install the python module through pip ```bash -pip install spark-nlp==5.0.1 +pip install spark-nlp==5.0.2 ``` Or you can install `spark-nlp` from inside Zeppelin by using Conda: @@ -671,7 +671,7 @@ launch the Jupyter from the same Python environment: $ conda create -n sparknlp python=3.8 -y $ conda activate sparknlp # spark-nlp by default is based on pyspark 3.x -$ pip install spark-nlp==5.0.1 pyspark==3.3.1 jupyter +$ pip install 
spark-nlp==5.0.2 pyspark==3.3.1 jupyter $ jupyter notebook ``` @@ -688,7 +688,7 @@ export PYSPARK_PYTHON=python3 export PYSPARK_DRIVER_PYTHON=jupyter export PYSPARK_DRIVER_PYTHON_OPTS=notebook -pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.1 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.2 ``` Alternatively, you can mix in using `--jars` option for pyspark + `pip install spark-nlp` @@ -715,7 +715,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi # -s is for spark-nlp # -g will enable upgrading libcudnn8 to 8.1.0 on Google Colab for GPU usage # by default they are set to the latest -!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.0.1 +!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.0.2 ``` [Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/quick_start_google_colab.ipynb) @@ -738,7 +738,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi # -s is for spark-nlp # -g will enable upgrading libcudnn8 to 8.1.0 on Kaggle for GPU usage # by default they are set to the latest -!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.0.1 +!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.0.2 ``` [Spark NLP quick start on Kaggle Kernel](https://www.kaggle.com/mozzie/spark-nlp-named-entity-recognition) is a live @@ -757,9 +757,9 @@ demo on Kaggle Kernel that performs named entity recognitions by using Spark NLP 3. In `Libraries` tab inside your cluster you need to follow these steps: - 3.1. Install New -> PyPI -> `spark-nlp==5.0.1` -> Install + 3.1. Install New -> PyPI -> `spark-nlp==5.0.2` -> Install - 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.1` -> Install + 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.2` -> Install 4. Now you can attach your notebook to the cluster and use Spark NLP! @@ -810,7 +810,7 @@ A sample of your software configuration in JSON on S3 (must be public access): "spark.kryoserializer.buffer.max": "2000M", "spark.serializer": "org.apache.spark.serializer.KryoSerializer", "spark.driver.maxResultSize": "0", - "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.1" + "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.2" } }] ``` @@ -819,7 +819,7 @@ A sample of AWS CLI to launch EMR cluster: ```.sh aws emr create-cluster \ ---name "Spark NLP 5.0.1" \ +--name "Spark NLP 5.0.2" \ --release-label emr-6.2.0 \ --applications Name=Hadoop Name=Spark Name=Hive \ --instance-type m4.4xlarge \ @@ -883,7 +883,7 @@ gcloud dataproc clusters create ${CLUSTER_NAME} \ --enable-component-gateway \ --metadata 'PIP_PACKAGES=spark-nlp spark-nlp-display google-cloud-bigquery google-cloud-storage' \ --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh \ - --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.1 + --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.2 ``` 2. 
On an existing one, you need to install spark-nlp and spark-nlp-display packages from PyPI. @@ -922,7 +922,7 @@ spark = SparkSession.builder .config("spark.kryoserializer.buffer.max", "2000m") .config("spark.jsl.settings.pretrained.cache_folder", "sample_data/pretrained") .config("spark.jsl.settings.storage.cluster_tmp_dir", "sample_data/storage") - .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.1") + .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.2") .getOrCreate() ``` @@ -936,7 +936,7 @@ spark-shell \ --conf spark.kryoserializer.buffer.max=2000M \ --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \ --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \ - --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.1 + --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.2 ``` **pyspark:** @@ -949,7 +949,7 @@ pyspark \ --conf spark.kryoserializer.buffer.max=2000M \ --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \ --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \ - --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.1 + --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.2 ``` **Databricks:** @@ -1221,7 +1221,7 @@ spark = SparkSession.builder .config("spark.driver.memory", "16G") .config("spark.driver.maxResultSize", "0") .config("spark.kryoserializer.buffer.max", "2000M") - .config("spark.jars", "/tmp/spark-nlp-assembly-5.0.1.jar") + .config("spark.jars", "/tmp/spark-nlp-assembly-5.0.2.jar") .getOrCreate() ``` @@ -1230,7 +1230,7 @@ spark = SparkSession.builder version (3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x) - If you are local, you can load the Fat JAR from your local FileSystem, however, if you are in a cluster setup you need to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. ( - i.e., `hdfs:///tmp/spark-nlp-assembly-5.0.1.jar`) + i.e., `hdfs:///tmp/spark-nlp-assembly-5.0.2.jar`) Example of using pretrained Models and Pipelines in offline: diff --git a/build.sbt b/build.sbt index 2fdac1c421cc55..f6a58d47bb443e 100644 --- a/build.sbt +++ b/build.sbt @@ -6,7 +6,7 @@ name := getPackageName(is_silicon, is_gpu, is_aarch64) organization := "com.johnsnowlabs.nlp" -version := "5.0.1" +version := "5.0.2" (ThisBuild / scalaVersion) := scalaVer diff --git a/conda/meta.yaml b/conda/meta.yaml index df735000acb643..88d4fe116952c0 100644 --- a/conda/meta.yaml +++ b/conda/meta.yaml @@ -1,5 +1,5 @@ {% set name = "spark-nlp" %} -{% set version = "5.0.1" %} +{% set version = "5.0.2" %} package: name: {{ name|lower }} @@ -7,7 +7,7 @@ package: source: url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/spark-nlp-{{ version }}.tar.gz - sha256: c46251c9a0ee674ae2a56249eebbc33bf9e6c27e3cf15bf1a7249da425a94ca9 + sha256: 690a9509bea5adddb55557539ca8fc1a8b949e73fb69499007829ae857284050 build: noarch: python diff --git a/docs/_layouts/landing.html b/docs/_layouts/landing.html index 654d6642bf3c8d..4df163fe53e938 100755 --- a/docs/_layouts/landing.html +++ b/docs/_layouts/landing.html @@ -201,7 +201,7 @@

{{ _section.title }}

 {% highlight bash %}
 # Using PyPI
-$ pip install spark-nlp==5.0.1
+$ pip install spark-nlp==5.0.2
 # Using Anaconda/Conda
 $ conda install -c johnsnowlabs spark-nlp
diff --git a/docs/api/com/index.html b/docs/api/com/index.html
index 0e735e306ba9fe..2aa77b1345b6ec 100644
--- a/docs/api/com/index.html
+++ b/docs/api/com/index.html
@@ -3,9 +3,9 @@
- Spark NLP 5.0.1 ScalaDoc - com
-
-
+ Spark NLP 5.0.2 ScalaDoc - com
+
+
@@ -28,7 +28,7 @@
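
Usage note for the 5.0.2 features listed in the CHANGELOG hunk above (ONNX Runtime support in ALBERT, CamemBERT, and XLM-RoBERTa): the following is a minimal PySpark sketch, not taken from this PR. It assumes `spark-nlp==5.0.2` and `pyspark` are installed as shown in the README hunks; the pretrained model name `albert_base_uncased` and the sample sentence are illustrative only.

```python
# Minimal sketch (assumption: spark-nlp==5.0.2 and pyspark installed per the README).
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, AlbertEmbeddings
from pyspark.ml import Pipeline

# Start a Spark session with the Spark NLP package on the classpath.
spark = sparknlp.start()

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# ALBERT is one of the annotators that gained ONNX Runtime support in 5.0.2;
# the model name below is an illustrative default, not mandated by this PR.
embeddings = AlbertEmbeddings.pretrained("albert_base_uncased", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline(stages=[document_assembler, tokenizer, embeddings])

data = spark.createDataFrame([["Spark NLP 5.0.2 adds ONNX Runtime support."]]).toDF("text")
pipeline.fit(data).transform(data).select("embeddings.embeddings").show(truncate=80)
```

The same pipeline shape should apply to CamemBertEmbeddings and XlmRoBertaEmbeddings; only the annotator class and pretrained model name change.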