diff --git a/CHANGELOG b/CHANGELOG
index dfe6fd661f3319..f8cfaf23daa84b 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,3 +1,25 @@
+========
+5.1.3
+========
+----------------
+New Features & Enhancements
+----------------
+* **NEW:** Introducing support for ONNX Runtime in BertForTokenClassification annotator
+* **NEW:** Introducing support for ONNX Runtime in BertForSequenceClassification annotator
+* **NEW:** Introducing support for ONNX Runtime in BertForQuestionAnswering annotator
+* **NEW:** Introducing support for ONNX Runtime in DistilBertForTokenClassification annotator
+* **NEW:** Introducing support for ONNX Runtime in DistilBertForSequenceClassification annotator
+* **NEW:** Introducing support for ONNX Runtime in DistilBertForQuestionAnswering annotator
+* **NEW:** Setting ONNX configuration such as GPU device id, execution mode, etc. via Spark NLP configs
+* Update Whisper documentation with the minimum required version of Spark/PySpark (3.4)
+
+----------------
+Bug Fixes
+----------------
+* Fix `module 'sparknlp.annotator' has no attribute 'Token2Chunk'` error in Python when using the `Token2Chunk` annotator inside a loaded PipelineModel
+
+
+
 ========
 5.1.2
 ========
diff --git a/README.md b/README.md
index 9f67ff2b5cf985..fe80bc91c0fcb1 100644
--- a/README.md
+++ b/README.md
@@ -171,7 +171,7 @@ To use Spark NLP you need the following requirements:

 **GPU (optional):**

-Spark NLP 5.1.2 is built with ONNX 1.15.1 and TensorFlow 2.7.1 deep learning engines. The minimum following NVIDIA® software are only required for GPU support:
+Spark NLP 5.1.3 is built with ONNX 1.15.1 and TensorFlow 2.7.1 deep learning engines. The following minimum NVIDIA® software versions are required for GPU support:

 - NVIDIA® GPU drivers version 450.80.02 or higher
 - CUDA® Toolkit 11.2
@@ -187,7 +187,7 @@ $ java -version
 $ conda create -n sparknlp python=3.7 -y
 $ conda activate sparknlp
 # spark-nlp by default is based on pyspark 3.x
-$ pip install spark-nlp==5.1.2 pyspark==3.3.1
+$ pip install spark-nlp==5.1.3 pyspark==3.3.1
 ```

 In Python console or Jupyter `Python3` kernel:
@@ -232,7 +232,7 @@ For more examples, you can visit our dedicated [examples](https://github.com/Joh

 ## Apache Spark Support

-Spark NLP *5.1.2* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x
+Spark NLP *5.1.3* has been built on top of Apache Spark 3.4 and fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x

 | Spark NLP | Apache Spark 2.3.x | Apache Spark 2.4.x | Apache Spark 3.0.x | Apache Spark 3.1.x | Apache Spark 3.2.x | Apache Spark 3.3.x | Apache Spark 3.4.x |
 |-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
@@ -271,7 +271,7 @@ Find out more about `Spark NLP` versions from our [release notes](https://github

 ## Databricks Support

-Spark NLP 5.1.2 has been tested and is compatible with the following runtimes:
+Spark NLP 5.1.3 has been tested and is compatible with the following runtimes:

 **CPU:**
@@ -332,7 +332,7 @@ Spark NLP 5.1.2 has been tested and is compatible with the following runtimes:

 ## EMR Support

-Spark NLP 5.1.2 has been tested and is compatible with the following EMR releases:
+Spark NLP 5.1.3 has been tested and is compatible with the following EMR releases:

 - emr-6.2.0
 - emr-6.3.0
@@ -377,11 +377,11 @@ Spark NLP supports all major releases of Apache Spark 3.0.x, Apache Spark 3.1.x,

 ```sh
 # CPU
-spark-shell --packages 
com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.2
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.3

-pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.3

-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.2
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.3
 ```

 The `spark-nlp` has been published to
@@ -390,11 +390,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s

 ```sh
 # GPU
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.1.2
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.1.3

-pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.1.2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.1.3

-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.1.2
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.1.3
 ```

@@ -404,11 +404,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s

 ```sh
 # AArch64
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.1.2
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.1.3

-pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.1.2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.1.3

-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.1.2
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.1.3
 ```

@@ -418,11 +418,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s

 ```sh
 # M1/M2 (Apple Silicon)
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.1.2
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.1.3

-pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.1.2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.1.3

-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.1.2
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.1.3
 ```

@@ -436,7 +436,7 @@ set in your SparkSession:
 spark-shell \
   --driver-memory 16g \
   --conf spark.kryoserializer.buffer.max=2000M \
-  --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.2
+  --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.3
 ```

 ## Scala
@@ -454,7 +454,7 @@ coordinates:
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp_2.12</artifactId>
-    <version>5.1.2</version>
+    <version>5.1.3</version>
 </dependency>
 ```

@@ -465,7 +465,7 @@ coordinates:
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp-gpu_2.12</artifactId>
-    <version>5.1.2</version>
+    <version>5.1.3</version>
 </dependency>
 ```

@@ -476,7 +476,7 @@ coordinates:
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp-aarch64_2.12</artifactId>
-    <version>5.1.2</version>
+    <version>5.1.3</version>
 </dependency>
 ```

@@ -487,7 +487,7 @@ coordinates:
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp-silicon_2.12</artifactId>
-    <version>5.1.2</version>
+    <version>5.1.3</version>
 </dependency>
 ```

@@ -497,28 +497,28 @@ coordinates:

 ```sbtshell
 // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.1.2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.1.3"
 ```

 **spark-nlp-gpu:**

 ```sbtshell
 // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.1.2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.1.3"
 ```

 **spark-nlp-aarch64:**

 ```sbtshell
 // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-aarch64
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.1.2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.1.3"
 ```

 **spark-nlp-silicon:**

 ```sbtshell
 // 
https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.1.2" +libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.1.3" ``` Maven @@ -540,7 +540,7 @@ If you installed pyspark through pip/conda, you can install `spark-nlp` through Pip: ```bash -pip install spark-nlp==5.1.2 +pip install spark-nlp==5.1.3 ``` Conda: @@ -569,7 +569,7 @@ spark = SparkSession.builder .config("spark.driver.memory", "16G") .config("spark.driver.maxResultSize", "0") .config("spark.kryoserializer.buffer.max", "2000M") - .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.2") + .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.3") .getOrCreate() ``` @@ -640,7 +640,7 @@ Use either one of the following options - Add the following Maven Coordinates to the interpreter's library list ```bash -com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.2 +com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.3 ``` - Add a path to pre-built jar from [here](#compiled-jars) in the interpreter's library list making sure the jar is @@ -651,7 +651,7 @@ com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.2 Apart from the previous step, install the python module through pip ```bash -pip install spark-nlp==5.1.2 +pip install spark-nlp==5.1.3 ``` Or you can install `spark-nlp` from inside Zeppelin by using Conda: @@ -679,7 +679,7 @@ launch the Jupyter from the same Python environment: $ conda create -n sparknlp python=3.8 -y $ conda activate sparknlp # spark-nlp by default is based on pyspark 3.x -$ pip install spark-nlp==5.1.2 pyspark==3.3.1 jupyter +$ pip install spark-nlp==5.1.3 pyspark==3.3.1 jupyter $ jupyter notebook ``` @@ -696,7 +696,7 @@ export PYSPARK_PYTHON=python3 export PYSPARK_DRIVER_PYTHON=jupyter export PYSPARK_DRIVER_PYTHON_OPTS=notebook -pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.2 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.3 ``` Alternatively, you can mix in using `--jars` option for pyspark + `pip install spark-nlp` @@ -723,7 +723,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi # -s is for spark-nlp # -g will enable upgrading libcudnn8 to 8.1.0 on Google Colab for GPU usage # by default they are set to the latest -!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.1.2 +!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.1.3 ``` [Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/quick_start_google_colab.ipynb) @@ -746,7 +746,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi # -s is for spark-nlp # -g will enable upgrading libcudnn8 to 8.1.0 on Kaggle for GPU usage # by default they are set to the latest -!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.1.2 +!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.1.3 ``` [Spark NLP quick start on Kaggle Kernel](https://www.kaggle.com/mozzie/spark-nlp-named-entity-recognition) is a live @@ -765,9 +765,9 @@ demo on Kaggle Kernel that performs named entity recognitions by using Spark NLP 3. In `Libraries` tab inside your cluster you need to follow these steps: - 3.1. Install New -> PyPI -> `spark-nlp==5.1.2` -> Install + 3.1. Install New -> PyPI -> `spark-nlp==5.1.3` -> Install - 3.2. 
Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.2` -> Install + 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.3` -> Install 4. Now you can attach your notebook to the cluster and use Spark NLP! @@ -818,7 +818,7 @@ A sample of your software configuration in JSON on S3 (must be public access): "spark.kryoserializer.buffer.max": "2000M", "spark.serializer": "org.apache.spark.serializer.KryoSerializer", "spark.driver.maxResultSize": "0", - "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.2" + "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.3" } }] ``` @@ -827,7 +827,7 @@ A sample of AWS CLI to launch EMR cluster: ```.sh aws emr create-cluster \ ---name "Spark NLP 5.1.2" \ +--name "Spark NLP 5.1.3" \ --release-label emr-6.2.0 \ --applications Name=Hadoop Name=Spark Name=Hive \ --instance-type m4.4xlarge \ @@ -891,7 +891,7 @@ gcloud dataproc clusters create ${CLUSTER_NAME} \ --enable-component-gateway \ --metadata 'PIP_PACKAGES=spark-nlp spark-nlp-display google-cloud-bigquery google-cloud-storage' \ --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh \ - --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.2 + --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.3 ``` 2. On an existing one, you need to install spark-nlp and spark-nlp-display packages from PyPI. @@ -930,7 +930,7 @@ spark = SparkSession.builder .config("spark.kryoserializer.buffer.max", "2000m") .config("spark.jsl.settings.pretrained.cache_folder", "sample_data/pretrained") .config("spark.jsl.settings.storage.cluster_tmp_dir", "sample_data/storage") - .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.2") + .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.3") .getOrCreate() ``` @@ -944,7 +944,7 @@ spark-shell \ --conf spark.kryoserializer.buffer.max=2000M \ --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \ --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \ - --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.2 + --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.3 ``` **pyspark:** @@ -957,7 +957,7 @@ pyspark \ --conf spark.kryoserializer.buffer.max=2000M \ --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \ --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \ - --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.2 + --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.3 ``` **Databricks:** @@ -1229,7 +1229,7 @@ spark = SparkSession.builder .config("spark.driver.memory", "16G") .config("spark.driver.maxResultSize", "0") .config("spark.kryoserializer.buffer.max", "2000M") - .config("spark.jars", "/tmp/spark-nlp-assembly-5.1.2.jar") + .config("spark.jars", "/tmp/spark-nlp-assembly-5.1.3.jar") .getOrCreate() ``` @@ -1238,7 +1238,7 @@ spark = SparkSession.builder version (3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x) - If you are local, you can load the Fat JAR from your local FileSystem, however, if you are in a cluster setup you need to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. 
( - i.e., `hdfs:///tmp/spark-nlp-assembly-5.1.2.jar`) + i.e., `hdfs:///tmp/spark-nlp-assembly-5.1.3.jar`) Example of using pretrained Models and Pipelines in offline: diff --git a/build.sbt b/build.sbt index 53480b7d392517..09a29f16499a47 100644 --- a/build.sbt +++ b/build.sbt @@ -6,7 +6,7 @@ name := getPackageName(is_silicon, is_gpu, is_aarch64) organization := "com.johnsnowlabs.nlp" -version := "5.1.2" +version := "5.1.3" (ThisBuild / scalaVersion) := scalaVer diff --git a/conda/meta.yaml b/conda/meta.yaml index 698f60613f7392..e9c4bdd8e5309f 100644 --- a/conda/meta.yaml +++ b/conda/meta.yaml @@ -1,5 +1,5 @@ {% set name = "spark-nlp" %} -{% set version = "5.1.2" %} +{% set version = "5.1.3" %} package: name: {{ name|lower }} @@ -7,7 +7,7 @@ package: source: url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/spark-nlp-{{ version }}.tar.gz - sha256: 33e5124228e6064577231d3aa87476ad3f38988ab679e2dbd91b4273b7f407c2 + sha256: a1d7230c6b4e3b23aa33ee96e68e295a56c7ddad61e6f764da4c4ac002fd7902 build: noarch: python diff --git a/docs/_layouts/landing.html b/docs/_layouts/landing.html index a7f916861e8dd8..f9be7fb96ee5ac 100755 --- a/docs/_layouts/landing.html +++ b/docs/_layouts/landing.html @@ -201,7 +201,7 @@

{{ _section.title }}

 {% highlight bash %}
 # Using PyPI
-$ pip install spark-nlp==5.1.2
+$ pip install spark-nlp==5.1.3

 # Using Anaconda/Conda
 $ conda install -c johnsnowlabs spark-nlp
diff --git a/docs/api/com/index.html b/docs/api/com/index.html
index 1cd249a37e817c..2a4f24ec50d732 100644
--- a/docs/api/com/index.html
+++ b/docs/api/com/index.html
@@ -3,9 +3,9 @@
-<title>Spark NLP 5.1.2 ScalaDoc - com</title>
-
-
+<title>Spark NLP 5.1.3 ScalaDoc - com</title>
+
+
@@ -28,7 +28,7 @@
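
A note on the headline 5.1.3 change above: the new ONNX Runtime options are meant to be passed as Spark properties when the session is created, alongside the usual Spark NLP settings shown in the README. A minimal sketch follows; the `spark.jsl.settings.onnx.*` keys and their values are assumptions inferred from the CHANGELOG wording ("GPU device id, execution mode, etc."), so verify the exact names against the Spark NLP configuration documentation.

```python
from pyspark.sql import SparkSession

# Minimal sketch: starting a session with Spark NLP 5.1.3 and ONNX Runtime
# options supplied via Spark configs. The spark.jsl.settings.onnx.* keys
# below are assumed from the CHANGELOG entry, not confirmed names.
spark = (
    SparkSession.builder
    .appName("Spark NLP 5.1.3 with ONNX Runtime settings")
    .master("local[*]")
    .config("spark.driver.memory", "16G")
    .config("spark.kryoserializer.buffer.max", "2000M")
    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.3")
    # Assumed key: which GPU device ONNX Runtime should bind to.
    .config("spark.jsl.settings.onnx.gpuDeviceId", "0")
    # Assumed key: sequential vs. parallel execution of the ONNX graph.
    .config("spark.jsl.settings.onnx.executionMode", "sequential")
    .getOrCreate()
)
```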