diff --git a/CHANGELOG b/CHANGELOG
index 44fe5ad74c7851..54b2672ec857b1 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,3 +1,13 @@
+========
+4.2.8
+========
+----------------
+Bug Fixes & Enhancements
+----------------
+* Fix the issue with optional keys (labels) in metadata when using XXXForSequenceClassification annotators. Metadata keys such as `Some(neg) -> 0.13602075` are now emitted as `neg -> 0.13602075`, consistent with all the other classifiers (see the sketch below). https://github.com/JohnSnowLabs/spark-nlp/pull/13396
+* Introduce a config to skip `LightPipeline` validation of `inputCols` on the Python side for projects that depend on Spark NLP. This toggle should only be used for annotators that do not follow the convention of predefined `inputAnnotatorTypes` and `outputAnnotatorType`.
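+
+  A minimal before/after sketch of the metadata fix (the pipeline variable, the `class` output column, and the `pos` score are illustrative, not taken from the PR):
+
+    light = LightPipeline(classifier_pipeline_model)
+    annotation = light.fullAnnotate("I love this movie!")[0]["class"][0]
+    print(annotation.metadata)
+    # before 4.2.8: {'Some(neg)': '0.13602075', 'Some(pos)': '0.86397925'}
+    # since 4.2.8:  {'neg': '0.13602075', 'pos': '0.86397925'}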
+
+
========
4.2.7
========
diff --git a/README.md b/README.md
index a8908a97c12dcd..ee8b7fa37bf8dc 100644
--- a/README.md
+++ b/README.md
@@ -152,7 +152,7 @@ To use Spark NLP you need the following requirements:
**GPU (optional):**
-Spark NLP 4.2.7 is built with TensorFlow 2.7.1 and the following NVIDIA® software are only required for GPU support:
+Spark NLP 4.2.8 is built with TensorFlow 2.7.1 and the following NVIDIA® software is required only for GPU support:
- NVIDIA® GPU drivers version 450.80.02 or higher
- CUDA® Toolkit 11.2
@@ -168,7 +168,7 @@ $ java -version
$ conda create -n sparknlp python=3.7 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
-$ pip install spark-nlp==4.2.7 pyspark==3.2.3
+$ pip install spark-nlp==4.2.8 pyspark==3.2.3
```
In Python console or Jupyter `Python3` kernel:
@@ -213,7 +213,7 @@ For more examples, you can visit our dedicated [repository](https://github.com/J
## Apache Spark Support
-Spark NLP *4.2.7* has been built on top of Apache Spark 3.2 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x:
+Spark NLP *4.2.8* has been built on top of Apache Spark 3.2 while fully supporting Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x:
| Spark NLP | Apache Spark 2.3.x | Apache Spark 2.4.x | Apache Spark 3.0.x | Apache Spark 3.1.x | Apache Spark 3.2.x | Apache Spark 3.3.x |
|-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
@@ -247,7 +247,7 @@ Find out more about `Spark NLP` versions from our [release notes](https://github
## Databricks Support
-Spark NLP 4.2.7 has been tested and is compatible with the following runtimes:
+Spark NLP 4.2.8 has been tested and is compatible with the following runtimes:
**CPU:**
@@ -291,7 +291,7 @@ NOTE: Spark NLP 4.0.x is based on TensorFlow 2.7.x which is compatible with CUDA
## EMR Support
-Spark NLP 4.2.7 has been tested and is compatible with the following EMR releases:
+Spark NLP 4.2.8 has been tested and is compatible with the following EMR releases:
- emr-6.2.0
- emr-6.3.0
@@ -329,11 +329,11 @@ Spark NLP supports all major releases of Apache Spark 3.0.x, Apache Spark 3.1.x,
```sh
# CPU
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8
```
The `spark-nlp` has been published to the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp).
@@ -341,11 +341,11 @@ The `spark-nlp` has been published to the [Maven Repository](https://mvnreposito
```sh
# GPU
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.7
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.8
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.7
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.8
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.7
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.8
```
@@ -354,11 +354,11 @@ The `spark-nlp-gpu` has been published to the [Maven Repository](https://mvnrepo
```sh
# AArch64
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.7
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.8
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.7
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.8
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.7
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.8
```
@@ -367,11 +367,11 @@ The `spark-nlp-aarch64` has been published to the [Maven Repository](https://mvn
```sh
# M1
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.7
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.8
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.7
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.8
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.7
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.8
```
@@ -383,7 +383,7 @@ The `spark-nlp-m1` has been published to the [Maven Repository](https://mvnrepos
spark-shell \
--driver-memory 16g \
--conf spark.kryoserializer.buffer.max=2000M \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8
```
## Scala
@@ -399,7 +399,7 @@ Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2
com.johnsnowlabs.nlp
spark-nlp_2.12
- 4.2.7
+ 4.2.8
```
@@ -410,7 +410,7 @@ Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2
com.johnsnowlabs.nlp
spark-nlp-gpu_2.12
- 4.2.7
+ 4.2.8
```
@@ -421,7 +421,7 @@ Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2
com.johnsnowlabs.nlp
spark-nlp-aarch64_2.12
- 4.2.7
+ 4.2.8
```
@@ -432,7 +432,7 @@ Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2
com.johnsnowlabs.nlp
spark-nlp-m1_2.12
- 4.2.7
+ 4.2.8
```
@@ -442,28 +442,28 @@ Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2
```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "4.2.7"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "4.2.8"
```
**spark-nlp-gpu:**
```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "4.2.7"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "4.2.8"
```
**spark-nlp-aarch64:**
```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-aarch64
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "4.2.7"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "4.2.8"
```
**spark-nlp-m1:**
```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-m1
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-m1" % "4.2.7"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-m1" % "4.2.8"
```
Maven Central: [https://mvnrepository.com/artifact/com.johnsnowlabs.nlp](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp)
@@ -483,7 +483,7 @@ If you installed pyspark through pip/conda, you can install `spark-nlp` through
Pip:
```bash
-pip install spark-nlp==4.2.7
+pip install spark-nlp==4.2.8
```
Conda:
@@ -511,7 +511,7 @@ spark = SparkSession.builder \
.config("spark.driver.memory","16G")\
.config("spark.driver.maxResultSize", "0") \
.config("spark.kryoserializer.buffer.max", "2000M")\
- .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7")\
+ .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8")\
.getOrCreate()
```
@@ -579,7 +579,7 @@ Use either one of the following options
- Add the following Maven Coordinates to the interpreter's library list
```bash
-com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
+com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8
```
- Add a path to pre-built jar from [here](#compiled-jars) in the interpreter's library list making sure the jar is available to driver path
@@ -589,7 +589,7 @@ com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
Apart from the previous step, install the python module through pip
```bash
-pip install spark-nlp==4.2.7
+pip install spark-nlp==4.2.8
```
Or you can install `spark-nlp` from inside Zeppelin by using Conda:
@@ -614,7 +614,7 @@ The easiest way to get this done on Linux and macOS is to simply install `spark-
$ conda create -n sparknlp python=3.8 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
-$ pip install spark-nlp==4.2.7 pyspark==3.2.3 jupyter
+$ pip install spark-nlp==4.2.8 pyspark==3.2.3 jupyter
$ jupyter notebook
```
@@ -630,7 +630,7 @@ export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8
```
Alternatively, you can mix in using `--jars` option for pyspark + `pip install spark-nlp`
@@ -655,7 +655,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
# -s is for spark-nlp
# -g will enable upgrading libcudnn8 to 8.1.0 on Google Colab for GPU usage
# by default they are set to the latest
-!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.2.7
+!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.2.8
```
[Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/quick_start_google_colab.ipynb) is a live demo on Google Colab that performs named entity recognitions and sentiment analysis by using Spark NLP pretrained pipelines.
@@ -676,7 +676,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
# -s is for spark-nlp
# -g will enable upgrading libcudnn8 to 8.1.0 on Kaggle for GPU usage
# by default they are set to the latest
-!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.2.7
+!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.2.8
```
[Spark NLP quick start on Kaggle Kernel](https://www.kaggle.com/mozzie/spark-nlp-named-entity-recognition) is a live demo on Kaggle Kernel that performs named entity recognitions by using Spark NLP pretrained pipeline.
@@ -694,9 +694,9 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
3. In `Libraries` tab inside your cluster you need to follow these steps:
- 3.1. Install New -> PyPI -> `spark-nlp==4.2.7` -> Install
+ 3.1. Install New -> PyPI -> `spark-nlp==4.2.8` -> Install
- 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7` -> Install
+ 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8` -> Install
4. Now you can attach your notebook to the cluster and use Spark NLP!
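+
+As a quick sanity check in the attached notebook, a minimal sketch:
+
+```python
+import sparknlp
+
+# should echo the installed Spark NLP version, e.g. 4.2.8
+print(sparknlp.version())
+```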
@@ -744,7 +744,7 @@ A sample of your software configuration in JSON on S3 (must be public access):
"spark.kryoserializer.buffer.max": "2000M",
"spark.serializer": "org.apache.spark.serializer.KryoSerializer",
"spark.driver.maxResultSize": "0",
- "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7"
+ "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8"
}
}]
```
@@ -753,7 +753,7 @@ A sample of AWS CLI to launch EMR cluster:
```.sh
aws emr create-cluster \
---name "Spark NLP 4.2.7" \
+--name "Spark NLP 4.2.8" \
--release-label emr-6.2.0 \
--applications Name=Hadoop Name=Spark Name=Hive \
--instance-type m4.4xlarge \
@@ -817,7 +817,7 @@ gcloud dataproc clusters create ${CLUSTER_NAME} \
--enable-component-gateway \
--metadata 'PIP_PACKAGES=spark-nlp spark-nlp-display google-cloud-bigquery google-cloud-storage' \
--initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh \
- --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
+ --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8
```
2. On an existing one, you need to install spark-nlp and spark-nlp-display packages from PyPI.
@@ -856,7 +856,7 @@ spark = SparkSession.builder \
.config("spark.kryoserializer.buffer.max", "2000m") \
.config("spark.jsl.settings.pretrained.cache_folder", "sample_data/pretrained") \
.config("spark.jsl.settings.storage.cluster_tmp_dir", "sample_data/storage") \
- .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7") \
+ .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8") \
.getOrCreate()
```
@@ -870,7 +870,7 @@ spark-shell \
--conf spark.kryoserializer.buffer.max=2000M \
--conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
--conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8
```
**pyspark:**
@@ -883,7 +883,7 @@ pyspark \
--conf spark.kryoserializer.buffer.max=2000M \
--conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
--conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8
```
**Databricks:**
@@ -1147,12 +1147,12 @@ spark = SparkSession.builder \
.config("spark.driver.memory","16G")\
.config("spark.driver.maxResultSize", "0") \
.config("spark.kryoserializer.buffer.max", "2000M")\
- .config("spark.jars", "/tmp/spark-nlp-assembly-4.2.7.jar")\
+ .config("spark.jars", "/tmp/spark-nlp-assembly-4.2.8.jar")\
.getOrCreate()
```
- You can download provided Fat JARs from each [release notes](https://github.com/JohnSnowLabs/spark-nlp/releases), please pay attention to pick the one that suits your environment depending on the device (CPU/GPU) and Apache Spark version (3.0.x, 3.1.x, 3.2.x, and 3.3.x)
-- If you are local, you can load the Fat JAR from your local FileSystem, however, if you are in a cluster setup you need to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. (i.e., `hdfs:///tmp/spark-nlp-assembly-4.2.7.jar`)
+- If you are running locally, you can load the Fat JAR from your local file system; however, in a cluster setup you need to put the Fat JAR on a distributed file system such as HDFS, DBFS, or S3 (e.g., `hdfs:///tmp/spark-nlp-assembly-4.2.8.jar`)
Example of using pretrained Models and Pipelines in offline:
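+
+For instance, a minimal sketch of loading a manually downloaded model (the model class and path are illustrative):
+
+```python
+from sparknlp.annotator import WordEmbeddingsModel
+
+# in a cluster, this path should live on HDFS/DBFS/S3 instead of the local disk
+embeddings = WordEmbeddingsModel.load("/tmp/glove_100d_en")
+```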
diff --git a/build.sbt b/build.sbt
index 3f04c3e73bc281..c9d78fbd79b942 100644
--- a/build.sbt
+++ b/build.sbt
@@ -6,7 +6,7 @@ name := getPackageName(is_m1, is_gpu, is_aarch64)
organization := "com.johnsnowlabs.nlp"
-version := "4.2.7"
+version := "4.2.8"
(ThisBuild / scalaVersion) := scalaVer
diff --git a/conda/meta.yaml b/conda/meta.yaml
index 08b910b87acadd..5a0093c36adbd3 100644
--- a/conda/meta.yaml
+++ b/conda/meta.yaml
@@ -1,15 +1,15 @@
package:
name: "spark-nlp"
- version: 4.2.7
+ version: 4.2.8
app:
entry: spark-nlp
summary: Natural Language Understanding Library for Apache Spark.
source:
- fn: spark-nlp-4.2.7.tar.gz
- url: https://files.pythonhosted.org/packages/1d/e0/c123346f12e9d312c0b6bfecbd96db9e899882e01bc1adb338349d9e1088/spark-nlp-4.2.7.tar.gz
- sha256: 071f5b06ae10319cffe5a4fa22586a5b269800578e8a74de912abf123fd01bdf
+ fn: spark-nlp-4.2.8.tar.gz
+ url: https://files.pythonhosted.org/packages/5a/af/9c73a6a6a74f2848209001194bef19b74cfe04fdd070aec529d290ce239d/spark-nlp-4.2.8.tar.gz
+ sha256: 0573d006538808fd46a102f7efc79c6a7a37d68800e1b2cbf0607d0128a724f1
build:
noarch: generic
number: 0
diff --git a/docs/_includes/docs-healthcare-pagination.html b/docs/_includes/docs-healthcare-pagination.html
index 834b60f481577c..5079faa845f660 100644
--- a/docs/_includes/docs-healthcare-pagination.html
+++ b/docs/_includes/docs-healthcare-pagination.html
@@ -10,7 +10,7 @@