Commit cdfd954

maziyarpanahi committed Mar 4, 2020 · 2 parents ddd4630 + b23e6ed

Showing 19 changed files with 245 additions and 218 deletions.
34 changes: 17 additions & 17 deletions README.md
@@ -109,7 +109,7 @@ For more examples you can visit our dedicated [repository](https://github.com/Jo

## Apache Spark Support

-Spark NLP *2.4.1* has been built on top of Apache Spark 2.4.4
+Spark NLP *2.4.2* has been built on top of Apache Spark 2.4.4

| Spark NLP | Apache Spark 2.3.x | Apache Spark 2.4.x |
|-------------|-----------------------|--------------------|
@@ -133,23 +133,23 @@ This library has been uploaded to the [spark-packages repository](https://spark-

The benefit of spark-packages is that it makes the library available for both Scala/Java and Python

-To use the most recent version just add the `--packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1` to your spark command
+To use the most recent version just add the `--packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2` to your spark command

```sh
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2
```

```sh
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2
```

```sh
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2
```

This can also be used to create a SparkSession manually by using the `spark.jars.packages` option in both Python and Scala.

-**NOTE**: To use Spark NLP with GPU you can use the dedicated GPU package `com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.4.1`
+**NOTE**: To use Spark NLP with GPU you can use the dedicated GPU package `com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.4.2`

## Scala

@@ -164,7 +164,7 @@ Our package is deployed to maven central. In order to add this package as a depe
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.11</artifactId>
-<version>2.4.1</version>
+<version>2.4.2</version>
</dependency>
```

@@ -175,7 +175,7 @@ Our package is deployed to maven central. In order to add this package as a depe
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu_2.11</artifactId>
-<version>2.4.1</version>
+<version>2.4.2</version>
</dependency>
```

@@ -185,14 +185,14 @@ Our package is deployed to maven central. In order to add this package as a depe

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "2.4.1"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "2.4.2"
```

**spark-nlp-gpu:**

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "2.4.1"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "2.4.2"
```

Maven Central: [https://mvnrepository.com/artifact/com.johnsnowlabs.nlp](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp)
@@ -208,7 +208,7 @@ If you installed pyspark through pip/conda, you can install `spark-nlp` through
Pip:

```bash
-pip install spark-nlp==2.4.1
+pip install spark-nlp==2.4.2
```
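
Assuming the install succeeded, a quick way to confirm the package is visible to Python (`sparknlp.version()` reports the installed library version):

```python
# sanity check: the package imports and reports the expected version
import sparknlp

print(sparknlp.version())  # expected: 2.4.2
```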

Conda:
@@ -235,7 +235,7 @@ spark = SparkSession.builder \
.master("local[4]")\
.config("spark.driver.memory","8G")\
.config("spark.driver.maxResultSize", "2G") \
.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1")\
.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2")\
.config("spark.kryoserializer.buffer.max", "500m")\
.getOrCreate()
```
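
If the Python package is installed, the library's convenience starter is a shorter alternative to the manual builder above (a minimal sketch; `sparknlp.start()` creates a SparkSession preconfigured for Spark NLP):

```python
import sparknlp

# creates (or reuses) a SparkSession with Spark NLP on the classpath
spark = sparknlp.start()
```
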
@@ -304,7 +304,7 @@ Use either one of the following options
* Add the following Maven Coordinates to the interpreter's library list

```bash
-com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2
```

* Add the path to the pre-built jar from [here](#compiled-jars) to the interpreter's library list, making sure the jar is available on the driver path
@@ -314,7 +314,7 @@ com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
Apart from the previous step, install the Python module through pip

```bash
-pip install spark-nlp==2.4.1
+pip install spark-nlp==2.4.2
```

Or you can install `spark-nlp` from inside Zeppelin by using Conda:
@@ -339,7 +339,7 @@ export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook

-pyspark --packages JohnSnowLabs:spark-nlp:2.4.1
+pyspark --packages JohnSnowLabs:spark-nlp:2.4.2
```

Alternatively, you can mix in the `--jars` option for pyspark with `pip install spark-nlp`, as in the sketch below
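
A minimal sketch of the jar-based route in Python (the assembly jar path is hypothetical and depends on where you built or downloaded it):

```python
from pyspark.sql import SparkSession

# load Spark NLP from a local assembly jar instead of resolving --packages
spark = SparkSession.builder \
    .appName("spark-nlp-local-jar") \
    .config("spark.jars", "/path/to/spark-nlp-assembly-2.4.2.jar") \
    .getOrCreate()
```
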
@@ -365,7 +365,7 @@ os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]
! pip install --ignore-installed pyspark==2.4.4

# Install Spark NLP
-! pip install --ignore-installed spark-nlp==2.4.1
+! pip install --ignore-installed spark-nlp==2.4.2

# Quick SparkSession start
import sparknlp
@@ -554,7 +554,7 @@ If you get this common python error, it means that the Spark NLP was not loaded
3. If on Windows, download Hadoop winutils.exe and add it to your PATH: https://github.com/steveloughran/winutils
4. HADOOP_HOME should also be set in some cases; pointing it to your SPARK_HOME should work if you don't have an explicit Hadoop installation
5. If you are running `pyspark` instead of just `jupyter notebook`, make sure you set up `PYSPARK_DRIVER_PYTHON`, `PYSPARK_DRIVER_PYTHON_OPTS` and `PYSPARK_PYTHON` as noted in the documentation
-6. Run `pip install spark-nlp==2.4.1` even if you are using `--packages`, as a safety measure
+6. Run `pip install spark-nlp==2.4.2` even if you are using `--packages`, as a safety measure
7. Make sure all dependency coordinates and any jar paths you provide manually are written correctly; Spark does not fail on a wrong path, it just ignores it
8. If you get dependency failures when starting Spark, make sure to add antivirus and firewall exceptions. Windows antivirus adversely impacts performance when resolving dependencies.
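
After working through this checklist, a quick end-to-end sanity check (a sketch; `start()` and `version()` ship with the spark-nlp Python package):

```python
# verify that both the Python package and the Spark NLP jar load correctly
import sparknlp

spark = sparknlp.start()
print("Spark NLP:", sparknlp.version())
print("Apache Spark:", spark.version)
```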

2 changes: 1 addition & 1 deletion build.sbt
@@ -15,7 +15,7 @@ if(is_gpu.equals("false")){

organization:= "com.johnsnowlabs.nlp"

version := "2.4.1"
version := "2.4.2"

scalaVersion in ThisBuild := scalaVer

2 changes: 1 addition & 1 deletion docs/Gemfile.lock
@@ -220,7 +220,7 @@ GEM
ruby-enum (0.7.2)
i18n
ruby_dep (1.5.0)
-rubyzip (2.2.0)
+rubyzip (1.3.0)
safe_yaml (1.0.5)
sass (3.7.4)
sass-listen (~> 4.0.0)
2 changes: 2 additions & 0 deletions docs/_data/navigation.yml
@@ -57,6 +57,8 @@ docs-en:
url: /docs/en/evaluation
- title: Spark OCR
url: /docs/en/ocr
+- title: Spark OCR release notes
+  url: /docs/en/ocr_release_notes

extras:
- title: Extras
10 changes: 5 additions & 5 deletions docs/_layouts/landing.html
@@ -49,22 +49,22 @@ <h1>{{ _section.title }}</h1>
<div class="cell cell--12 cell--lg-12" style="text-align: left; background-color: #2d2d2d; padding: 10px">
{% highlight bash %}
# Install Spark NLP from PyPI
-$ pip install spark-nlp==2.4.1
+$ pip install spark-nlp==2.4.2

# Install Spark NLP from Anaconda/Conda
$ conda install -c johnsnowlabs spark-nlp

# Load Spark NLP with Spark Shell
-$ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+$ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2

# Load Spark NLP with PySpark
-$ pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+$ pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2

# Load Spark NLP with Spark Submit
-$ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+$ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2

# Load Spark NLP as an external JAR after compiling and building Spark NLP with `sbt assembly`
-$ spark-shell --jars spark-nlp-assembly-2.4.1.jar
+$ spark-shell --jars spark-nlp-assembly-2.4.2.jar
{% endhighlight %}
</div>
</div>
6 changes: 3 additions & 3 deletions docs/en/concepts.md
@@ -41,20 +41,20 @@ Both forms of annotators can be included in a Pipeline and will automatically go

## Quickly annotate some text

You can run these examples using Python or Scala.

The easiest way to run the Python examples is by starting a pyspark
Jupyter notebook that includes the spark-nlp package:

```bash
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2
```

The easiest way of running these Scala examples is by starting a
spark-shell session that includes the spark-nlp package:

```bash
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2
```
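
Once a session is up, a pretrained pipeline is the quickest way to try annotation (a Python sketch; `explain_document_ml` is one of the pipelines John Snow Labs distributes):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# downloads the pipeline on first use, then annotates a sample sentence
pipeline = PretrainedPipeline("explain_document_ml", lang="en")
result = pipeline.annotate("Spark NLP makes it easy to annotate text.")
print(result["token"])
```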

### Explain Document ML
38 changes: 19 additions & 19 deletions docs/en/install.md
@@ -10,22 +10,22 @@ modify_date: "2020-02-20"

```bash
# Install Spark NLP from PyPI
$pip install spark-nlp==2.4.1
$pip install spark-nlp==2.4.2

# Install Spark NLP from Anacodna/Conda
conda install -c johnsnowlabs spark-nlp

# Load Spark NLP with Spark Shell
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2

# Load Spark NLP with PySpark
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2

# Load Spark NLP with Spark Submit
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2

# Load Spark NLP as an external JAR after compiling and building Spark NLP with `sbt assembly`
-spark-shell --jars spark-nlp-assembly-2.4.1.jar
+spark-shell --jars spark-nlp-assembly-2.4.2.jar
```

## Python
@@ -106,7 +106,7 @@ conda install pyspark=2.4.4
To install the Spark NLP open-source version you can just run:

```bash
-pip install --ignore-installed spark-nlp==2.4.1
+pip install --ignore-installed spark-nlp==2.4.2
```

The --ignore-installed parameter is to overwrite your previous pip
@@ -116,7 +116,7 @@ package version if already installed.
If you are using Anaconda/Conda for managing Python packages, you can install the Spark NLP open-source version as follows:

```bash
-conda install -c johnsnowlabs spark-nlp=2.4.1
+conda install -c johnsnowlabs spark-nlp=2.4.2
```

### Install Licensed Spark NLP
@@ -125,7 +125,7 @@ You can also install the licensed package with extra functionalities and
pretrained models by using:

```bash
-pip install spark-nlp-jsl==2.4.1 --extra-index-url #### --ignore-installed
+pip install spark-nlp-jsl==2.4.2 --extra-index-url #### --ignore-installed
```

The #### is a secret URL only available for users with a license; if you
@@ -135,7 +135,7 @@ At the moment there is no conda package for Licensed Spark NLP version.

### Setup AWS-CLI Credentials for licensed pretrained models

-From Licensed version 2.4.1 in order to access private JohnSnowLabs
+From Licensed version 2.4.2 in order to access private JohnSnowLabs
models repository you first need to set up your AWS credentials. This
access is done via the Amazon AWS command line interface (AWS CLI).

@@ -164,7 +164,7 @@ spark = SparkSession.builder \
.master("local[*]")\
.config("spark.driver.memory","8G")\
.config("spark.driver.maxResultSize", "2G") \
.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1")\
.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2")\
.config("spark.kryoserializer.buffer.max", "500m")\
.getOrCreate()
```
@@ -182,7 +182,7 @@ pyspark this cell is just ignored.
Initializing the Spark session takes a few seconds (usually less than a
minute) as the jar needs to be loaded from the server.

-We will be using version 2.4.1 of Spark NLP Open Source and 2.4.1 of
+We will be using version 2.4.2 of Spark NLP Open Source and 2.4.2 of
Spark NLP Enterprise Edition.

The #### in .config("spark.jars", "####") is a secret code; if you have
@@ -192,11 +192,11 @@ not received it please contact us at info@johnsnowlabs.com.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
.appName("Global DEMO - Spark NLP Enterprise 2.4.1") \
.appName("Global DEMO - Spark NLP Enterprise 2.4.2") \
.master("local[*]") \
.config("spark.driver.memory","4G") \
.config("spark.driver.maxResultSize", "2G") \
.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1") \
.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2") \
.config("spark.jars", "####") \
.getOrCreate()
```
@@ -213,7 +213,7 @@ as a dependency in your application:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.11</artifactId>
-<version>2.4.1</version>
+<version>2.4.2</version>
</dependency>
```

@@ -224,22 +224,22 @@ and
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu_2.11</artifactId>
-<version>2.4.1</version>
+<version>2.4.2</version>
</dependency>
```

### SBT

```bash
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "2.4.1"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "2.4.2"
```

and

```bash
-// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-ocr
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "2.4.1"
+// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "2.4.2"
```

Maven Central: [https://mvnrepository.com/artifact/com.johnsnowlabs.nlp](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp)
@@ -267,7 +267,7 @@ Note: You can import these notebooks by using their URLs.
4- From the Source drop-down menu, select **Maven Coordinate:**
![Databricks](https://databricks.com/wp-content/uploads/2015/07/select-maven-1024x711.png)

-5- Now, all available **Spark Packages** are at your fingertips! Just search for **JohnSnowLabs:spark-nlp:version** where **version** stands for the library version such as: `2.4.1`
+5- Now, all available **Spark Packages** are at your fingertips! Just search for **JohnSnowLabs:spark-nlp:version** where **version** stands for the library version such as: `2.4.2`
![Databricks](https://databricks.com/wp-content/uploads/2015/07/browser-1024x548.png)

6- Select the **spark-nlp** package and we are good to go!