2.2.0-rc3 Release Candidate
saif-ellafi committed Aug 20, 2019
1 parent 9ad7df5 commit d6ae43c
Showing 10 changed files with 49 additions and 49 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG
@@ -1,5 +1,5 @@
========
-2.2.0-rc2
+2.2.0-rc3
========
---------------
Overview
32 changes: 16 additions & 16 deletions README.md
@@ -42,7 +42,7 @@ Take a look at our official Spark NLP page: [http://nlp.johnsnowlabs.com/](http:

## Apache Spark Support

-Spark NLP *2.2.0-rc2* has been built on top of Apache Spark 2.4.3
+Spark NLP *2.2.0-rc3* has been built on top of Apache Spark 2.4.3

Note that pre-built Spark NLP is not backward-compatible with older Spark 2.x.x releases, so models and environments might not work.

@@ -67,18 +67,18 @@ This library has been uploaded to the [spark-packages repository](https://spark-

The benefit of spark-packages is that it makes the library available for both Scala/Java and Python

-To use the most recent version, just add the `--packages JohnSnowLabs:spark-nlp:2.2.0-rc2` to your spark command
+To use the most recent version, just add the `--packages JohnSnowLabs:spark-nlp:2.2.0-rc3` to your spark command

```sh
-spark-shell --packages JohnSnowLabs:spark-nlp:2.2.0-rc2
+spark-shell --packages JohnSnowLabs:spark-nlp:2.2.0-rc3
```

```sh
-pyspark --packages JohnSnowLabs:spark-nlp:2.2.0-rc2
+pyspark --packages JohnSnowLabs:spark-nlp:2.2.0-rc3
```

```sh
-spark-submit --packages JohnSnowLabs:spark-nlp:2.2.0-rc2
+spark-submit --packages JohnSnowLabs:spark-nlp:2.2.0-rc3
```

This can also be used to create a SparkSession manually by using the `spark.jars.packages` option in both Python and Scala
@@ -146,7 +146,7 @@ Our package is deployed to maven central. In order to add this package as a depe
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.11</artifactId>
-<version>2.2.0-rc2</version>
+<version>2.2.0-rc3</version>
</dependency>
```

@@ -157,22 +157,22 @@ and
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-ocr_2.11</artifactId>
-<version>2.2.0-rc2</version>
+<version>2.2.0-rc3</version>
</dependency>
```

### SBT

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "2.2.0-rc2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "2.2.0-rc3"
```

and

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-ocr
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-ocr" % "2.2.0-rc2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-ocr" % "2.2.0-rc3"
```

Maven Central: [https://mvnrepository.com/artifact/com.johnsnowlabs.nlp](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp)
@@ -188,7 +188,7 @@ If you installed pyspark through pip/conda, you can install `spark-nlp` through
Pip:

```bash
-pip install spark-nlp==2.2.0.rc2
+pip install spark-nlp==2.2.0.rc3
```

Conda:
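The collapsed block here is the Conda command, which also appears in `docs/_layouts/landing.html` further below:

```bash
conda install -c johnsnowlabs spark-nlp
```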
@@ -215,7 +215,7 @@ spark = SparkSession.builder \
.master("local[4]")\
.config("spark.driver.memory","8G")\
.config("spark.driver.maxResultSize", "2G") \
.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.2.0-rc2")\
.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.2.0-rc3")\
.config("spark.kryoserializer.buffer.max", "500m")\
.getOrCreate()
```
@@ -248,7 +248,7 @@ Use either one of the following options
* Add the following Maven Coordinates to the interpreter's library list

```bash
-com.johnsnowlabs.nlp:spark-nlp_2.11:2.2.0-rc2
+com.johnsnowlabs.nlp:spark-nlp_2.11:2.2.0-rc3
```

* Add the path to the pre-built jar from [here](#pre-compiled-spark-nlp-and-spark-nlp-ocr) to the interpreter's library list, making sure the jar is available on the driver path
@@ -258,7 +258,7 @@ com.johnsnowlabs.nlp:spark-nlp_2.11:2.2.0-rc2
Apart from the previous step, install the Python module through pip

```bash
-pip install spark-nlp==2.2.0-rc2
+pip install spark-nlp==2.2.0-rc3
```

Or you can install `spark-nlp` from inside Zeppelin by using Conda:
@@ -283,7 +283,7 @@ export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook

-pyspark --packages JohnSnowLabs:spark-nlp:2.2.0-rc2
+pyspark --packages JohnSnowLabs:spark-nlp:2.2.0-rc3
```

Alternatively, you can mix in the `--jars` option for pyspark together with `pip install spark-nlp`, as sketched below
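For example, a sketch assuming the fat jar produced by `sbt assembly` sits in the working directory (the exact jar name is an assumption):

```sh
pip install spark-nlp==2.2.0.rc3
pyspark --jars spark-nlp-assembly-2.2.0-rc3.jar
```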
@@ -344,7 +344,7 @@ To include the OCR submodule in Spark NLP, you will need to add the following to

```
--repositories http://repo.spring.io/plugins-release
---packages JohnSnowLabs:spark-nlp:2.2.0-rc2,com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.2.0-rc2,javax.media.jai:com.springsource.javax.media.jai.core:1.1.3
+--packages JohnSnowLabs:spark-nlp:2.2.0-rc3,com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.2.0-rc3,javax.media.jai:com.springsource.javax.media.jai.core:1.1.3
```

This way you will download the extra dependencies needed by our OCR submodule. The Python SparkSession equivalent is
@@ -356,7 +356,7 @@ spark = SparkSession.builder \
.config("spark.driver.memory", "6g") \
.config("spark.executor.memory", "6g") \
.config("spark.jars.repositories", "http://repo.spring.io/plugins-release") \
.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.2.0-rc2,com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.2.0-rc2,javax.media.jai:com.springsource.javax.media.jai.core:1.1.3") \
.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.2.0-rc3,com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.2.0-rc3,javax.media.jai:com.springsource.javax.media.jai.core:1.1.3") \
.getOrCreate()
```
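As a usage sketch of what these extra dependencies enable, assuming the `OcrHelper` API from the Spark NLP 2.x docs (the class and method names here are assumptions, not taken from this diff):

```python
# Hypothetical usage based on the Spark NLP 2.x OCR docs; names are assumptions
from sparknlp.ocr import OcrHelper

# Convert a folder of PDFs into a Spark DataFrame of extracted text
data = OcrHelper().createDataset(spark, "/path/to/pdfs")
data.show()
```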

6 changes: 3 additions & 3 deletions build.sbt
@@ -16,7 +16,7 @@ if(is_gpu.equals("false")){

organization:= "com.johnsnowlabs.nlp"

version := "2.2.0-rc2"
version := "2.2.0-rc3"

scalaVersion in ThisBuild := scalaVer

@@ -200,7 +200,7 @@ assemblyMergeStrategy in assembly := {
lazy val evaluation = (project in file("eval"))
.settings(
name := "spark-nlp-eval",
version := "2.2.0-rc2",
version := "2.2.0-rc3",

assemblyMergeStrategy in assembly := evalMergeRules,

@@ -241,7 +241,7 @@ lazy val evaluation = (project in file("eval"))
lazy val ocr = (project in file("ocr"))
.settings(
name := "spark-nlp-ocr",
version := "2.2.0-rc2",
version := "2.2.0-rc3",

test in assembly := {},

10 changes: 5 additions & 5 deletions docs/_layouts/landing.html
@@ -49,22 +49,22 @@ <h1>{{ _section.title }}</h1>
<div class="cell cell--12 cell--lg-12" style="text-align: left; background-color: #2d2d2d; padding: 10px">
{% highlight bash %}
# Install Spark NLP from PyPI
-$ pip install spark-nlp==2.2.0.rc2
+$ pip install spark-nlp==2.2.0.rc3

# Install Spark NLP from Anaconda/Conda
$ conda install -c johnsnowlabs spark-nlp

# Load Spark NLP with Spark Shell
-$ spark-shell --packages JohnSnowLabs:spark-nlp:2.2.0-rc2
+$ spark-shell --packages JohnSnowLabs:spark-nlp:2.2.0-rc3

# Load Spark NLP with PySpark
-$ pyspark --packages JohnSnowLabs:spark-nlp:2.2.0-rc2
+$ pyspark --packages JohnSnowLabs:spark-nlp:2.2.0-rc3

# Load Spark NLP with Spark Submit
-$ spark-submit --packages JohnSnowLabs:spark-nlp:2.2.0-rc2
+$ spark-submit --packages JohnSnowLabs:spark-nlp:2.2.0-rc3

# Load Spark NLP as an external JAR after compiling and building Spark NLP with `sbt assembly`
-$ spark-shell --jars spark-nlp-assembly-2.2.0-rc2.jar
+$ spark-shell --jars spark-nlp-assembly-2.2.0-rc3.jar
{% endhighlight %}
</div>
</div>
14 changes: 7 additions & 7 deletions docs/en/install.md
@@ -13,7 +13,7 @@ modify_date: "2019-05-16"
If you installed pyspark through pip, you can install `spark-nlp` through pip as well.

```bash
-pip install spark-nlp==2.2.0.rc2
+pip install spark-nlp==2.2.0.rc3
```

PyPI [spark-nlp package](https://pypi.org/project/spark-nlp/)
@@ -36,7 +36,7 @@ spark = SparkSession.builder \
.master("local[*]")\
.config("spark.driver.memory","8G")\
.config("spark.driver.maxResultSize", "2G") \
.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.2.0-rc2")\
.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.2.0-rc3")\
.config("spark.kryoserializer.buffer.max", "500m")\
.getOrCreate()
```
@@ -97,7 +97,7 @@ Our package is deployed to maven central. In order to add this package as a depe
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.11</artifactId>
-<version>2.2.0-rc2</version>
+<version>2.2.0-rc3</version>
</dependency>
```

@@ -108,22 +108,22 @@ and
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-ocr_2.11</artifactId>
-<version>2.2.0-rc2</version>
+<version>2.2.0-rc3</version>
</dependency>
```

### SBT

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "2.2.0-rc2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "2.2.0-rc3"
```

and

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-ocr
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-ocr" % "2.2.0-rc2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-ocr" % "2.2.0-rc3"
```

Maven Central: [https://mvnrepository.com/artifact/com.johnsnowlabs.nlp](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp)
@@ -151,7 +151,7 @@ Note: You can import these notebooks by using their URLs.
4- From the Source drop-down menu, select **Maven Coordinate:**
![Databricks](https://databricks.com/wp-content/uploads/2015/07/select-maven-1024x711.png)

-5- Now, all available **Spark Packages** are at your fingertips! Just search for **JohnSnowLabs:spark-nlp:version** where **version** stands for the library version such as: `1.8.4` or `2.2.0-rc2`
+5- Now, all available **Spark Packages** are at your fingertips! Just search for **JohnSnowLabs:spark-nlp:version** where **version** stands for the library version such as: `1.8.4` or `2.2.0-rc3`
![Databricks](https://databricks.com/wp-content/uploads/2015/07/browser-1024x548.png)

6- Select **spark-nlp** package and we are good to go!
18 changes: 9 additions & 9 deletions docs/en/quickstart.md
@@ -29,17 +29,17 @@ Spark NLP is built on top of **Apache Spark 2.4.0** and as such is the **only** sup
To start using the library, execute any of the following lines depending on your desired use case:

```bash
-spark-shell --packages JohnSnowLabs:spark-nlp:2.2.0-rc2
-pyspark --packages JohnSnowLabs:spark-nlp:2.2.0-rc2
-spark-submit --packages JohnSnowLabs:spark-nlp:2.2.0-rc2
+spark-shell --packages JohnSnowLabs:spark-nlp:2.2.0-rc3
+pyspark --packages JohnSnowLabs:spark-nlp:2.2.0-rc3
+spark-submit --packages JohnSnowLabs:spark-nlp:2.2.0-rc3
```

### **Straightforward Python on a Jupyter notebook**

Use pip to install (after you have pip-installed numpy and pyspark)

```bash
-pip install spark-nlp==2.2.0.rc2
+pip install spark-nlp==2.2.0.rc3
jupyter notebook
```
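Inside the notebook, the session can also be bootstrapped with the `sparknlp.start()` helper that this commit updates in `python/sparknlp/__init__.py`:

```python
import sparknlp

# SparkSession with spark-nlp 2.2.0-rc3 resolved via spark.jars.packages
spark = sparknlp.start()

sparknlp.version()  # prints 2.2.0-rc3
```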

Expand All @@ -60,7 +60,7 @@ spark = SparkSession.builder \
.appName('OCR Eval') \
.config("spark.driver.memory", "6g") \
.config("spark.executor.memory", "6g") \
.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.2.0-rc2") \
.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.2.0-rc3") \
.getOrCreate()
```

@@ -69,13 +69,13 @@ spark = SparkSession.builder \
Add the following maven coordinates in the dependency configuration page:

```bash
-com.johnsnowlabs.nlp:spark-nlp_2.11:2.2.0-rc2
+com.johnsnowlabs.nlp:spark-nlp_2.11:2.2.0-rc3
```

For Python in **Apache Zeppelin** you may need to set up _**SPARK_SUBMIT_OPTIONS**_ using the `--packages` instruction shown above, like this

```bash
-export SPARK_SUBMIT_OPTIONS="--packages JohnSnowLabs:spark-nlp:2.2.0-rc2"
+export SPARK_SUBMIT_OPTIONS="--packages JohnSnowLabs:spark-nlp:2.2.0-rc3"
```

### **Python Jupyter Notebook with PySpark**
Expand All @@ -85,7 +85,7 @@ export SPARK_HOME=/path/to/your/spark/folder
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook

-pyspark --packages JohnSnowLabs:spark-nlp:2.2.0-rc2
+pyspark --packages JohnSnowLabs:spark-nlp:2.2.0-rc3
```

### S3 based standalone cluster (No Hadoop)
@@ -297,7 +297,7 @@ lightPipeline.annotate("Hello world, please annotate my text")
The Spark NLP OCR module is not included within Spark NLP. It is neither an annotator nor an extension to Spark ML. You can include it with the following Maven coordinates:

```bash
-com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.2.0-rc2
+com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.2.0-rc3
```

### Creating Spark datasets from PDF (To be used with Spark NLP)
2 changes: 1 addition & 1 deletion python/setup.py
@@ -40,7 +40,7 @@
# For a discussion on single-sourcing the version across setup.py and the
# project code, see
# https://packaging.python.org/en/latest/single_source_version.html
-version='2.2.0.rc2', # Required
+version='2.2.0.rc3', # Required

# This is a one-line description or tagline of what your project does. This
# corresponds to the "Summary" metadata field:
6 changes: 3 additions & 3 deletions python/sparknlp/__init__.py
@@ -40,14 +40,14 @@ def start(include_ocr=False):

if include_ocr:
builder \
.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.2.0-rc2,com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.2.0-rc2,javax.media.jai:com.springsource.javax.media.jai.core:1.1.3") \
.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.2.0-rc3,com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.2.0-rc3,javax.media.jai:com.springsource.javax.media.jai.core:1.1.3") \
.config("spark.jars.repositories", "http://repo.spring.io/plugins-release")

else:
-builder.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.2.0-rc2") \
+builder.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.2.0-rc3") \

return builder.getOrCreate()


def version():
-print('2.2.0-rc2')
+print('2.2.0-rc3')
6 changes: 3 additions & 3 deletions src/main/scala/com/johnsnowlabs/nlp/SparkNLP.scala
@@ -4,7 +4,7 @@ import org.apache.spark.sql.SparkSession

object SparkNLP {

-val currentVersion = "2.2.0-rc2"
+val currentVersion = "2.2.0-rc3"

def start(includeOcr: Boolean = false): SparkSession = {
val build = SparkSession.builder()
@@ -15,11 +15,11 @@

if (includeOcr) {
build
.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.2.0-rc2,com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.2.0-rc2,javax.media.jai:com.springsource.javax.media.jai.core:1.1.3")
.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.2.0-rc3,com.johnsnowlabs.nlp:spark-nlp-ocr_2.11:2.2.0-rc3,javax.media.jai:com.springsource.javax.media.jai.core:1.1.3")
.config("spark.jars.repositories", "http://repo.spring.io/plugins-release")
} else {
build
.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.2.0-rc2")
.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.2.0-rc3")
}

build.getOrCreate()
2 changes: 1 addition & 1 deletion src/main/scala/com/johnsnowlabs/util/Build.scala
@@ -11,6 +11,6 @@ object Build {
if (version != null && version.nonEmpty)
version
else
"2.2.0-rc2"
"2.2.0-rc3"
}
}
