Commit cdfd954

maziyarpanahi committed Mar 4, 2020 · 2 parents ddd4630 + b23e6ed

Showing 19 changed files with 245 additions and 218 deletions.
34 changes: 17 additions & 17 deletions README.md
@@ -109,7 +109,7 @@ For more examples you can visit our dedicated [repository](https://github.com/Jo

## Apache Spark Support

-Spark NLP *2.4.1* has been built on top of Apache Spark 2.4.4
+Spark NLP *2.4.2* has been built on top of Apache Spark 2.4.4

| Spark NLP | Apache Spark 2.3.x | Apache Spark 2.4.x |
|-------------|-----------------------|--------------------|
@@ -133,23 +133,23 @@ This library has been uploaded to the [spark-packages repository](https://spark-

The benefit of spark-packages is that it makes the library available for both Scala/Java and Python

-To use the most recent version just add the `--packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1` to your spark command
+To use the most recent version just add the `--packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2` to your spark command

```sh
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2
```

```sh
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2
```

```sh
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2
```

This can also be used to create a SparkSession manually by using the `spark.jars.packages` option in both Python and Scala.

-**NOTE**: To use Spark NLP with GPU you can use the dedicated GPU package `com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.4.1`
+**NOTE**: To use Spark NLP with GPU you can use the dedicated GPU package `com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.4.2`

## Scala

@@ -164,7 +164,7 @@ Our package is deployed to maven central. In order to add this package as a depe
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.11</artifactId>
-<version>2.4.1</version>
+<version>2.4.2</version>
</dependency>
```

@@ -175,7 +175,7 @@ Our package is deployed to maven central. In order to add this package as a depe
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu_2.11</artifactId>
-<version>2.4.1</version>
+<version>2.4.2</version>
</dependency>
```

@@ -185,14 +185,14 @@ Our package is deployed to maven central. In order to add this package as a depe

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "2.4.1"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "2.4.2"
```

**spark-nlp-gpu:**

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "2.4.1"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "2.4.2"
```

Maven Central: [https://mvnrepository.com/artifact/com.johnsnowlabs.nlp](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp)
@@ -208,7 +208,7 @@ If you installed pyspark through pip/conda, you can install `spark-nlp` through
Pip:

```bash
-pip install spark-nlp==2.4.1
+pip install spark-nlp==2.4.2
```
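
Assuming the install succeeded, a quick way to confirm the package is visible to Python (`sparknlp.version()` reports the installed library version):

```python
# sanity check: the package imports and reports the expected version
import sparknlp

print(sparknlp.version())  # expected: 2.4.2
```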

Conda:
@@ -235,7 +235,7 @@ spark = SparkSession.builder \
.master("local[4]")\
.config("spark.driver.memory","8G")\
.config("spark.driver.maxResultSize", "2G") \
.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1")\
.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2")\
.config("spark.kryoserializer.buffer.max", "500m")\
.getOrCreate()
```
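
If the Python package is installed, the library's convenience starter is a shorter alternative to the manual builder above (a minimal sketch; `sparknlp.start()` creates a SparkSession preconfigured for Spark NLP):

```python
import sparknlp

# creates (or reuses) a SparkSession with Spark NLP on the classpath
spark = sparknlp.start()
```
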
@@ -304,7 +304,7 @@ Use either one of the following options
* Add the following Maven Coordinates to the interpreter's library list

```bash
-com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2
```

* Add the path to the pre-built jar from [here](#compiled-jars) to the interpreter's library list, making sure the jar is available on the driver path
@@ -314,7 +314,7 @@ com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
Apart from the previous step, install the Python module through pip

```bash
-pip install spark-nlp==2.4.1
+pip install spark-nlp==2.4.2
```

Or you can install `spark-nlp` from inside Zeppelin by using Conda:
@@ -339,7 +339,7 @@ export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook

-pyspark --packages JohnSnowLabs:spark-nlp:2.4.1
+pyspark --packages JohnSnowLabs:spark-nlp:2.4.2
```

Alternatively, you can mix in the `--jars` option for pyspark with `pip install spark-nlp`, as in the sketch below
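
A minimal sketch of the jar-based route in Python (the assembly jar path is hypothetical and depends on where you built or downloaded it):

```python
from pyspark.sql import SparkSession

# load Spark NLP from a local assembly jar instead of resolving --packages
spark = SparkSession.builder \
    .appName("spark-nlp-local-jar") \
    .config("spark.jars", "/path/to/spark-nlp-assembly-2.4.2.jar") \
    .getOrCreate()
```
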
@@ -365,7 +365,7 @@ os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]
! pip install --ignore-installed pyspark==2.4.4

# Install Spark NLP
-! pip install --ignore-installed spark-nlp==2.4.1
+! pip install --ignore-installed spark-nlp==2.4.2

# Quick SparkSession start
import sparknlp
@@ -554,7 +554,7 @@ If you get this common python error, it means that the Spark NLP was not loaded
3. If on Windows, download Hadoop winutils.exe and add it to your PATH: https://github.com/steveloughran/winutils
4. HADOOP_HOME should also be set in some cases; pointing it to your SPARK_HOME should work if you don't have an explicit Hadoop installation
5. If you are running `pyspark` instead of just `jupyter notebook`, make sure you set up `PYSPARK_DRIVER_PYTHON`, `PYSPARK_DRIVER_PYTHON_OPTS` and `PYSPARK_PYTHON` as noted in the documentation
-6. Run `pip install spark-nlp==2.4.1` even if you are using `--packages`, as a safety measure
+6. Run `pip install spark-nlp==2.4.2` even if you are using `--packages`, as a safety measure
7. Make sure all dependency coordinates and any jar paths you provide manually are written correctly; Spark does not fail on a wrong path, it just ignores it
8. If you get dependency failures when starting Spark, make sure to add antivirus and firewall exceptions. Windows antivirus adversely impacts performance when resolving dependencies.
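
After working through this checklist, a quick end-to-end sanity check (a sketch; `start()` and `version()` ship with the spark-nlp Python package):

```python
# verify that both the Python package and the Spark NLP jar load correctly
import sparknlp

spark = sparknlp.start()
print("Spark NLP:", sparknlp.version())
print("Apache Spark:", spark.version)
```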

2 changes: 1 addition & 1 deletion build.sbt
@@ -15,7 +15,7 @@ if(is_gpu.equals("false")){

organization:= "com.johnsnowlabs.nlp"

version := "2.4.1"
version := "2.4.2"

scalaVersion in ThisBuild := scalaVer

2 changes: 1 addition & 1 deletion docs/Gemfile.lock
@@ -220,7 +220,7 @@ GEM
ruby-enum (0.7.2)
i18n
ruby_dep (1.5.0)
-rubyzip (2.2.0)
+rubyzip (1.3.0)
safe_yaml (1.0.5)
sass (3.7.4)
sass-listen (~> 4.0.0)
2 changes: 2 additions & 0 deletions docs/_data/navigation.yml
@@ -57,6 +57,8 @@ docs-en:
url: /docs/en/evaluation
- title: Spark OCR
url: /docs/en/ocr
+- title: Spark OCR release notes
+  url: /docs/en/ocr_release_notes

extras:
- title: Extras
10 changes: 5 additions & 5 deletions docs/_layouts/landing.html
@@ -49,22 +49,22 @@ <h1>{{ _section.title }}</h1>
<div class="cell cell--12 cell--lg-12" style="text-align: left; background-color: #2d2d2d; padding: 10px">
{% highlight bash %}
# Install Spark NLP from PyPI
-$ pip install spark-nlp==2.4.1
+$ pip install spark-nlp==2.4.2

# Install Spark NLP from Anaconda/Conda
$ conda install -c johnsnowlabs spark-nlp

# Load Spark NLP with Spark Shell
-$ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+$ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2

# Load Spark NLP with PySpark
-$ pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+$ pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2

# Load Spark NLP with Spark Submit
-$ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+$ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2

# Load Spark NLP as an external JAR after compiling and building Spark NLP with `sbt assembly`
-$ spark-shell --jars spark-nlp-assembly-2.4.1.jar
+$ spark-shell --jars spark-nlp-assembly-2.4.2.jar
{% endhighlight %}
</div>
</div>
6 changes: 3 additions & 3 deletions docs/en/concepts.md
@@ -41,20 +41,20 @@ Both forms of annotators can be included in a Pipeline and will automatically go

## Quickly annotate some text

You can run these examples using Python or Scala.

The easiest way to run the Python examples is by starting a pyspark
Jupyter notebook that includes the spark-nlp package:

```bash
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2
```

The easiest way of running these Scala examples is by starting a
spark-shell session that includes the spark-nlp package:

```bash
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2
```
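
Once a session is up, a pretrained pipeline is the quickest way to try annotation (a Python sketch; `explain_document_ml` is one of the pipelines John Snow Labs distributes):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# downloads the pipeline on first use, then annotates a sample sentence
pipeline = PretrainedPipeline("explain_document_ml", lang="en")
result = pipeline.annotate("Spark NLP makes it easy to annotate text.")
print(result["token"])
```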

### Explain Document ML
38 changes: 19 additions & 19 deletions docs/en/install.md
@@ -10,22 +10,22 @@ modify_date: "2020-02-20"

```bash
# Install Spark NLP from PyPI
$pip install spark-nlp==2.4.1
$pip install spark-nlp==2.4.2

# Install Spark NLP from Anacodna/Conda
conda install -c johnsnowlabs spark-nlp

# Load Spark NLP with Spark Shell
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2

# Load Spark NLP with PySpark
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2

# Load Spark NLP with Spark Submit
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2

# Load Spark NLP as an external JAR after compiling and building Spark NLP with `sbt assembly`
-spark-shell --jars spark-nlp-assembly-2.4.1.jar
+spark-shell --jars spark-nlp-assembly-2.4.2.jar
```

## Python
@@ -106,7 +106,7 @@ conda install pyspark=2.4.4
To install the Spark NLP open-source version you can just run:

```bash
-pip install --ignore-installed spark-nlp==2.4.1
+pip install --ignore-installed spark-nlp==2.4.2
```

The --ignore-installed parameter is to overwrite your previous pip
@@ -116,7 +116,7 @@ package version if already installed.
If you are using Anaconda/Conda for managing Python packages, you can install the Spark NLP open-source version as follows:

```bash
-conda install -c johnsnowlabs spark-nlp=2.4.1
+conda install -c johnsnowlabs spark-nlp=2.4.2
```

### Install Licensed Spark NLP
@@ -125,7 +125,7 @@ You can also install the licensed package with extra functionalities and
pretrained models by using:

```bash
-pip install spark-nlp-jsl==2.4.1 --extra-index-url #### --ignore-installed
+pip install spark-nlp-jsl==2.4.2 --extra-index-url #### --ignore-installed
```

The #### is a secret URL only available for users with a license; if you
@@ -135,7 +135,7 @@ At the moment there is no conda package for Licensed Spark NLP version.

### Setup AWS-CLI Credentials for licensed pretrained models

-From Licensed version 2.4.1 in order to access private JohnSnowLabs
+From Licensed version 2.4.2 in order to access private JohnSnowLabs
models repository you first need to set up your AWS credentials. This
access is done via the Amazon AWS command line interface (AWS CLI).

@@ -164,7 +164,7 @@ spark = SparkSession.builder \
.master("local[*]")\
.config("spark.driver.memory","8G")\
.config("spark.driver.maxResultSize", "2G") \
.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1")\
.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2")\
.config("spark.kryoserializer.buffer.max", "500m")\
.getOrCreate()
```
@@ -182,7 +182,7 @@ pyspark this cell is just ignored.
Initializing the Spark session takes a few seconds (usually less than a
minute) as the jar needs to be loaded from the server.

-We will be using version 2.4.1 of Spark NLP Open Source and 2.4.1 of
+We will be using version 2.4.2 of Spark NLP Open Source and 2.4.2 of
Spark NLP Enterprise Edition.

The #### in .config("spark.jars", "####") is a secret code; if you have
@@ -192,11 +192,11 @@ not received it please contact us at info@johnsnowlabs.com.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
.appName("Global DEMO - Spark NLP Enterprise 2.4.1") \
.appName("Global DEMO - Spark NLP Enterprise 2.4.2") \
.master("local[*]") \
.config("spark.driver.memory","4G") \
.config("spark.driver.maxResultSize", "2G") \
.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.1") \
.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.2") \
.config("spark.jars", "####") \
.getOrCreate()
```
@@ -213,7 +213,7 @@ as a dependency in your application:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.11</artifactId>
-<version>2.4.1</version>
+<version>2.4.2</version>
</dependency>
```

@@ -224,22 +224,22 @@ and
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu_2.11</artifactId>
-<version>2.4.1</version>
+<version>2.4.2</version>
</dependency>
```

### SBT

```bash
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "2.4.1"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "2.4.2"
```

and

```bash
-// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-ocr
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "2.4.1"
+// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "2.4.2"
```

Maven Central: [https://mvnrepository.com/artifact/com.johnsnowlabs.nlp](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp)
@@ -267,7 +267,7 @@ Note: You can import these notebooks by using their URLs.
4- From the Source drop-down menu, select **Maven Coordinate:**
![Databricks](https://databricks.com/wp-content/uploads/2015/07/select-maven-1024x711.png)

-5- Now, all available **Spark Packages** are at your fingertips! Just search for **JohnSnowLabs:spark-nlp:version** where **version** stands for the library version such as: `2.4.1`
+5- Now, all available **Spark Packages** are at your fingertips! Just search for **JohnSnowLabs:spark-nlp:version** where **version** stands for the library version such as: `2.4.2`
![Databricks](https://databricks.com/wp-content/uploads/2015/07/browser-1024x548.png)

6- Select the **spark-nlp** package and we are good to go!