
Commit

Merge pull request #13873 from JohnSnowLabs/release/500-release-candidate

Release/500 release candidate
maziyarpanahi authored Jul 3, 2023
2 parents 179e4df + ae3e032 commit cf9b75e
Showing 1,466 changed files with 53,853 additions and 5,144 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -338,3 +338,4 @@ python/docs/reference/_autosummary/**

# MS Visio Code
**/.vscode/
.metals/
18 changes: 18 additions & 0 deletions CHANGELOG
@@ -1,3 +1,21 @@
========
5.0.0
========
----------------
New Features & Enhancements
----------------
* **NEW:** Introducing support for ONNX Runtime in Spark NLP. ONNX Runtime is a high-performance inference engine for machine learning models in the ONNX format. It has been shown to considerably improve inference performance for many models.
* **NEW:** Introducing **InstructorEmbeddings** annotator in Spark NLP 🚀. `InstructorEmbeddings` can load new state-of-the-art INSTRUCTOR Models inherited from T5 for Text Embeddings.
* **NEW:** Introducing **E5Embeddings** annotator in Spark NLP 🚀. `E5Embeddings` can load new state-of-the-art E5 Models inherited from BERT for Text Embeddings.
* **NEW:** Introducing **DocumentSimilarityRanker** annotator in Spark NLP 🚀. `DocumentSimilarityRanker` is a new annotator that uses the LSH techniques available in Spark MLlib to run approximate nearest-neighbour search on top of sentence embeddings. It aims to capture the semantic meaning of a document in a dense, continuous vector space and return it to the ranker search.
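
To illustrate the LSH idea mentioned for `DocumentSimilarityRanker`, here is a minimal pure-Python sketch of bucketed random projection, one of the LSH families Spark ML provides. It is not Spark NLP's implementation: the helper names (`make_projections`, `build_index`, `rank_neighbours`), the toy embeddings, and the `bucket_length` value are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def make_projections(dim, num_tables, seed=0):
    """Draw one random unit vector per hash table (hypothetical helper)."""
    rng = random.Random(seed)
    projections = []
    for _ in range(num_tables):
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in vec))
        projections.append([x / norm for x in vec])
    return projections


def bucket_ids(vector, projections, bucket_length):
    """Hash a vector to one bucket per table: floor(<v, r> / bucket_length)."""
    return [
        math.floor(sum(v * r for v, r in zip(vector, proj)) / bucket_length)
        for proj in projections
    ]


def build_index(embeddings, projections, bucket_length):
    """Map (table, bucket) to the list of document ids hashed there."""
    index = defaultdict(list)
    for doc_id, vector in embeddings.items():
        buckets = bucket_ids(vector, projections, bucket_length)
        for table, bucket in enumerate(buckets):
            index[(table, bucket)].append(doc_id)
    return index


def rank_neighbours(query_id, embeddings, index, projections, bucket_length, top_k=2):
    """Rank only the documents that share at least one bucket with the query."""
    query = embeddings[query_id]
    candidates = set()
    buckets = bucket_ids(query, projections, bucket_length)
    for table, bucket in enumerate(buckets):
        candidates.update(index.get((table, bucket), []))
    candidates.discard(query_id)
    # Exact Euclidean distance is computed for the candidate set only.
    scored = sorted((math.dist(query, embeddings[c]), c) for c in candidates)
    return [doc_id for _, doc_id in scored[:top_k]]


# Toy "sentence embeddings" (assumed values, two dimensions for readability).
embeddings = {"a": [1.0, 0.0], "b": [1.01, 0.0], "c": [-50.0, 40.0]}
projections = make_projections(dim=2, num_tables=3)
index = build_index(embeddings, projections, bucket_length=5.0)
nearest = rank_neighbours("a", embeddings, index, projections, bucket_length=5.0, top_k=1)
```

The design point is that exact distances are computed only for documents that collide with the query in some hash table, which is what makes the search approximate but cheap. More tables raise recall at the cost of more candidates.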

----------------
Bug Fixes
----------------
* Fix BART issue with maxInputLength



========
4.4.4
========
158 changes: 81 additions & 77 deletions README.md

Large diffs are not rendered by default.

13 changes: 12 additions & 1 deletion build.sbt
@@ -6,7 +6,7 @@ name := getPackageName(is_silicon, is_gpu, is_aarch64)

organization := "com.johnsnowlabs.nlp"

version := "4.4.4"
version := "5.0.0"

(ThisBuild / scalaVersion) := scalaVer

@@ -165,6 +165,16 @@ val tensorflowDependencies: Seq[sbt.ModuleID] =
else
Seq(tensorflowCPU)

val onnxDependencies: Seq[sbt.ModuleID] =
if (is_gpu.equals("true"))
Seq(onnxGPU)
else if (is_silicon.equals("true"))
Seq(onnxCPU)
else if (is_aarch64.equals("true"))
Seq(onnxCPU)
else
Seq(onnxCPU)

lazy val mavenProps = settingKey[Unit]("workaround for Maven properties")

lazy val root = (project in file("."))
@@ -175,6 +185,7 @@ lazy val root = (project in file("."))
testDependencies ++
utilDependencies ++
tensorflowDependencies ++
onnxDependencies ++
typedDependencyParserDependencies,
// TODO potentially improve this?
mavenProps := {
4 changes: 2 additions & 2 deletions conda/README.md
@@ -35,13 +35,13 @@ conda config --set anaconda_upload no
Build `spark-nlp` from the latest PyPI tar:

```bash
conda build . --python=3.7 && conda build . --python=3.8 && conda build . --python=3.9
conda build conda/
```

Example of uploading Conda package to Anaconda Cloud:

```bash
anaconda upload /anaconda3/conda-bld/noarch/spark-nlp-version-py37_0.tar.bz2
anaconda upload /anaconda3/conda-bld/noarch/spark-nlp-version-py_0.tar.bz2
```

## Install
4 changes: 0 additions & 4 deletions conda/conda_build_config.yaml

This file was deleted.

46 changes: 26 additions & 20 deletions conda/meta.yaml
@@ -1,30 +1,36 @@
package:
name: "spark-nlp"
version: 4.4.4
{% set name = "spark-nlp" %}
{% set version = "4.4.0" %}

app:
entry: spark-nlp
summary: Natural Language Understanding Library for Apache Spark.
package:
name: {{ name|lower }}
version: {{ version }}

source:
fn: spark-nlp-4.4.4.tar.gz
url: https://files.pythonhosted.org/packages/f9/e4/5eb83ed1c68be9fca636f6c62f9e55da3f2e511818e2a8feb852d6986064/spark-nlp-4.4.4.tar.gz
sha256: d9e2f017ab7cf6e82e775c38862f1a4ee32bbb0af6619e0b9051e6737711b5b6
url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/spark-nlp-{{ version }}.tar.gz
sha256: e76fdd82b966ca169ba8a1fdcfe2e684fc63abaf88de841d2eb881cacb5e0105

build:
noarch: generic
noarch: python
script: {{ PYTHON }} -m pip install . -vv
number: 0
script: "python -m pip install . --no-deps -vv"

requirements:
build:
- python
host:
- python >=3.7,<3.11
- pip
run:
- python
- python >=3.7,<3.11

test:
imports:
- sparknlp
commands:
- pip check
requires:
- pip

about:
home: https://github.com/JohnSnowLabs/spark-nlp/
license: Apache License 2.0
license_family: APACHE
license_url: https://github.com/JohnSnowLabs/spark-nlp/blob/master/LICENSE
description: John Snow Labs Spark-NLP is a natural language processing library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines, that scale easily in a distributed environment.
summary: Natural Language Understanding Library for Apache Spark.
home: https://github.com/JohnSnowLabs/spark-nlp
summary: John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines, that scale easily in a distributed environment.
license: Apache-2.0
license_file: LICENSE
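
As a rough illustration of how the templated `url` in `meta.yaml` expands, and of the kind of checksum comparison the `sha256` field allows, the following Python sketch may help. The function names are hypothetical, and the f-string only mimics the Jinja substitution for this one template; it is not how conda-build itself renders recipes.

```python
import hashlib


def render_source_url(name, version):
    """Mimic the recipe's url template: {{ name[0] }} is the first character of the name."""
    return f"https://pypi.io/packages/source/{name[0]}/{name}/spark-nlp-{version}.tar.gz"


def sha256_matches(payload, expected_hex):
    """Compare the SHA-256 digest of downloaded bytes with the value pinned in the recipe."""
    return hashlib.sha256(payload).hexdigest() == expected_hex.lower()


# Values taken from the {% set %} lines of the recipe.
source_url = render_source_url("spark-nlp", "4.4.0")
```

With the values set in the recipe, `source_url` is `https://pypi.io/packages/source/s/spark-nlp/spark-nlp-4.4.0.tar.gz`. Passing the downloaded archive's bytes and the recipe's `sha256` string to `sha256_matches` would show whether the pinned checksum still corresponds to the file.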
6 changes: 4 additions & 2 deletions docs/_layouts/landing.html
@@ -224,7 +224,7 @@ <h2 class="h2_title grey">Transformers at Scale</h2>
<div class="transformer-descr">
Unlock the power of <b>Large Language Models</b> with <b>Spark NLP 🚀</b>, the only open-source library that delivers cutting-edge transformers for <b>production</b> such as
<b>BERT</b>, <b>CamemBERT</b>, <b>ALBERT</b>, <b>ELECTRA</b>, <b>XLNet</b>, <b>DistilBERT</b>, <b>RoBERTa</b>, <b>DeBERTa</b>,
<b>XLM-RoBERTa</b>, <b>Longformer</b>, <b>ELMO</b>, <b>Universal Sentence Encoder</b>, <b>Facebook BART</b>, <b>Google T5</b>, <b>MarianMT</b>, <b>OpenAI GPT2</b>,
<b>XLM-RoBERTa</b>, <b>Longformer</b>, <b>ELMO</b>, <b>Universal Sentence Encoder</b>, <b>Facebook BART</b>, <b>Instructor Embeddings</b>, <b>E5 Embeddings</b>, <b>Google T5</b>, <b>MarianMT</b>, <b>OpenAI GPT2</b>,
<b>Google ViT</b>, <b>ASR Wav2Vec2</b> and many more not only to <b>Python</b>, and <b>R</b> but also to <b>JVM</b> ecosystem (<b>Java</b>, <b>Scala</b>, and <b>Kotlin</b>) at <b>scale</b> by extending <b>Apache Spark</b> natively
</div>
</div>
@@ -304,6 +304,8 @@ <h4 class="blue h4_title">NLP Features</h4>
<li><strong>Universal Sentence</strong> Encoder</li>
<li><strong>Sentence</strong> Embeddings</li>
<li><strong>Chunk</strong> Embeddings</li>
<li><strong>Instructor</strong> Embeddings</li>
<li><strong>E5</strong> Embeddings</li>
</ul>
<ul class="list1">
<li>Table Question Answering <strong>(TAPAS)</strong></li>
@@ -332,7 +334,7 @@ <h4 class="blue h4_title">NLP Features</h4>
<li>Microsoft Swin Transformer <strong>Image Classification</strong></li>
<li>Facebook ConvNext <strong>Image Classification</strong></li>
<li>Automatic Speech Recognition <strong>(Wav2Vec2 & HuBERT)</strong></li>
<li>Easy <strong>TensorFlow</strong> integration</li>
<li>Easy <strong>ONNX</strong> and <strong>TensorFlow</strong> integrations</li>
<li><strong>GPU</strong> Support</li>
<li>Full integration with <strong>Spark ML</strong> functions</li>
<li><strong>12000+</strong> pre-trained <strong>models </strong> in <strong>200+ languages! </strong>
8 changes: 4 additions & 4 deletions docs/api/com/index.html
@@ -3,9 +3,9 @@
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<title>Spark NLP 4.4.4 ScalaDoc - com</title>
<meta name="description" content="Spark NLP 4.4.4 ScalaDoc - com" />
<meta name="keywords" content="Spark NLP 4.4.4 ScalaDoc com" />
<title>Spark NLP 5.0.0 ScalaDoc - com</title>
<meta name="description" content="Spark NLP 5.0.0 ScalaDoc - com" />
<meta name="keywords" content="Spark NLP 5.0.0 ScalaDoc com" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />


@@ -28,7 +28,7 @@
</head>
<body>
<div id="search">
<span id="doc-title">Spark NLP 4.4.4 ScalaDoc<span id="doc-version"></span></span>
<span id="doc-title">Spark NLP 5.0.0 ScalaDoc<span id="doc-version"></span></span>
<span class="close-results"><span class="left">&lt;</span> Back</span>
<div id="textfilter">
<span class="input">
8 changes: 4 additions & 4 deletions docs/api/com/johnsnowlabs/client/CredentialParams.html
@@ -3,9 +3,9 @@
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<title>Spark NLP 4.4.4 ScalaDoc - com.johnsnowlabs.client.CredentialParams</title>
<meta name="description" content="Spark NLP 4.4.4 ScalaDoc - com.johnsnowlabs.client.CredentialParams" />
<meta name="keywords" content="Spark NLP 4.4.4 ScalaDoc com.johnsnowlabs.client.CredentialParams" />
<title>Spark NLP 5.0.0 ScalaDoc - com.johnsnowlabs.client.CredentialParams</title>
<meta name="description" content="Spark NLP 5.0.0 ScalaDoc - com.johnsnowlabs.client.CredentialParams" />
<meta name="keywords" content="Spark NLP 5.0.0 ScalaDoc com.johnsnowlabs.client.CredentialParams" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />


@@ -28,7 +28,7 @@
</head>
<body>
<div id="search">
<span id="doc-title">Spark NLP 4.4.4 ScalaDoc<span id="doc-version"></span></span>
<span id="doc-title">Spark NLP 5.0.0 ScalaDoc<span id="doc-version"></span></span>
<span class="close-results"><span class="left">&lt;</span> Back</span>
<div id="textfilter">
<span class="input">
@@ -3,9 +3,9 @@
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<title>Spark NLP 4.4.4 ScalaDoc - com.johnsnowlabs.client.aws.AWSAnonymousCredentials</title>
<meta name="description" content="Spark NLP 4.4.4 ScalaDoc - com.johnsnowlabs.client.aws.AWSAnonymousCredentials" />
<meta name="keywords" content="Spark NLP 4.4.4 ScalaDoc com.johnsnowlabs.client.aws.AWSAnonymousCredentials" />
<title>Spark NLP 5.0.0 ScalaDoc - com.johnsnowlabs.client.aws.AWSAnonymousCredentials</title>
<meta name="description" content="Spark NLP 5.0.0 ScalaDoc - com.johnsnowlabs.client.aws.AWSAnonymousCredentials" />
<meta name="keywords" content="Spark NLP 5.0.0 ScalaDoc com.johnsnowlabs.client.aws.AWSAnonymousCredentials" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />


@@ -28,7 +28,7 @@
</head>
<body>
<div id="search">
<span id="doc-title">Spark NLP 4.4.4 ScalaDoc<span id="doc-version"></span></span>
<span id="doc-title">Spark NLP 5.0.0 ScalaDoc<span id="doc-version"></span></span>
<span class="close-results"><span class="left">&lt;</span> Back</span>
<div id="textfilter">
<span class="input">
8 changes: 4 additions & 4 deletions docs/api/com/johnsnowlabs/client/aws/AWSBasicCredentials.html
@@ -3,9 +3,9 @@
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<title>Spark NLP 4.4.4 ScalaDoc - com.johnsnowlabs.client.aws.AWSBasicCredentials</title>
<meta name="description" content="Spark NLP 4.4.4 ScalaDoc - com.johnsnowlabs.client.aws.AWSBasicCredentials" />
<meta name="keywords" content="Spark NLP 4.4.4 ScalaDoc com.johnsnowlabs.client.aws.AWSBasicCredentials" />
<title>Spark NLP 5.0.0 ScalaDoc - com.johnsnowlabs.client.aws.AWSBasicCredentials</title>
<meta name="description" content="Spark NLP 5.0.0 ScalaDoc - com.johnsnowlabs.client.aws.AWSBasicCredentials" />
<meta name="keywords" content="Spark NLP 5.0.0 ScalaDoc com.johnsnowlabs.client.aws.AWSBasicCredentials" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />


@@ -28,7 +28,7 @@
</head>
<body>
<div id="search">
<span id="doc-title">Spark NLP 4.4.4 ScalaDoc<span id="doc-version"></span></span>
<span id="doc-title">Spark NLP 5.0.0 ScalaDoc<span id="doc-version"></span></span>
<span class="close-results"><span class="left">&lt;</span> Back</span>
<div id="textfilter">
<span class="input">
@@ -3,9 +3,9 @@
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<title>Spark NLP 4.4.4 ScalaDoc - com.johnsnowlabs.client.aws.AWSCredentialsProvider</title>
<meta name="description" content="Spark NLP 4.4.4 ScalaDoc - com.johnsnowlabs.client.aws.AWSCredentialsProvider" />
<meta name="keywords" content="Spark NLP 4.4.4 ScalaDoc com.johnsnowlabs.client.aws.AWSCredentialsProvider" />
<title>Spark NLP 5.0.0 ScalaDoc - com.johnsnowlabs.client.aws.AWSCredentialsProvider</title>
<meta name="description" content="Spark NLP 5.0.0 ScalaDoc - com.johnsnowlabs.client.aws.AWSCredentialsProvider" />
<meta name="keywords" content="Spark NLP 5.0.0 ScalaDoc com.johnsnowlabs.client.aws.AWSCredentialsProvider" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />


@@ -28,7 +28,7 @@
</head>
<body>
<div id="search">
<span id="doc-title">Spark NLP 4.4.4 ScalaDoc<span id="doc-version"></span></span>
<span id="doc-title">Spark NLP 5.0.0 ScalaDoc<span id="doc-version"></span></span>
<span class="close-results"><span class="left">&lt;</span> Back</span>
<div id="textfilter">
<span class="input">
8 changes: 4 additions & 4 deletions docs/api/com/johnsnowlabs/client/aws/AWSGateway.html
@@ -3,9 +3,9 @@
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<title>Spark NLP 4.4.4 ScalaDoc - com.johnsnowlabs.client.aws.AWSGateway</title>
<meta name="description" content="Spark NLP 4.4.4 ScalaDoc - com.johnsnowlabs.client.aws.AWSGateway" />
<meta name="keywords" content="Spark NLP 4.4.4 ScalaDoc com.johnsnowlabs.client.aws.AWSGateway" />
<title>Spark NLP 5.0.0 ScalaDoc - com.johnsnowlabs.client.aws.AWSGateway</title>
<meta name="description" content="Spark NLP 5.0.0 ScalaDoc - com.johnsnowlabs.client.aws.AWSGateway" />
<meta name="keywords" content="Spark NLP 5.0.0 ScalaDoc com.johnsnowlabs.client.aws.AWSGateway" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />


@@ -28,7 +28,7 @@
</head>
<body>
<div id="search">
<span id="doc-title">Spark NLP 4.4.4 ScalaDoc<span id="doc-version"></span></span>
<span id="doc-title">Spark NLP 5.0.0 ScalaDoc<span id="doc-version"></span></span>
<span class="close-results"><span class="left">&lt;</span> Back</span>
<div id="textfilter">
<span class="input">
@@ -3,9 +3,9 @@
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<title>Spark NLP 4.4.4 ScalaDoc - com.johnsnowlabs.client.aws.AWSProfileCredentials</title>
<meta name="description" content="Spark NLP 4.4.4 ScalaDoc - com.johnsnowlabs.client.aws.AWSProfileCredentials" />
<meta name="keywords" content="Spark NLP 4.4.4 ScalaDoc com.johnsnowlabs.client.aws.AWSProfileCredentials" />
<title>Spark NLP 5.0.0 ScalaDoc - com.johnsnowlabs.client.aws.AWSProfileCredentials</title>
<meta name="description" content="Spark NLP 5.0.0 ScalaDoc - com.johnsnowlabs.client.aws.AWSProfileCredentials" />
<meta name="keywords" content="Spark NLP 5.0.0 ScalaDoc com.johnsnowlabs.client.aws.AWSProfileCredentials" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />


@@ -28,7 +28,7 @@
</head>
<body>
<div id="search">
<span id="doc-title">Spark NLP 4.4.4 ScalaDoc<span id="doc-version"></span></span>
<span id="doc-title">Spark NLP 5.0.0 ScalaDoc<span id="doc-version"></span></span>
<span class="close-results"><span class="left">&lt;</span> Back</span>
<div id="textfilter">
<span class="input">
8 changes: 4 additions & 4 deletions docs/api/com/johnsnowlabs/client/aws/AWSTokenCredentials.html
@@ -3,9 +3,9 @@
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<title>Spark NLP 4.4.4 ScalaDoc - com.johnsnowlabs.client.aws.AWSTokenCredentials</title>
<meta name="description" content="Spark NLP 4.4.4 ScalaDoc - com.johnsnowlabs.client.aws.AWSTokenCredentials" />
<meta name="keywords" content="Spark NLP 4.4.4 ScalaDoc com.johnsnowlabs.client.aws.AWSTokenCredentials" />
<title>Spark NLP 5.0.0 ScalaDoc - com.johnsnowlabs.client.aws.AWSTokenCredentials</title>
<meta name="description" content="Spark NLP 5.0.0 ScalaDoc - com.johnsnowlabs.client.aws.AWSTokenCredentials" />
<meta name="keywords" content="Spark NLP 5.0.0 ScalaDoc com.johnsnowlabs.client.aws.AWSTokenCredentials" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />


@@ -28,7 +28,7 @@
</head>
<body>
<div id="search">
<span id="doc-title">Spark NLP 4.4.4 ScalaDoc<span id="doc-version"></span></span>
<span id="doc-title">Spark NLP 5.0.0 ScalaDoc<span id="doc-version"></span></span>
<span class="close-results"><span class="left">&lt;</span> Back</span>
<div id="textfilter">
<span class="input">
8 changes: 4 additions & 4 deletions docs/api/com/johnsnowlabs/client/aws/Credentials.html
@@ -3,9 +3,9 @@
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<title>Spark NLP 4.4.4 ScalaDoc - com.johnsnowlabs.client.aws.Credentials</title>
<meta name="description" content="Spark NLP 4.4.4 ScalaDoc - com.johnsnowlabs.client.aws.Credentials" />
<meta name="keywords" content="Spark NLP 4.4.4 ScalaDoc com.johnsnowlabs.client.aws.Credentials" />
<title>Spark NLP 5.0.0 ScalaDoc - com.johnsnowlabs.client.aws.Credentials</title>
<meta name="description" content="Spark NLP 5.0.0 ScalaDoc - com.johnsnowlabs.client.aws.Credentials" />
<meta name="keywords" content="Spark NLP 5.0.0 ScalaDoc com.johnsnowlabs.client.aws.Credentials" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />


@@ -28,7 +28,7 @@
</head>
<body>
<div id="search">
<span id="doc-title">Spark NLP 4.4.4 ScalaDoc<span id="doc-version"></span></span>
<span id="doc-title">Spark NLP 5.0.0 ScalaDoc<span id="doc-version"></span></span>
<span class="close-results"><span class="left">&lt;</span> Back</span>
<div id="textfilter">
<span class="input">