Release/430 release candidate #13346
Merged
Conversation
maziyarpanahi (Member) commented on Jan 14, 2023 (edited)
- SPARKNLP-695 Refactor ml package #13287
- SPARKNLP-666 use spark 3.3.1 as base #13288
- SPARKNLP-607: Implement HubertForCTC #13303
- SPARKNLP-606: Add SwinForImageClassification Annotator #13331
- Relocating public examples back to the main repository #13292
- Sparknlp 718 Zero Shot NER model annotator #13352 (usage sketch after this list)
- Improvement for Issue Templates #13354
- [skip ci] SPARKNLP-725: Add PyDoc documentation for ResourceDownloader #13369
- SPARKNLP-712: Update example links #13419
- Remove Debug Print in LP #13420
- SPARKNLP-728 Avoid Copying Existing Models on S3/GCP #13423
- Fix calculating delimiter id in CamemBERT
- SPARKNLP-474 Introducing CamemBERT for Question Answering annotator #13289
- Documentation for 430 release candidate #13421
- SPARKNLP-734 Enable params argument in spark_nlp.start() #13441
- Sparknlp 736 Implement Date2Chunk annotator #13447
- SPARKNLP-733: Fix loadSavedModel for private buckets #13432
- SPARKNLP-712: Update links to example notebooks #13459
- Doc id conll reader #13410
- SPARKNLP-737: ZeroShotNer Notebook #13474
- Sparknlp 740 rename/refactor m1 to silicon #13476
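The Zero Shot NER entries above (#13352, #13474) add an annotator whose entities are defined at runtime through prompt questions. A minimal usage sketch, assuming the Python API and a hypothetical pretrained model name, column names, and prompts:

```python
from sparknlp.annotator import ZeroShotNerModel

# Model name, column names, and prompts below are illustrative assumptions.
zero_shot_ner = ZeroShotNerModel.pretrained("zero_shot_ner_roberta", "en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("zero_shot_ner") \
    .setEntityDefinitions({
        "NAME": ["What is his name?", "What is her name?"],
        "CITY": ["Which city were they born in?"]
    })
```

See the ZeroShotNer notebook (#13474) for the exact, supported usage.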
- delimiter id is actually correct and doesn't need any offset
- annotator and ResourceDownloader
- anything that can be used by other DL engines; some features are designed exclusively around TensorFlow, so they can stay in the tensorflow package
- sometimes the import optimization doesn't follow scalafmt rules
…notator' into SPARKNLP-695-refactor-ml-module
- actually, sentencepiece does use TensorFlow to load the SP model, so it must stay in the tensorflow package; io is also mostly used for loading TensorFlow models
- Spark 3.3.1 is now the default package for our APIs; GCP storage is updated from 2.15.0 to 2.16.0
- It needs to assert with `"The deserializer is not supported: need a(n) \"ARRAY\" field but got \"STRING\"."`
Removed the duplicated definition of method `setWeightedDistPath` from `ContextSpellCheckerApproach`.
Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
* SPARKNLP-728 Verify that the model already exists in S3 before unzipping when cache_pretrained is defined as an S3 bucket
* SPARKNLP-728 Verify that the model already exists in GCP before unzipping when cache_pretrained is defined as a GCP bucket
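For context, cache_pretrained can point at an object store instead of the local filesystem. A minimal sketch, assuming the cache_folder argument of sparknlp.start() (kept for backward compatibility, see below) and a hypothetical bucket:

```python
import sparknlp

# Hypothetical bucket; with this change, models already present in the bucket
# are detected and are not copied/unzipped again.
spark = sparknlp.start(cache_folder="s3://my-bucket/cache_pretrained")
```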
* [skip ci] SPARKNLP-709: Add documentation for SwinForImageClassification and HubertForCTC; add an additional example for ViTForImageClassification
* Fix some PyDocs and add an ignore pattern to Sphinx
* Resolve some Sphinx warnings
* Improve WordSegmenter docs
* Remove a typo
* SPARKNLP-709: Add documentation for ZeroShotNer
* SPARKNLP-734 Enable params argument in spark_nlp.start()
* SPARKNLP-734 Revert cache_folder, log_folder and cluster_tmp_dir for backward compatibility
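A minimal sketch of what the new params argument enables; the specific Spark properties shown are only examples:

```python
import sparknlp

# Arbitrary Spark properties are passed through to the underlying SparkSession.
spark = sparknlp.start(params={
    "spark.driver.memory": "16G",
    "spark.kryoserializer.buffer.max": "2000M",
})
print("Spark NLP version:", sparknlp.version())
```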
* Implement DateMatcher annotator
* Add Date2Chunk annotator and unit tests to Python: move Token2Chunk to the base module, add Date2Chunk to the Python APIs, add Date2Chunk unit tests
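A short pipeline sketch for the new Date2Chunk annotator; column names are illustrative:

```python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import DateMatcher, Date2Chunk

document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")

# Detect DATE annotations, then re-tag them as CHUNK so that downstream
# chunk-based annotators can consume them.
date_matcher = DateMatcher().setInputCols(["document"]).setOutputCol("date")
date_to_chunk = Date2Chunk().setInputCols(["date"]).setOutputCol("date_chunk")

pipeline = Pipeline(stages=[document_assembler, date_matcher, date_to_chunk])
```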
* SPARKNLP-733: Fix loadSavedModel for private buckets; also resolves warnings
* SPARKNLP-733: Check region for S3 loadSavedModel
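A hedged sketch of what this fix enables: loading an exported model directly from a private S3 path. The bucket, folder, and choice of annotator are hypothetical; credentials and region come from the Spark session configuration:

```python
from sparknlp.annotator import BertEmbeddings

# `spark` is the active Spark NLP session; the S3 path below is hypothetical.
embeddings = BertEmbeddings.loadSavedModel("s3://my-private-bucket/export/bert_base_cased", spark) \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")
```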
* [skip ci] SPARK-NLP-721: New example notebooks
* SPARK-NLP-721: Changed notebook links
* Doc ID column implementation
* Include tests
* Access the tuple rather than the cleared variable for the doc and the sentence
* Update code style [skip test]
* Refactor m1 to silicon: README/docs, dependencies (spark-nlp-silicon), function argument (apple_silicon=True/true)
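After the rename, the Python entry point takes apple_silicon in place of m1; a minimal sketch:

```python
import sparknlp

# Previously sparknlp.start(m1=True); renamed in this release.
spark = sparknlp.start(apple_silicon=True)
```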
- no matter what, IntelliJ cannot be made to ignore formatting *.ipynb notebooks; excluding them from the formatter or ignoring the file type does not help
- the www.johnsnowlabs.com/slack-redirect DNS redirect forces people to have a @johnsnowlabs.com email in order to register; using the raw invitation URL fixes this issue
Labels: bug-fix, dependencies, documentation, DON'T MERGE, enhancement, models_hub, new model, new-feature