SPARKNLP-784 Fix loading WordEmbeddingsModel Bug #13707

Contributor

@danilojsl danilojsl commented Mar 27, 2023

Description

This change fixes a bug that occurs when loading WordEmbeddingsModel while cache_folder points to an S3 location.

Motivation and Context

Fixes this kind of error:

[info] Uploading model glove_100d_en_2.4.0_2.4_1579690104032 to external Cloud Storage URI: s3a://aws-glue-assets/cache_pretrained
[info] Download done! Loading the resource.
[error] Exception in thread "main" java.lang.ExceptionInInitializerError
[error]         at com.johnsnowlabs.grpc.LanguageServer.run(LanguageServer.scala:44)
[error]         at com.johnsnowlabs.grpc.LanguageServer$.start(LanguageServer.scala:28)
[error]         at com.johnsnowlabs.grpc.Main$.init(Main.scala:20)
[error]         at com.johnsnowlabs.grpc.Main$.main(Main.scala:12)
[error]         at com.johnsnowlabs.grpc.Main.main(Main.scala)
[error] Caused by: java.lang.IllegalArgumentException: Wrong FS: s3a://aws-glue-assets/cache_pretrained/glove_100d_en_2.4.0_2.4_1579690104032/storage/EMBEDDINGS_glove_100d, expected: file:///
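The `Wrong FS: s3a://..., expected: file:///` message is what Hadoop reports when a path carrying one scheme is handed to a filesystem object bound to another scheme (here, the local one). The sketch below is not the code changed in this PR. It is a plain-Python illustration of that scheme mismatch, and the names `WrongFSError`, `check_path`, and `filesystem_uri_for` are hypothetical, introduced only for this example. It assumes the general remedy is to derive the filesystem from the path itself rather than from the default.

```python
from urllib.parse import urlparse


class WrongFSError(ValueError):
    """Hypothetical stand-in for the IllegalArgumentException in the trace."""


def check_path(path: str, fs_uri: str) -> None:
    """Reject a path whose scheme differs from the filesystem's scheme.

    A path without a scheme is treated as belonging to the filesystem.
    """
    path_scheme = urlparse(path).scheme
    fs_scheme = urlparse(fs_uri).scheme
    if path_scheme and path_scheme != fs_scheme:
        raise WrongFSError(f"Wrong FS: {path}, expected: {fs_uri}")


def filesystem_uri_for(path: str, default_fs: str = "file:///") -> str:
    """Pick the filesystem from the path's own scheme, else use the default."""
    parsed = urlparse(path)
    if not parsed.scheme:
        return default_fs
    return f"{parsed.scheme}://{parsed.netloc or '/'}"


if __name__ == "__main__":
    path = (
        "s3a://aws-glue-assets/cache_pretrained/"
        "glove_100d_en_2.4.0_2.4_1579690104032/storage/EMBEDDINGS_glove_100d"
    )
    try:
        # Using the default (local) filesystem reproduces the mismatch.
        check_path(path, "file:///")
    except WrongFSError as err:
        print(err)
    # Resolving the filesystem from the path avoids it.
    check_path(path, filesystem_uri_for(path))
```

In Hadoop terms, this corresponds roughly to resolving the filesystem from the path's URI instead of relying on the default filesystem. Whether the PR does exactly that is not stated in the description.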

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Code improvements with no or little impact
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING page.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@danilojsl danilojsl requested a review from maziyarpanahi March 27, 2023 19:05
@danilojsl danilojsl added bug-fix DON'T MERGE Do not merge this PR labels Mar 27, 2023
@maziyarpanahi maziyarpanahi changed the base branch from master to release/440-release-candidate April 6, 2023 17:57
@maziyarpanahi maziyarpanahi merged commit dbad9f2 into release/440-release-candidate Apr 6, 2023
maziyarpanahi added a commit that referenced this pull request Apr 10, 2023
* SPARKNLP-782 Removes deprecated parameter enablePatternRegex (#13664)

* SPARKNLP-748: Custom Entity Name for Date2Chunk (#13680)

- added parameter "entityName" to change metadata name

* SPARKNLP-784 Fix loading WordEmbeddingsModel bug when cache_folder is from S3 (#13707)

* SPARKNLP-605: ConvNextForImageClassification (#13713)

* SPARKNLP-605: ConvNextForImageClassification

- Added ConvNextForImageClassification with new tests
- Refactored image Preprocessor and added new config
- Implemented filters with resample property for
  ImageResizeUtils.resizeBufferedImage (with minor
  performance gain)
- Minor improvements for ViT and Swin

* SPARKNLP-605: Docs

* SPARKNLP-605: Lazy values for test

* SPARKNLP-785 Fix WordEmbeddingsModel bug with LightPipeline (#13715)

* [skip test] SPARKNLP-783: Python 3.6 deprecated in Spark 3.2 (#13724)

* SPARKNLP-763 Implementing ZeroShot Text Classification for BERT and DistilBERT based on NLI (#13727)

* SPARKNLP-763 Fix a typo

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* SPARKNLP-763 add unfinished traits

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* SPARKNLP-763 Create a new BertForZeroShotClassification annotator

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* SPARKNLP-763 Create a new HasCandidateLabelsProperties

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* SPARKNLP-763 Implement predict sequence with NLI, new tokenize from strings, and new tag ZeroShot

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* SPARKNLP-763 Clean up the code

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* SPARKNLP-763 Add BertForZeroShotClassification to annotator [skip test]

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* SPARKNLP-763 Add BertForZeroShotClassification to ResourceDownloader [skip test]

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* SPARKNLP-763 Implement BertForZeroShotClassification in Python [skip test]

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* SPARKNLP-763 Add unit tests for BertForZeroShotClassification

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* change default model to bert_base_cased_zero_shot_classifier_xnli

* SPARKNLP-763 Fix Scaladoc and Pydoc

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* SPARKNLP-763 Fix Update unit test in Scala

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

---------

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* Sparknlp 534 Introducing BART Transformer for text-to-text generation tasks like translation and summarization (#13731)

* WIP: Added Bart transformer scala files

* WIP: Added BART tokenizer test and BART is locally working

* WIP: Added BART tokenizer test and BART is locally working

* WIP: Added Beam Hypothesis and Beam Scorer implementations

* WIP: Added Logit Processors

* WIP: Added Beam Search implementation

* WIP: Completed Beam Search implementation
WIP: Added Generate method for text generation

* WIP: fixed a bug in Beam search algorithm
WIP: Generate method for text generation

* WIP: changed BartTransformer methods to include beam size and added description

* WIP: changed BartTransformer test methods

* WIP: fixed errors in BeamSearch

* WIP: Updated to use separate encoder decoder model

* WIP: Changed model to handle the int64 version of the model weights

* WIP: Added python API implementation

* Pass session and encoder state as a parameter
Clean up unnecessary code

* Update TopK Logit Warper Logic

* Code clean up

* Update Tests

* Update documentation

* Update documentation and python tests

* Update python tests

* SPARKNLP-534 move BartTokenizer to the Bart backend

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* SPARKNLP-534 Fix the copyright year

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* SPARKNLP-534 Add BartTransformer to annotator and ResourceDownloader

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* SPARKNLP-534 Fix BartTransformer in annotator

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

---------

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* Bump version to 4.4.0

* Update doc style and fix unit test [skip test]

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* SPARKNLP-605: Fix parameter eval for vit tests

* Update default model name (#13744)

* SPARKNLP-796 Creating a new `nerHasNoSchema` param (#13745)

* Adding missing CPUvsGPUbenchmark page

* SPARKNLP-796 Creating a new `nerHasNoSchema` param

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

---------

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* Change default model for BART to distilbart-xsum-12-6

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* Change default model for BART to distilbart_xsum_12_6

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* Replace nlp with sparknlp.org website

* Update INT64 to INT32 (#13748)

* Fix the wrong column in unit test [skip test]

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* SPARKNLP-805: Documentation for release/440 (#13743)

* Fixed memory leak

* Added Bart Notebook

* Add new features and update docs[run doc]

* Update install.md

* Update CHANGELOG [run doc]

* Update Scala and Python APIs

* release spark-nlp 4.4.0 on Conda [skip test]

---------

Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
Co-authored-by: Danilo Burbano <37355249+danilojsl@users.noreply.github.com>
Co-authored-by: Devin Ha <33089471+DevinTDHa@users.noreply.github.com>
Co-authored-by: Prabod Rathnayaka <prabod@rathnayaka.me>
Co-authored-by: Devin Ha <t.ha@tu-berlin.de>
Co-authored-by: github-actions <action@github.com>
jsl-builder pushed a commit that referenced this pull request Apr 12, 2023