SPARKNLP-962: UAEEmbeddings #14199
- added Scala side
- added Python Side
- Added default values - Serialization tests
- `onnxModelPath` is not set for models without an `.onnx_data` file, so it will be `None`. Calling `None.get` throws an error, so this checks for it first
- Documentation
- make tests lazy
Hi @DevinTDHa, regarding the fix in ONNX serialization: is it related to this issue, #14194 (https://colab.research.google.com/drive/119u6hXoT1PRB9F38InuEV-bm4g1uu9UH?usp=sharing)?
Hi @maziyarpanahi, yes, the fix should prevent the error in the notebook as well.
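For readers following the serialization fix discussed above, here is a minimal Python sketch of the same idea: check whether the optional path is set before using it. This is an analogy only, not the Scala code from the commit. The function name `files_to_serialize` and its parameters are hypothetical and do not come from Spark NLP.

```python
from typing import List, Optional


def files_to_serialize(model_file: str, onnx_data_path: Optional[str]) -> List[str]:
    """Hypothetical helper: collect the files that must be written out.

    `onnx_data_path` stands in for the optional path to an .onnx_data file.
    It is None for models that ship without one.
    """
    files = [model_file]
    # Guard first: only use the path when it is actually set.
    # Using it unconditionally is the Python analogue of calling
    # .get on an empty Option in Scala, which throws.
    if onnx_data_path is not None:
        files.append(onnx_data_path)
    return files


# A model without external data serializes only the model file.
small_model = files_to_serialize("model.onnx", None)
# A model with external data serializes both files.
large_model = files_to_serialize("model.onnx", "model.onnx_data")
```

In Scala the equivalent guard would typically use `isDefined`, `foreach`, or pattern matching on the `Option` instead of calling `.get` directly.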
Merged commit bf6d21e into JohnSnowLabs:release/533-release-candidate
Description
This PR adds an annotator for UAE embeddings. To support it, new pooling operations for word embeddings have been added, namely pooling by:
- the `[CLS]` token, or the last token
- `[CLS]` + mean of the embeddings

These can be set with `setPoolingStrategy` on the annotator.

Additionally, it fixes a bug with serializing ONNX models that do not have a `.onnx_data` file (b73dc0b). @prabod I think you worked on this part, could you review if the fix looks good? I provided a description in the commit message. Thanks!

How Has This Been Tested?
New tests and old tests are passing.
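
The pooling operations named in the description can be illustrated with a small pure-Python sketch. It is not the annotator's implementation. The strategy names (`"cls"`, `"last"`, `"avg"`, `"cls_avg"`) are placeholders chosen for this example, the `[CLS]` token is assumed to be the first token in the sequence, and "`[CLS]` + mean" is assumed here to be the element-wise average of the two vectors.

```python
from typing import List

Vector = List[float]


def mean_pool(tokens: List[Vector]) -> Vector:
    """Element-wise mean over all token embeddings."""
    count = len(tokens)
    return [sum(column) / count for column in zip(*tokens)]


def pool(tokens: List[Vector], strategy: str) -> Vector:
    """Reduce per-token embeddings to one sentence vector.

    Strategy names are illustrative assumptions, not a confirmed API.
    """
    if not tokens:
        raise ValueError("need at least one token embedding")
    if strategy == "cls":
        # Assumption: the [CLS] token sits at position 0.
        return list(tokens[0])
    if strategy == "last":
        return list(tokens[-1])
    if strategy == "avg":
        return mean_pool(tokens)
    if strategy == "cls_avg":
        # Assumption: combine [CLS] and the mean by averaging them.
        mean = mean_pool(tokens)
        return [(c + m) / 2 for c, m in zip(tokens[0], mean)]
    raise ValueError(f"unknown pooling strategy: {strategy}")


# Three tokens with two-dimensional embeddings.
embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
cls_vector = pool(embeddings, "cls")          # [1.0, 2.0]
last_vector = pool(embeddings, "last")        # [5.0, 6.0]
cls_avg_vector = pool(embeddings, "cls_avg")  # [2.0, 3.0]
```

Each strategy yields a single vector per input, which is what a sentence-level embedding annotator needs from token-level model output.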