Refactor OpenAIEmbeddings #14334
* SPARKNLP-1036: Fix dev python kernel names
* SPARKNLP-1036: Bump transformers version
* SPARKNLP-1036: Fix Colab buttons
* SPARKNLP-1036: Pin onnx version for compatibility
* SPARKNLP-1036: Upgrade Spark version
* SPARKNLP-1036: Minor Fixes
* SPARKNLP-1036: Clean Metadata
* SPARKNLP-1036: Add/Adjust Documentation
  - Note for supported Spark Version of Annotators
  - added missing Documentation for BGEEmbeddings
I created my branch from master, so there are some other commits in it, too.
from pyspark.sql import DataFrame
from pyspark.sql import SparkSession


class OpenAIEmbeddingsTestCase(unittest.TestCase):
Please add the @pytest.mark.slow annotation on top of the class definition.
Hi @danilojsl,
I have added the @pytest.mark.slow annotation in my new commit and updated the documentation.
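For readers following the thread, here is a minimal sketch of where the marker goes. It assumes a `slow` marker is registered in the project's pytest configuration; the test method is a hypothetical placeholder, not the test added in this PR.

```python
import unittest

import pytest


# The marker sits directly above the class, so every test method in the
# class is marked as slow.
@pytest.mark.slow
class OpenAIEmbeddingsTestCase(unittest.TestCase):
    def test_placeholder(self):
        # Hypothetical placeholder; the real test needs a Spark session
        # and an OpenAI API key, both omitted here.
        self.assertTrue(True)
```

With the marker in place, slow tests can be deselected with `pytest -m "not slow"`.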
e4a7c1a
Refactor OpenAIEmbeddings annotator
Description
Refactor OpenAIEmbeddings
1- Added support for escape characters that would otherwise break the OpenAI JSON content.
2- Changed the output annotator type: DOCUMENT --> SENTENCE_EMBEDDINGS
NOTE: This change is not backward compatible.
3- Added the metadata from the document column to the output embeddings.
4- Added a Python unit test class.
5- Added a new submodule to support saving/loading the annotator.
NOTE: The new submodule fixes saving/loading the annotator.
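
To illustrate the problem behind item 1, here is a small Python sketch (not the annotator's actual code, which is not shown in this PR description). It assumes the request body is a JSON object with `input` and `model` fields; `build_embeddings_payload` and the model name are hypothetical. Splicing raw text into a JSON string breaks as soon as the text contains quotes, backslashes, or newlines, while a JSON serializer escapes them.

```python
import json


def build_embeddings_payload(text: str, model: str) -> str:
    # Hypothetical helper: json.dumps escapes quotes, backslashes and
    # control characters, so the body is always valid JSON.
    return json.dumps({"input": text, "model": model})


raw = 'He said "hi"\nC:\\temp\tend'

# Naive string concatenation leaves the special characters unescaped.
naive = '{"input": "' + raw + '", "model": "example-model"}'
try:
    json.loads(naive)
    naive_is_valid = True
except json.JSONDecodeError:
    naive_is_valid = False

# The serialized payload parses back to the original text.
payload = build_embeddings_payload(raw, "example-model")
round_tripped = json.loads(payload)["input"]
```

Here `naive_is_valid` ends up `False`, and `round_tripped` equals `raw`.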
Motivation and Context
How Has This Been Tested?
Tested via Python and Scala locally.
Additionally, I added new unit tests that cover my changes.
Screenshots (if appropriate):
Types of changes
Checklist: