feat: Upload model after finishing training #826
Codecov Report
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #826    +/-   ##
=======================================
  Coverage   86.40%   86.40%
=======================================
  Files         194      196     +2
  Lines       15752    15848    +96
=======================================
+ Hits        13610    13693    +83
- Misses       2142     2155    +13
dataquality/utils/upload_model.py
Outdated
    return response.status_code, response.text


def upload_model_to_dq() -> None:
I think this fn should take in a model.
Also, you are not uploading the model to dq, you are uploading it to the object store.
Done. The model is typed as Any for now, since Hugging Face can have many different model classes.
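For illustration only, here is a minimal sketch of what a function with that shape could look like. It is not the code from this PR: the name `upload_model_to_object_store` and the `put_bytes` callable are hypothetical, with `put_bytes` standing in for an HTTP PUT to the object store that returns a status code. The only thing assumed about the model is that it exposes `save_pretrained(directory)`, as Hugging Face models do.

```python
import os
import tarfile
import tempfile
from typing import Any, Callable


def upload_model_to_object_store(
    model: Any, put_bytes: Callable[[bytes], int]
) -> int:
    """Archive a model and hand the archive bytes to an uploader.

    Assumptions (not from the PR): `model` has `save_pretrained(dir)`, and
    `put_bytes` is a hypothetical uploader that returns an HTTP status code.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Let the model write its own files, whatever its concrete class is.
        model_dir = os.path.join(tmp, "model")
        os.makedirs(model_dir)
        model.save_pretrained(model_dir)

        # Bundle the directory into a single gzip-compressed tar archive.
        archive_path = os.path.join(tmp, "model.tar.gz")
        with tarfile.open(archive_path, "w:gz") as tar:
            tar.add(model_dir, arcname="model")

        # Upload the archive in one call and return the uploader's status.
        with open(archive_path, "rb") as f:
            return put_bytes(f.read())
```

Passing the uploader in as a callable keeps the model argument untyped, as discussed, while making it explicit that the destination is the object store rather than dq.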
Looking pretty good, but I left a few suggestions, and it still needs formatting / linting. My main concern is that I'm not sure all auto flows should upload the model by default; I would love to discuss that detail.
I agree with auto. We can just make sure the dq.auto on sagemaker always uploads so you can later use it for inference
@elboy3 Sir, please reconsider (I have added all your requests)
6a9b93b to 8446fcd
    def get_uploaded_model_info(self, project_id: UUID4, run_id: UUID4) -> Any:
        """
        Returns information about the model for a given run.
        Will also update the status to complete.
Why does it update the status to complete?
Also, what does it return: the model, or a presigned URL to download it?
Since the backend doesn't know when a MinIO upload has completed, we update the status every time we get the model info and the filename has not been saved yet.
So what status is it updating, the job status?
Just that the upload is completed. At first the entry holds the PUT link; once the object has been put, we pull the download link from MinIO and add it to the entry.
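A rough sketch of that lazy status update, under stated assumptions, might look as follows. Everything here is hypothetical and is not the backend code: `ModelEntry`, its fields, and the two callables are illustrative names. `object_exists` and `make_download_url` stand in for object-store queries, and the explicit existence check is an assumption about how the backend decides that the put has happened.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ModelEntry:
    """Hypothetical record for an uploaded model (field names are assumed)."""

    put_url: str
    filename: Optional[str] = None
    download_url: Optional[str] = None
    upload_complete: bool = False


def get_uploaded_model_info(
    entry: ModelEntry,
    object_name: str,
    object_exists: Callable[[str], bool],
    make_download_url: Callable[[str], str],
) -> ModelEntry:
    """Return the entry, finalizing it on read if the upload has landed."""
    # Only entries without a saved filename still need to be finalized.
    if entry.filename is None and object_exists(object_name):
        entry.filename = object_name
        entry.download_url = make_download_url(object_name)
        entry.upload_complete = True
    return entry
```

In this sketch the status in question is only the upload status on the entry, not a job status, and a read before the object exists leaves the entry unchanged.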
pyproject.toml
Outdated
@@ -13,7 +13,7 @@ readme = "README.md"
license = {text = 'See LICENSE'}
requires-python = ">=3.8"
dependencies = [
    "pydantic>=2.0.0",
I don't think you need any of these. I fixed the tests and all the formatting issues earlier today, so just merge the latest from main and keep the pyproject.toml the same.
tests/loggers/test_seq2seq.py
Outdated
@@ -647,6 +647,7 @@ def test_create_data_embs_df_custom_column(

    # Check that no exception is thrown and that data embs are created
    assert "text" not in df.get_column_names()
+
don't need this whitespace
Added test and reverted pyproject, ready to merge
This allows inference and auto training
Part of:
https://app.shortcut.com/galileo/story/10988/jpmc-allow-simple-upload-of-csv-to-train-a-model-and-get-dataquality-insights