Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Upload model after finishing training #826

Merged
merged 29 commits into from
Feb 2, 2024
Merged

Conversation

franz101
Copy link
Contributor

@franz101 franz101 commented Jan 23, 2024

@franz101 franz101 requested review from dcaustin33 and a team as code owners January 23, 2024 00:09
@codecov-commenter
Copy link

codecov-commenter commented Jan 23, 2024

Codecov Report

Attention: 17 lines in your changes are missing coverage. Please review.

Comparison is base (f178277) 86.40% compared to head (3105437) 86.40%.

Files Patch % Lines
dataquality/utils/upload_model.py 57.14% 12 Missing ⚠️
dataquality/core/finish.py 78.57% 3 Missing ⚠️
dataquality/clients/api.py 50.00% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #826   +/-   ##
=======================================
  Coverage   86.40%   86.40%           
=======================================
  Files         194      196    +2     
  Lines       15752    15848   +96     
=======================================
+ Hits        13610    13693   +83     
- Misses       2142     2155   +13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

dataquality/clients/api.py Outdated Show resolved Hide resolved
return response.status_code, response.text


def upload_model_to_dq() -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this fn should take in a model

also you are not upoading the model to dq you are uploading it to object store

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, model is Any for now since huggingface can have many different model classes

Copy link
Contributor

@elboy3 elboy3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking pretty good, but left a few suggestions, also gotta format / lint. Main concern is that I'm not sure all auto flows should upload model by default, would love to discuss that detail

@franz101
Copy link
Contributor Author

I agree with auto. We can just make sure the dq.auto on sagemaker always uploads so you can later use it for inference

@franz101
Copy link
Contributor Author

franz101 commented Feb 1, 2024

@elboy3 Sir, please reconsider (I have added all your requests)

@franz101 franz101 force-pushed the features/model_upload branch 2 times, most recently from 6a9b93b to 8446fcd Compare February 1, 2024 21:09
def get_uploaded_model_info(self, project_id: UUID4, run_id: UUID4) -> Any:
"""
Returns information about the model for a given run.
Will also update the status to complete.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does it update the status to complete?

also what does it return, the model or a presigned url to download it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we don't know on the backend when a minio upload is completed we update it every time we get the model and if the filename is not saved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so what status is it updating, the job status?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just the upload is completed. at first it's the put link, once it's put we will pull the download link from minio and add it to the entry

dataquality/core/finish.py Outdated Show resolved Hide resolved
pyproject.toml Outdated
@@ -13,7 +13,7 @@ readme = "README.md"
license = {text = 'See LICENSE'}
requires-python = ">=3.8"
dependencies = [
"pydantic>=2.0.0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think you need any of these.... i fixed the tests and all the formatting issues earlier today just merge the latest from main and keep the pyproject.toml the same

@@ -647,6 +647,7 @@ def test_create_data_embs_df_custom_column(

# Check that no exception is thrown and that data embs are created
assert "text" not in df.get_column_names()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need this whitespace

@franz101
Copy link
Contributor Author

franz101 commented Feb 2, 2024

Added test and reverted pyproject, ready to merge

@franz101 franz101 enabled auto-merge (squash) February 2, 2024 21:33
@franz101 franz101 merged commit 923d73e into main Feb 2, 2024
4 of 5 checks passed
@franz101 franz101 deleted the features/model_upload branch February 2, 2024 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants