feat: Upload model after finishing training #826

franz101 · 2024-01-23T00:09:03Z

This allows inference and auto training

Part of:
https://app.shortcut.com/galileo/story/10988/jpmc-allow-simple-upload-of-csv-to-train-a-model-and-get-dataquality-insights

codecov-commenter · 2024-01-23T06:10:55Z

Codecov Report

Attention: 17 lines in your changes are missing coverage. Please review.

Comparison is base (f178277) 86.40% compared to head (3105437) 86.40%.

Files	Patch %	Lines
dataquality/utils/upload_model.py	57.14%	12 Missing ⚠️
dataquality/core/finish.py	78.57%	3 Missing ⚠️
dataquality/clients/api.py	50.00%	2 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #826   +/-   ##
=======================================
  Coverage   86.40%   86.40%           
=======================================
  Files         194      196    +2     
  Lines       15752    15848   +96     
=======================================
+ Hits        13610    13693   +83     
- Misses       2142     2155   +13


dataquality/utils/auto_trainer.py

dataquality/integrations/transformers_trainer.py

dataquality/clients/api.py

elboy3 · 2024-01-29T05:08:41Z

dataquality/utils/upload_model.py

+        return response.status_code, response.text
+
+
+def upload_model_to_dq() -> None:


I think this function should take in a model.

Also, you are not uploading the model to dq; you are uploading it to the object store.

Done. The model is typed as Any for now, since Hugging Face can have many different model classes.
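
To illustrate the request in this thread, here is a minimal sketch of an upload helper that accepts the model as an argument and packages it for the object store. The names `package_model` and `DummyModel` are hypothetical and not taken from the PR; the only assumption about the model is that it exposes a Hugging Face style `save_pretrained(directory)` method, which is why it is typed as `Any`.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import Any


class DummyModel:
    """Stand-in for a Hugging Face model; only save_pretrained is assumed."""

    def save_pretrained(self, save_directory: str) -> None:
        # A real model would write weights and config; we write a stub file.
        Path(save_directory, "config.json").write_text("{}")


def package_model(model: Any, out_dir: str) -> Path:
    """Serialize `model` and bundle it into a tar.gz ready for upload.

    Hypothetical helper: the model is duck-typed because many different
    model classes may be passed in.
    """
    if not hasattr(model, "save_pretrained"):
        raise TypeError("model must expose save_pretrained(directory)")
    archive = Path(out_dir) / "model.tar.gz"
    with tempfile.TemporaryDirectory() as tmp:
        model.save_pretrained(tmp)
        with tarfile.open(archive, "w:gz") as tar:
            # Store everything under a top-level "model/" folder.
            tar.add(tmp, arcname="model")
    return archive
```

Passing the model explicitly keeps the helper free of global state, and the resulting archive is what would be sent to the object store rather than to dq itself.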

dataquality/clients/api.py

elboy3

Looking pretty good, but I left a few suggestions; this also needs formatting and linting. My main concern is that I'm not sure all auto flows should upload the model by default. I would love to discuss that detail.

franz101 · 2024-01-30T02:53:24Z

I agree about auto. We can just make sure that dq.auto on SageMaker always uploads, so the model can be used later for inference.
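
One way to express this compromise is an explicit opt-in flag with an environment-based default. The sketch below is illustrative only: `should_upload_model` and the `RUNNING_ON_SAGEMAKER` variable are hypothetical names, not the ones used in the PR.

```python
import os
from typing import Mapping, Optional

# Hypothetical marker variable, used only for this sketch.
SAGEMAKER_MARKER = "RUNNING_ON_SAGEMAKER"


def should_upload_model(
    upload_model: Optional[bool] = None,
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Decide whether an auto run uploads its model.

    An explicit True/False from the caller always wins. Without one,
    upload only when the environment marks the run as a SageMaker job.
    """
    if upload_model is not None:
        return upload_model
    return env.get(SAGEMAKER_MARKER, "").lower() in {"1", "true"}
```

With this shape, ordinary auto flows do not upload by default, while a SageMaker run uploads unless the caller turns it off.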

franz101 · 2024-02-01T17:18:54Z

@elboy3 Sir, please reconsider (I have added all your requests)

elboy3 · 2024-02-01T22:04:04Z

dataquality/clients/api.py

+    def get_uploaded_model_info(self, project_id: UUID4, run_id: UUID4) -> Any:
+        """
+        Returns information about the model for a given run.
+        Will also update the status to complete.


Why does it update the status to complete?

Also, what does it return: the model, or a presigned URL to download it?

Since the backend doesn't know when a MinIO upload has completed, we update the status each time we get the model, if the filename has not been saved yet.

So what status is it updating, the job status?

Just that the upload is completed. At first the entry holds the PUT link; once the model has been put, we pull the download link from MinIO and add it to the entry.
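
The lazy completion check described in this thread can be sketched as follows. This is not the backend's code: `FakeObjectStore`, `ModelEntry`, and the URL strings are hypothetical in-memory stand-ins for MinIO and the model record, and only the function name `get_uploaded_model_info` mirrors the method under review.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FakeObjectStore:
    """In-memory stand-in for MinIO; URLs are plain strings, not real presigned links."""

    objects: Dict[str, bytes] = field(default_factory=dict)

    def presigned_put_url(self, key: str) -> str:
        return f"put://{key}"

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def presigned_get_url(self, key: str) -> Optional[str]:
        # A download link only exists once the object has been stored.
        return f"get://{key}" if key in self.objects else None


@dataclass
class ModelEntry:
    key: str
    upload_url: str
    download_url: Optional[str] = None
    upload_complete: bool = False


def get_uploaded_model_info(entry: ModelEntry, store: FakeObjectStore) -> ModelEntry:
    """Return the entry, marking the upload complete once the object is visible.

    The store never notifies us when a PUT finishes, so the check runs
    on every read until a download link has been recorded.
    """
    if not entry.upload_complete:
        url = store.presigned_get_url(entry.key)
        if url is not None:
            entry.download_url = url
            entry.upload_complete = True
    return entry
```

In this sketch the "status" being updated is only the upload state on the model entry, and the call returns metadata with a download link rather than the model itself.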

dataquality/core/finish.py

elboy3 · 2024-02-02T21:05:14Z

pyproject.toml

@@ -13,7 +13,7 @@ readme = "README.md"
 license = {text = 'See LICENSE'}
 requires-python = ">=3.8"
 dependencies = [
-    "pydantic>=2.0.0",


I don't think you need any of these. I fixed the tests and all the formatting issues earlier today; just merge the latest from main and keep pyproject.toml the same.

elboy3 · 2024-02-02T21:05:24Z

tests/loggers/test_seq2seq.py

@@ -647,6 +647,7 @@ def test_create_data_embs_df_custom_column(

    # Check that no exception is thrown and that data embs are created
    assert "text" not in df.get_column_names()
+


don't need this whitespace

franz101 · 2024-02-02T21:32:56Z

Added test and reverted pyproject, ready to merge

franz101 added 4 commits August 25, 2023 15:29

test dq upload

550f17e

fix endpoint

82009f9

added env

17dd4c2

Merge branch 'main' into features/model_upload

7ba6d8a

franz101 requested review from dcaustin33 and a team as code owners January 23, 2024 00:09

franz101 added 8 commits January 22, 2024 18:09

doc string

528289c

Update upload_model.py

7216578

fix model upload

e8a3b9a

typing

9da876d

typing

4cf3a2e

fix typing v2

0b7462c

fix endpoint

e1bc4d9

formatting

3ae5f36

Merge branch 'main' into features/model_upload

c7a5b23

elboy3 reviewed Jan 29, 2024

dataquality/utils/auto_trainer.py (outdated, resolved)

elboy3 reviewed Jan 29, 2024

dataquality/integrations/transformers_trainer.py (outdated, resolved)

elboy3 reviewed Jan 29, 2024

dataquality/clients/api.py (outdated, resolved)

elboy3 reviewed Jan 29, 2024

dataquality/clients/api.py (resolved)

elboy3 suggested changes Jan 29, 2024

pr review changes

5026974

franz101 added 3 commits February 1, 2024 11:20

cleanup

7f542a3

version bump

f57bdb9

docs

8446fcd

franz101 force-pushed the features/model_upload branch 2 times, most recently from 6a9b93b to 8446fcd Compare February 1, 2024 21:09

pin versions

ff0e0d3

elboy3 reviewed Feb 1, 2024

dataquality/core/finish.py (outdated, resolved)

elboy3 and others added 5 commits February 1, 2024 17:10

fix one test

fb39a55

fix linting

01fb601

fix tests

7e4b063

fix tests

e29bf6a

Merge branch 'main' into features/model_upload

31d46ee

elboy3 reviewed Feb 2, 2024

franz101 added 5 commits February 2, 2024 15:06

add test

ba6883e

linter

a7d7fee

revert pyproject

e3be4a4

remove whitespace

8a18f8a

fromatting

b392d71

franz101 enabled auto-merge (squash) February 2, 2024 21:33

improve coverage

3105437

elboy3 approved these changes Feb 2, 2024

franz101 merged commit 923d73e into main Feb 2, 2024
4 of 5 checks passed

franz101 deleted the features/model_upload branch February 2, 2024 22:36
