WIP: #12 Create a MlflowModelDataSet #56

turn1a · 2020-08-27T07:58:21Z

Closes #12.

Opening PR to continue discussions about this implementation here.

In this implementation, I'm both saving locally and logging to MLflow.

TODO:

Agree on _save strategy: save+log vs log only
~~Saving to specified run_id~~ - I wasn't able to find a way to log a model to a particular run.
Documentation
Changelog update
Implement pyfunc support

Galileo-Galilei · 2020-09-09T21:22:56Z

For the record, I really think that MlflowModelDataSet should not perform logging in mlflow (and associated run_id management you seem in trouble with) and delegate this to MlflowDataSet. It will make this DataSet minimal and still offer all desirable functionalities in an eay to use user interface. (e.g calling (MlflowDataSet(data_set={type:MlflowModelDataSet, filepath:/path/to/local/folder}) seems fine).

turn1a · 2020-09-10T13:10:21Z

Apart from just logging the model as an artifact in a run, log_model also creates a model version and registers it when registered_model_name is specified in log_model. If we go for save_model and log it as an artifact using MlflowArtifactDataSet, we will lose this functionality and quite possibly others that might be added in the future.

registered_model_name – (Experimental) If given, create a model version under registered_model_name, also creating a registered model if one with the given name does not exist.

Galileo-Galilei · 2020-09-13T21:54:41Z

Yes, you're totally right. I still think that saving/loading locally and remotely are two different operations that should be performed in different datasets. It would be both easier to manage (for us) and to use (for end users).

turn1a · 2020-09-23T16:04:25Z

Let's discuss this when on the call.

Galileo-Galilei · 2020-10-21T19:10:29Z

Hello @kaemo, when do you plan to finish on this PR? It looks like it is 95% finished, and it is a frequently required feature. What prevent you from merging it right now? The sooner we deploy it, the more we can get feedback to improve it. Feel free to ask if you need help.

codecov-io · 2020-11-01T21:48:45Z

Codecov Report

Merging #56 into develop will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##           develop       #56   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           20        24    +4     
  Lines          634       713   +79     
=========================================
+ Hits           634       713   +79

Impacted Files	Coverage Δ
kedro_mlflow/io/models/__init__.py	`100.00% <100.00%> (ø)`
..._mlflow/io/models/mlflow_abstract_model_dataset.py	`100.00% <100.00%> (ø)`
...ro_mlflow/io/models/mlflow_model_logger_dataset.py	`100.00% <100.00%> (ø)`
...dro_mlflow/io/models/mlflow_model_saver_dataset.py	`100.00% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9adb5ef...a166982. Read the comment docs.

Galileo-Galilei · 2020-11-01T21:55:52Z

Hello @kaemo, a few updates:

I have rebased to be up to date with the develop branch and I fixed the conflicts. It will likely mess up your local version so update it accordingly.
I have splitted your dataset into two datasets
- one for logging/ loading from a run_id called MlflowModelLoggerDataSet
- one for saving locally / reading from file , called MlfflowModelSaverDataSet
I have put them in a models subfolder and open an issue to refactor the io folder : Refactor kedro_mlflow.io folder #109
Update tests and documentation accordingly
Fix a few typos:
- do not use default dict argumetns which are mutable
- change requiremetns from sklearn to scikit-learn
- Fix a typo in a if condition (mflow.pyfunc instead of mlflow.pyfunc)

It would be great if we could discuss it this week. Feel free to make any comment if you are not comfortable with these changes

…d logger

Galileo-Galilei · 2020-11-03T19:38:16Z

Hello @kaemo, I merge it for the next release. Feel free to reopen if you want to discuss / refactor. Thank you very much for the amazing work!

turn1a self-assigned this Aug 27, 2020

turn1a marked this pull request as draft September 3, 2020 15:31

turn1a mentioned this pull request Sep 8, 2020

Create a MlflowModelDataSet #12

Closed

Galileo-Galilei mentioned this pull request Oct 1, 2020

Migrate to 0.16.5 #62

Closed

3 tasks

Galileo-Galilei force-pushed the 12-create-a-mlflowmodeldataset branch from fb68c79 to c2b1d96 Compare November 1, 2020 21:47

Galileo-Galilei requested review from takikadiri and akruszewski November 1, 2020 21:55

kaemo and others added 2 commits November 3, 2020 20:28

FIX #12 - Add MLflowModelDataSet with versioning support

60d4b1e

FIX #12 - Refactor MlflowModelDataSet to distinguish between saver an…

a166982

…d logger

Galileo-Galilei force-pushed the 12-create-a-mlflowmodeldataset branch from c2b1d96 to a166982 Compare November 3, 2020 19:33

Galileo-Galilei marked this pull request as ready for review November 3, 2020 19:34

Galileo-Galilei merged commit 1dda1e4 into develop Nov 3, 2020

Galileo-Galilei deleted the 12-create-a-mlflowmodeldataset branch November 13, 2020 21:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: #12 Create a MlflowModelDataSet #56

WIP: #12 Create a MlflowModelDataSet #56

turn1a commented Aug 27, 2020 •

edited

Loading

Galileo-Galilei commented Sep 9, 2020

turn1a commented Sep 10, 2020 •

edited

Loading

Galileo-Galilei commented Sep 13, 2020

turn1a commented Sep 23, 2020

Galileo-Galilei commented Oct 21, 2020

codecov-io commented Nov 1, 2020 •

edited

Loading

Galileo-Galilei commented Nov 1, 2020

Galileo-Galilei commented Nov 3, 2020

WIP: #12 Create a MlflowModelDataSet #56

WIP: #12 Create a MlflowModelDataSet #56

Conversation

turn1a commented Aug 27, 2020 • edited Loading

Galileo-Galilei commented Sep 9, 2020

turn1a commented Sep 10, 2020 • edited Loading

Galileo-Galilei commented Sep 13, 2020

turn1a commented Sep 23, 2020

Galileo-Galilei commented Oct 21, 2020

codecov-io commented Nov 1, 2020 • edited Loading

Codecov Report

Galileo-Galilei commented Nov 1, 2020

Galileo-Galilei commented Nov 3, 2020

turn1a commented Aug 27, 2020 •

edited

Loading

turn1a commented Sep 10, 2020 •

edited

Loading

codecov-io commented Nov 1, 2020 •

edited

Loading