Convert protobuf model to native format #550

Merged
merged 11 commits into securefederatedai:develop
Nov 18, 2022

Conversation

psfoley
Contributor

@psfoley psfoley commented Oct 27, 2022

This PR:

  • Adds a new command, fx model save, that takes as input the .pbuf model file produced by a federated experiment and converts it to the native PyTorch / TensorFlow model representation for future use.
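A typical invocation from inside an experiment workspace might look like this (the paths are illustrative; per the option definitions reviewed further down, -i/--input is required while -o/--output is optional with a default):

fx model save -i save/model.pbuf -o native_model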

This PR is a WIP. The core functionality is to be refactored by @igor-davidyuk into the utilities folder so that it can also be used via the API.

closes #552

@psfoley psfoley requested a review from igor-davidyuk October 27, 2022 21:59
@igor-davidyuk igor-davidyuk self-assigned this Oct 31, 2022
@igor-davidyuk igor-davidyuk marked this pull request as draft October 31, 2022 07:54
@igor-davidyuk igor-davidyuk removed their request for review October 31, 2022 07:54
@igor-davidyuk
Contributor

igor-davidyuk commented Oct 31, 2022

We want to introduce an additional piece of functionality in two forms:

  • CLI: the fx model save command, called from the workspace, must save the model to disk.
  • Python API: openfl.model_save(path) must return the model object.

We start with the path to a model.pbuf snapshot: we need to decompress it and load its weights into a model object, which we can then save or return (see the sketch after the list below).

  1. To save the model in a native format, we need a save_native method implemented in the TaskRunner.
    This method is not implemented in all of our template TaskRunners, but that is acceptable, as we can ask users to implement it.
  2. To return the model as a Python object, we need access to that object.
    Models are stored in different ways across our template TaskRunners, and there is a particular problem with the TF1 examples. We may need to introduce an additional get_model method to all of our template TaskRunners.
  3. To get access to the model, we need to initialize the TaskRunner.
    Model initialization happens in TaskRunner code and often requires feature_shape information to build the model.
  4. To initialize the TaskRunner, we need the plan.yaml path and a DataLoader.
    Again, model initialization is coupled with DataLoader initialization.
  5. To initialize the DataLoader, we need data.yaml and cols.yaml.
    Here we use the first collaborator's name to get its data_path and load the data to obtain the feature_shape. This works for our templates, which use synthetic or downloaded datasets, but in real-world use cases users often do not have access to any data from their machine. One way to overcome this obstacle would be to define feature_shape in plan.yaml or hardcode it inside the DataLoader.
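To make the chain above concrete, the whole path might be sketched roughly as follows (Plan.parse, get_data_loader, get_task_runner, and rebuild_model are modeled on OpenFL's TaskRunner workflow, but the exact signatures are assumptions here, and decompress_model_proto is a hypothetical helper):

from pathlib import Path
from openfl.federated import Plan

# 5. The DataLoader is configured from plan.yaml, cols.yaml, and data.yaml
plan = Plan.parse(plan_config_path=Path('plan/plan.yaml'),
                  cols_config_path=Path('plan/cols.yaml'),
                  data_config_path=Path('plan/data.yaml'))

# We borrow the first collaborator's data_path to obtain the feature_shape
first_collaborator = plan.authorized_cols[0]  # assumed attribute name
data_loader = plan.get_data_loader(first_collaborator)

# 4. / 3. The TaskRunner (and with it the model) is built from plan + DataLoader
task_runner = plan.get_task_runner(data_loader)

# 2. / 1. Decompress the snapshot, load the weights, then save or return
tensor_dict = decompress_model_proto(Path('save/model.pbuf'))  # hypothetical helper
task_runner.rebuild_model(round_num=0, input_tensor_dict=tensor_dict)
task_runner.save_native('native_model')  # CLI path; the Python API would return task_runner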

Given all of the above, at this point we cannot guarantee that the model save API will work as intended for all users.

@igor-davidyuk
Contributor

igor-davidyuk commented Nov 3, 2022

As a solution to the problems mentioned above, I propose the following approach:

  1. For the CLI command, we will rely on the TaskRunner's save_native method, which users can implement or override if needed.
  2. For the Python API, we will return a TaskRunner object, thus allowing users to interact with the linked model.

These signatures let us mitigate the issues caused by the diversity of model definition forms in our TaskRunners while still delivering the expected functionality. The data problem mentioned under point 5 is deliberately ignored, as we know it is already solved before fx plan initialize is executed, which follows a similar path.
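For illustration, a minimal save_native for a PyTorch-based TaskRunner could be little more than a wrapper around torch.save (a sketch under the assumption that the runner holds its network in self.model; the shipped implementations may also handle optimizer state):

import torch

class PyTorchTaskRunnerSketch:
    """Illustrative fragment, not the actual OpenFL class."""

    def __init__(self, model: torch.nn.Module):
        self.model = model

    def save_native(self, filepath):
        # Persist the wrapped model's weights in PyTorch's native format
        torch.save(self.model.state_dict(), filepath)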

Work left in this PR:

  • Update the docs
  • Write tests

@igor-davidyuk igor-davidyuk marked this pull request as ready for review November 3, 2022 13:56
@igor-davidyuk igor-davidyuk changed the title WIP: Convert protobuf model to native format Convert protobuf model to native format Nov 3, 2022
@option('-i', '--input', 'model_protobuf_path', required=True,
help='The model protobuf to convert',
type=ClickPath(exists=True))
@option('-o', '--output', 'output_filepath', required=False,
Contributor

We could make this option required as well, to make sure users provide a path for the saved model. If someone forgot to specify it, it would be easier to flag the incorrect command call and ask for the path. Otherwise, they have to search for the save path in the console output.

Contributor

I understand the intention, but I would argue that we should make arguments required only if they are literally required.

Contributor

output_filepath is required by the _save function for calling TaskRunner.save_native.

For example, torch.save and tf.keras.Model.save_weights do require a file argument.
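Indeed, the framework-level call takes a mandatory file argument (toy model purely for illustration):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # toy model for illustration
torch.save(model.state_dict(), 'model.pth')  # the file argument cannot be omitted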

Contributor

Yes, but those are Python function calls; in our case we have a CLI command that is called from a specific place, namely an experiment workspace.
Honestly, I am not against making this argument required. @mansishr, what do you think?

Contributor

I believe the approach used here, with a default value, is a good one. The command checks whether the file already exists and asks the user for confirmation before overwriting it. At the end, it prints the path where the model was saved.

Collaborator

I agree, specifying a default path is helpful for our CLI command.
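A sketch of the default-plus-confirmation pattern being endorsed here (the option names follow the diff above, but the default value and messages are illustrative, not the exact merged code):

from pathlib import Path
import click

@click.command()
@click.option('-o', '--output', 'output_filepath', required=False,
              default='output_model',
              help='Where to save the converted model.')
def save(output_filepath):
    # Ask for confirmation before overwriting an existing file
    if Path(output_filepath).exists():
        click.confirm(f'{output_filepath} exists. Overwrite?', abort=True)
    # ... convert the protobuf and save the model here ...
    click.echo(f'Saved model to {output_filepath}')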

@igor-davidyuk igor-davidyuk self-requested a review November 9, 2022 13:27
Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>
Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>
Co-authored-by: Ilya Trushkin <ilya.trushkin@intel.com>
Collaborator

@mansishr mansishr left a comment

Looks good to me

@psfoley psfoley merged commit 3f2337d into securefederatedai:develop Nov 18, 2022
@itrushkin itrushkin mentioned this pull request Nov 23, 2022
aleksandr-mokrov pushed a commit to aleksandr-mokrov/openfl that referenced this pull request Nov 28, 2022
* Initial implementation of CLI command to save model in native format (to be refactored)

* method functioning

* updated function and docs

* Add check for overwriting output file

* add unit and integration tests

* fix test

* signed-off commit

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* support calling python command outside workspace

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* Ilya's typo fix

Co-authored-by: Ilya Trushkin <ilya.trushkin@intel.com>

* Update tests/openfl/interface/test_model_api.py

* Update tests/openfl/interface/test_model_api.py

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>
Co-authored-by: igor-davidyuk <igor.davidyuk@intel.com>
Co-authored-by: Ilya Trushkin <ilya.trushkin@intel.com>
Signed-off-by: Aleksandr Mokrov <aleksandr.mokrov@intel.com>
Successfully merging this pull request may close these issues.

Load a trained pbuf model into general pytorch env