Add Kubeflow tensorboard viz and fix tensorflow file IO for cloud back-ends #447

Merged
htahir1 merged 23 commits into develop from feature/ENG-496-add-tensorboard on Mar 11, 2022

Conversation

@stefannica stefannica commented Mar 7, 2022

Describe changes

Enable the Tensorboard visualization for Kubeflow and fix the tensorflow file IO support when shared cloud storage such as AWS S3 is used.

  • fix the kubeflow container entrypoint to consider ZenML ModelArtifacts in addition to TFX ModelRun types when enabling the Tensorboard view
  • when tensorflow is installed, TFX activates the tensorflow file IO plugin with a priority higher than the ZenML IO plugin, so the ZenML file IO plugin is now registered with a higher priority
  • the 2.6 tensorflow release moves the IO support into a separate python package, tensorflow_io, that must be installed and imported explicitly for the S3/GCP/Azure FS schemes to be supported
  • fix the Kubeflow Metadata Store implementation
  • suppress printing locals in rich trace logs if not in debug mode to reduce verbosity
  • implement a TensorboardService to track and manage locally running Tensorboard daemons
  • implement a step TensorboardVisualizer that starts a local Tensorboard daemon to visualize the entire history of a model logged by a step
  • change the kubeflow example to use Tensorflow instead of scikit-learn and make use of Tensorboard visualizations
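
To illustrate the plugin priority issue from the second bullet, here is a minimal sketch of a priority-ordered filesystem registry. It is a toy model and not the TFX or ZenML registry: `ToyFilesystemRegistry`, the plugin names, and the convention that a lower number wins are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


class ToyFilesystemRegistry:
    """Hypothetical registry mapping a URI scheme to the best-ranked plugin."""

    def __init__(self) -> None:
        # scheme -> list of (priority, plugin name); lower number wins (assumption)
        self._plugins: Dict[str, List[Tuple[int, str]]] = {}

    def register(self, scheme: str, plugin: str, priority: int) -> None:
        self._plugins.setdefault(scheme, []).append((priority, plugin))

    def resolve(self, path: str) -> str:
        # "s3://bucket/key" -> "s3://"; paths without a scheme map to ""
        scheme = path.split("://", 1)[0] + "://" if "://" in path else ""
        candidates = self._plugins.get(scheme)
        if not candidates:
            raise ValueError(f"no filesystem registered for {path!r}")
        # Smallest priority number wins; ties fall back to the plugin name.
        return min(candidates)[1]


registry = ToyFilesystemRegistry()
registry.register("s3://", "tensorflow_plugin", priority=10)
registry.register("s3://", "zenml_plugin", priority=20)
before = registry.resolve("s3://bucket/model")

# Re-register the ZenML plugin so that it outranks the tensorflow one.
registry.register("s3://", "zenml_plugin", priority=5)
after = registry.resolve("s3://bucket/model")
```

In this sketch `before` resolves to the tensorflow plugin and `after` to the ZenML plugin, which mirrors the behaviour change described above.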

Pre-requisites

Please ensure you have done the following:

  • I have read the CONTRIBUTING.md document.
  • If my change requires a change to docs, I have updated the documentation accordingly.
  • If I have added an integration, I have updated the integrations table.
  • I have added tests to cover my changes.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Other (add details above)

htahir1 and others added 7 commits March 7, 2022 13:58
* use fixed tensorflow version
* reorganize code structure
* when tensorflow is installed, TFX activates its file IO plugin
with a priority higher than the ZenML IO plugin
* the 2.6 tensorflow release moves the IO support into a separate
python package (tensorflow_io) that must be installed and imported
explicitly for the S3/GCP/Azure FS schemes to be supported
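
A minimal sketch of the explicit import mentioned in this commit message, assuming that importing `tensorflow_io` is enough to make the extra schemes available. The helper name `try_enable_cloud_schemes` is hypothetical and is not part of ZenML or tensorflow.

```python
import importlib


def try_enable_cloud_schemes(module_name: str = "tensorflow_io") -> bool:
    """Import the add-on package purely for its side effects.

    Returns True if the import succeeded and False if the package is
    not installed, so callers can log a hint instead of crashing.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True
```

A caller could run this once at startup and warn the user to install the package when it returns `False`.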
@stefannica stefannica force-pushed the feature/ENG-496-add-tensorboard branch from 5cbcf49 to a2ff7cf Compare March 7, 2022 13:00
@stefannica stefannica marked this pull request as ready for review March 7, 2022 13:02
@stefannica stefannica requested review from schustmi and htahir1 March 7, 2022 13:02
@stefannica stefannica added the "internal" label (To filter out internal PRs and issues) Mar 7, 2022
@stefannica stefannica force-pushed the feature/ENG-496-add-tensorboard branch from a2ff7cf to dfebffa Compare March 7, 2022 13:05
@htahir1 htahir1 left a comment

Thank you for taking this over Stefan! Really like it!

I would have some suggestions:

  • Potentially adding this example as a file in the kubeflow example to showcase the tensorboard integration. I don't believe this deserves an extra example unless we have a native tensorboard integration that works in a magical way for different configurations
  • Adding a picture of the 'Launch Tensorboard' button on the Kubeflow UI so users understand that the true value comes when this is done on Kubeflow. This can be put in the examples/kubeflow/README.md under a subheading where we explain the kubeflow-tensorboard integration we just enabled
  • Adding in the limitations of Kubeflow (e.g. with S3) as you described in the JIRA ticket
  • Potentially adding a 'TensorboardVisualizer' and showcasing launching tensorboard in a post-execution workflow. This should be trivial but exemplary

I can also implement some of the above if you give me the go-ahead. Otherwise, if it's clear, I would be glad to review these changes! Thanks again for doing the deep dive!

* use Kubeflow volume metadata support [1] to mount the local artifact
store as a hostpath volume in the Kubeflow UI and Tensorboard pods

[1] https://github.com/kubeflow/pipelines/blob/master/docs/config/volume-support.md
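
As a rough illustration of the hostpath mount described in this commit message, the sketch below builds a pod-spec fragment from generic Kubernetes fields (`volumes`, `hostPath`, `volumeMounts`). The helper name, volume name, and paths are hypothetical; the exact file and shape Kubeflow expects are described in [1] and are not reproduced here.

```python
from typing import Any, Dict


def hostpath_pod_fragment(name: str, host_path: str, mount_path: str) -> Dict[str, Any]:
    """Build a pod-spec fragment that mounts a host directory into a container."""
    return {
        "spec": {
            "containers": [
                # The mount refers to the volume below by name.
                {"volumeMounts": [{"name": name, "mountPath": mount_path}]}
            ],
            "volumes": [
                {"name": name, "hostPath": {"path": host_path, "type": "Directory"}}
            ],
        }
    }


# Hypothetical values: mount the local artifact store at the same path in the pod.
fragment = hostpath_pod_fragment(
    name="local-artifact-store",
    host_path="/path/to/artifact_store",
    mount_path="/path/to/artifact_store",
)
```

Mounting at the same path inside the pod is a design choice in this sketch: artifact URIs recorded on the host then resolve unchanged inside the container.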
@stefannica

Thanks @htahir1 , I hope I've addressed all your suggestions with the last commit (see inline).

Thank you for taking this over Stefan! Really like it!

I would have some suggestions:

  • Potentially adding this example as a file in the kubeflow example to showcase the tensorboard integration. I don't believe this deserves an extra example unless we have a native tensorboard integration that works in a magical way for different configurations

I merged the Tensorboard and Kubeflow examples. This basically meant switching the existing Kubeflow example from sklearn to tensorflow and adding some post-execution visualizations.

  • Adding a picture of the 'Launch Tensorboard' button on the Kubeflow UI so users understand that the true value comes when this is done on Kubeflow. This can be put in the examples/kubeflow/README.md under a subheading where we explain the kubeflow-tensorboard integration we just enabled

Done. There are new pics and new information regarding Tensorboard in that README.md file now.

  • Adding in the limitations of Kubeflow (e.g. with S3) as you described in the JIRA ticket

No need. I fixed all the limitations :)

  • Potentially adding a 'TensorboardVisualizer' and showcasing launching tensorboard in a post-execution workflow. This should be trivial but exemplary

You're going to love this: I implemented a TensorboardService to wrap around the Tensorboard local server, and a visualizer that makes it easy to start/stop Tensorboard services. This goes along with the idea that Services are for more than just model serving. The visualizer also works with Jupyter notebooks, which reminds me: I also updated the notebook in the kubeflow example to reflect this.
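
The idea of a service wrapped around a local Tensorboard server can be sketched as follows. This is not the `TensorboardService` from this PR: `LocalTensorboardDaemon` is a hypothetical, much smaller stand-in that only starts and stops one `tensorboard` process, assuming the `tensorboard` executable is on `PATH`.

```python
import shutil
import subprocess
from typing import List, Optional


class LocalTensorboardDaemon:
    """Hypothetical wrapper around a single local `tensorboard` process."""

    def __init__(self, logdir: str, port: int = 6006) -> None:
        self.logdir = logdir
        self.port = port
        self._process: Optional[subprocess.Popen] = None

    def command(self) -> List[str]:
        # `--logdir` and `--port` are standard tensorboard CLI flags.
        return ["tensorboard", "--logdir", self.logdir, "--port", str(self.port)]

    @property
    def is_running(self) -> bool:
        return self._process is not None and self._process.poll() is None

    def start(self) -> None:
        if self.is_running:
            return  # already started, nothing to do
        if shutil.which("tensorboard") is None:
            raise RuntimeError("the tensorboard executable was not found on PATH")
        self._process = subprocess.Popen(self.command())

    def stop(self, timeout: float = 10.0) -> None:
        if self._process is None:
            return
        if self._process.poll() is None:
            self._process.terminate()
            try:
                self._process.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                # Escalate if the process ignores the polite request.
                self._process.kill()
                self._process.wait()
        self._process = None


daemon = LocalTensorboardDaemon(logdir="/tmp/model_logs", port=6007)
```

Calling `daemon.start()` launches the server and `daemon.stop()` shuts it down. A real service would also need to persist the process ID and port so that daemons can be tracked across sessions, which this sketch leaves out.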

@stefannica stefannica force-pushed the feature/ENG-496-add-tensorboard branch from 402cd86 to 86653c0 Compare March 8, 2022 21:26
…rd examples

* Tensorboard service to track Tensorboard daemons running locally
* Tensorboard visualizer to start Tensorboard service for a pipeline
step
* switched Kubeflow example to use Tensorflow instead of sklearn
* add more pics and info about the Tensorboard service and the Kubeflow
Tensorboard UI
@stefannica stefannica force-pushed the feature/ENG-496-add-tensorboard branch from 86653c0 to 7010889 Compare March 8, 2022 22:14
@stefannica stefannica force-pushed the feature/ENG-496-add-tensorboard branch from 61fbeec to 03f04f1 Compare March 9, 2022 10:20
@schustmi schustmi left a comment

Nice! Just found a few typos and a tiny amount of duplicated code, other than that good to go

* removed duplicated code in Tensorboard visualizer
@stefannica stefannica force-pushed the feature/ENG-496-add-tensorboard branch from 03f04f1 to 2e772cb Compare March 9, 2022 14:05
…_utils.py

Co-authored-by: Michael Schuster <schustmi@users.noreply.github.com>
…_utils.py

Co-authored-by: Michael Schuster <schustmi@users.noreply.github.com>
@htahir1 htahir1 left a comment

Are you kidding here!!!! Whatttt the heck just happened. AMAZING WORK! I absolutely love the new Tensorboard service and can only be astonished at the effort put in here. Kudos to you and LGTM!!!!

@htahir1 htahir1 left a comment

Ah wait, I just noticed one TINY problem: no updates to the docs. Would really love to have some of this stuff in the docs, including the service you implemented

@htahir1
htahir1 commented Mar 10, 2022

I have made the following changes to this branch.

  • Added docs
  • Added the following error to make sure the Kubeflow orchestrator fails gracefully
    [image: screenshot of the error]

@stefannica and @schustmi please do take a look

@stefannica stefannica left a comment

Nice job with the docs, I like it.

examples/kubeflow/README.md (review thread resolved)
@schustmi
@htahir1 The only thing I'd change is to not use an AssertionError. Maybe a RuntimeError instead?
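
One reason to prefer an explicit exception here: `assert` statements are skipped when Python runs with `-O`, so a check written as an assertion can silently disappear. The sketch below shows the pattern with a hypothetical `require_setting` helper; the actual check added in this PR is only shown in the screenshot above and is not reproduced here.

```python
def require_setting(settings: dict, key: str) -> str:
    """Return a required setting or fail with a clear, always-on error."""
    value = settings.get(key)
    if not value:
        # Unlike `assert value, "..."`, this check still runs under `python -O`.
        raise RuntimeError(
            f"Missing required setting {key!r}; please configure it before running."
        )
    return value
```

Callers can then catch `RuntimeError` and present the message to the user, whereas an `AssertionError` usually signals a programming bug rather than a configuration problem.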

@htahir1 htahir1 merged commit 74a0bc8 into develop Mar 11, 2022
@htahir1 htahir1 deleted the feature/ENG-496-add-tensorboard branch March 11, 2022 09:07