Development friendly Kubeflow experience #5013

Closed
JoshZastrow opened this issue Jan 15, 2021 · 9 comments
Labels: kind/question, lifecycle/stale

Comments

@JoshZastrow

/kind feature

Why you need this feature:
Say I have a local package containing my application logic (e.g., cleaning, feature generation, ML model training). This local package contains modules and functions used in my component.

I want to make changes to the application logic (e.g., change a feature scaling method), then run my pipeline to 1) make sure the pipeline works or 2) see an improvement in my offline metrics.

My component image needs to contain all of the component's dependencies, so it seems that if I want to run my Kubeflow pipeline with new code, I need to rebuild and push an image each time. This is a slow process, and it discourages us from making smaller components (it's easier to develop pipelines in Python and run them as one bigger component via a CLI command).

One solution I'm imagining: with a local Kubeflow instance whose component images point to locally built Docker images that have the local application code mounted, you could get a much faster iteration cycle.
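To make the mounting idea concrete, here is a hypothetical sketch that only builds a `docker run` command; it does not involve Kubeflow at all. The image tag `my-component:dev`, the local `src` directory, the container path `/app/src`, and the module entry point `preprocess` (a package with a `__main__.py`) are all assumptions, not details from the thread. Because the code is bind-mounted rather than copied into the image, edits under `src` would be picked up on the next run without a rebuild. Wiring such mounts into a local Kubeflow deployment is a separate step not shown here.

```python
import os
import shlex

# Hypothetical names for illustration only; adjust to your project.
IMAGE = "my-component:dev"
CONTAINER_SRC = "/app/src"


def docker_run_with_mounted_src(image: str, src_dir: str, module: str,
                                container_src: str = CONTAINER_SRC) -> list[str]:
    """Build a `docker run` argv that bind-mounts src_dir into the container."""
    host_src = os.path.abspath(src_dir)  # bind mounts need an absolute host path
    return [
        "docker", "run", "--rm",
        "-v", f"{host_src}:{container_src}:ro",  # read-only mount of the local code
        "-e", f"PYTHONPATH={container_src}",     # make the mounted packages importable
        image,
        "python", "-m", module,                  # hypothetical entry point
    ]


if __name__ == "__main__":
    print(shlex.join(docker_run_with_mounted_src(IMAGE, "src", "preprocess")))
```

The image itself would still need the third-party dependencies installed; only the frequently edited application code is mounted.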

Is there a better way to develop faster with Kubeflow? It's described as experimentation-friendly, but I haven't felt that while working with Kubeflow so far (it is nice that the UI has experiment management and tracking, though!). I don't feel like I can swap my current experimentation workflow out for Kubeflow.

Maybe a user guide on developing locally would be a good solution? Something equivalent to `pip install -e .` for Kubeflow components would be great!

JoshZastrow changed the title "Development friendly Kubeflow experient" to "Development friendly Kubeflow experience" Jan 15, 2021
@davidspek
Contributor

@JoshZastrow Just to be clear, are you talking about Kubeflow as a whole or Pipelines specifically? For Pipelines, you can create Python function-based components instead of building images (you still need a base image that contains the necessary dependencies, such as PyTorch). https://www.kubeflow.org/docs/pipelines/sdk/python-function-components/
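As a rough illustration of why function-based components avoid a per-change image build, and why the base image still matters, the sketch below embeds a function's source text in a container command and runs it. This is a simplified model, not the kfp SDK's actual code: kfp obtains the source by inspecting the function and generates a more elaborate wrapper, whereas here the source is a literal string and `container_command` and `run_locally` are hypothetical helpers.

```python
import subprocess
import sys
import textwrap

# kfp gets this text by inspecting a real function; a literal keeps the sketch self-contained.
COMPONENT_SOURCE = textwrap.dedent('''
    def add(a: float, b: float) -> float:
        return a + b
''')


def container_command(func_source: str, func_name: str, *args: str) -> list[str]:
    """Hypothetical helper: argv that defines the function, then calls it with args."""
    call = f"print({func_name}(*map(float, {list(args)!r})))"
    return ["python3", "-u", "-c", func_source + "\n" + call + "\n"]


def run_locally(cmd: list[str]) -> str:
    """Run the argv with this interpreter in place of the container's python3."""
    result = subprocess.run([sys.executable, *cmd[1:]],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_locally(container_command(COMPONENT_SOURCE, "add", "1", "2")))  # → 3.0
```

Only the text of `add` travels with the command; any `import` inside such a function is resolved by the interpreter in the container, which is why the base image still needs the dependencies.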

@munagekar
Contributor

munagekar commented Jan 18, 2021

https://github.com/kubeflow-kale/kale might be useful.

@JoshZastrow
Author

Hi @davidspek, ah yes, I should have been more specific: I'm talking about Kubeflow Pipelines.

It seems that even with Python function-based components, anything that gets imported needs to exist on the image.

For fast development, perhaps the way to go is to make every single function in the application a component. This is just a little hard to adopt for an existing Python project that already has its own packages, modules, functions, and classes.

For example:

```
src
   -preprocess
       -scalers.py
       -encoders.py
   -setup.py
components
   -preprocessing.py
pipeline.py
```

The pipeline would be built from components, but there's application code in `src` being actively developed, and the components may use many existing functions and classes from it. To test a change in `src` against the pipeline (say, for a new experiment), I don't see a way of running the pipeline without building a new image that has a copy of the latest code change, then once it's uploaded to a docker registry, submitting a new pipeline that points to this version (not hard if we go with latest), then executing the pipeline on Kubeflow and seeing what the logs say.
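One way to avoid relying on `latest` in this loop is to derive the image tag from the code itself. This is a minimal sketch, assuming the code baked into the image lives under a single directory; `source_tree_tag` and the registry path `registry.example.com/team/preprocess` are hypothetical. The same sources always produce the same tag and any edit produces a new one, so a submitted pipeline can pin the exact image it was tested with.

```python
import hashlib
import tempfile
from pathlib import Path

REPO = "registry.example.com/team/preprocess"  # hypothetical registry path


def source_tree_tag(src_dir: str, length: int = 12) -> str:
    """Hash every source file under src_dir (paths + contents) into a short hex tag."""
    root = Path(src_dir)
    digest = hashlib.sha256()
    files = (p for p in root.rglob("*") if p.is_file() and "__pycache__" not in p.parts)
    for path in sorted(files):
        data = path.read_bytes()
        # NUL-terminated relative path plus a length prefix keeps boundaries unambiguous,
        # so renaming or moving a file also changes the tag.
        digest.update(path.relative_to(root).as_posix().encode() + b"\0")
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()[:length]


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for src/.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "scalers.py").write_text("SCALE = 1.0\n")
        before = source_tree_tag(tmp)
        Path(tmp, "scalers.py").write_text("SCALE = 2.0\n")
        after = source_tree_tag(tmp)
        print(f"{REPO}:{before}", f"{REPO}:{after}", sep="\n")
```

Generated bytecode under `__pycache__` is skipped so that merely importing the code locally does not change the tag.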

@munagekar ah yeah, I like Kale! It could be a very cool tool (and I'm a big notebook user myself), but the devs on my team actually prefer to develop the pipeline in a `.py` script and keep the logic in local modules. 🤷🏻

@davidspek
Contributor

/area pipelines
Ping @Bobgy. Since this relates specifically to Pipelines, maybe it can be moved to the kubeflow/pipelines repo.

@munagekar
Contributor

> I don't see a way of running the pipeline without building a new image that has a copy of the latest code change, then once it's uploaded to a docker registry, submitting a new pipeline that points to this version (not hard if we go with latest), then executing the pipeline on Kubeflow and seeing what the logs say.

This is exactly what we implemented in my organization. We use git tags instead of `latest`.

Bobgy transferred this issue from kubeflow/kubeflow Jan 20, 2021
@Bobgy
Contributor

Bobgy commented Jan 20, 2021

Some documentation you can refer to: https://cloud.google.com/solutions/machine-learning/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build

A CI/CD pipeline is needed to deploy the CT (continuous training) pipeline that runs in KFP.
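A very small sketch of the build-and-deploy part of such a CI/CD pipeline, written as an ordered list of commands: the component image is built and pushed before the pipeline that references it is compiled. It assumes the KFP v1 SDK's `dsl-compile` CLI and a `pipeline.py` at the repository root; the step names, helper functions, and image reference are hypothetical, and the final upload/deploy step is left out because it depends on the environment.

```python
import shlex
import subprocess
from typing import NamedTuple


class Step(NamedTuple):
    name: str
    argv: tuple[str, ...]


def ct_pipeline_ci_steps(image: str, pipeline_py: str = "pipeline.py",
                         package_path: str = "pipeline.yaml") -> list[Step]:
    """Ordered CI steps: image first, so the compiled pipeline can reference it."""
    return [
        Step("build component image", ("docker", "build", "-t", image, ".")),
        Step("push component image", ("docker", "push", image)),
        Step("compile pipeline", ("dsl-compile", "--py", pipeline_py, "--output", package_path)),
        # Uploading/deploying the compiled package is environment specific; omitted.
    ]


def run_steps(steps: list[Step], dry_run: bool = True) -> list[str]:
    """Print (and, unless dry_run, execute) each step; stop at the first failure."""
    rendered = []
    for step in steps:
        line = shlex.join(step.argv)
        rendered.append(line)
        print(f"[{step.name}] {line}")
        if not dry_run:
            subprocess.run(step.argv, check=True)  # raises CalledProcessError on failure
    return rendered


if __name__ == "__main__":
    run_steps(ct_pipeline_ci_steps("registry.example.com/team/preprocess:v1.2.0"))
```

Keeping the steps as data makes the order explicit and lets a dry run show exactly what CI would execute.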

@bencwallace
Contributor

Looks like this open PR would help with this: #4983

@stale

stale bot commented Jun 3, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the lifecycle/stale label Jun 3, 2021
@stale

stale bot commented Apr 28, 2022

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

stale bot closed this as completed Apr 28, 2022
6 participants