Development friendly Kubeflow experience #5013

JoshZastrow · 2021-01-15T19:56:24Z

/kind feature

Why you need this feature:
Say I have a local package containing my application logic (e.g. cleaning, feature generation, ML model training). This local package contains modules and functions used in my component.

I want to make changes to the application logic (e.g. change a feature scaling method), then run my pipeline and 1) make sure the pipeline works or 2) see an improvement in my offline metrics.

My component image needs to contain all the dependencies, so it seems that if I want to run my Kubeflow pipeline with new code, I need to rebuild and push an image each time. This is a pretty slow process, and it discourages us from making smaller components (it is easier to develop pipelines in Python and run them as one bigger component via a CLI command).

I'm imagining one solution: a local Kubeflow instance whose component images point to locally built Docker images with the local application code mounted, which would give a much faster iteration cycle.
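
To make the mounted-code idea concrete, here is a minimal sketch of it outside Kubeflow, using plain `docker run`. The helper name `dev_run_command`, the mount point `/app/src`, and the image and module names are hypothetical and not part of any Kubeflow API; the function only builds the command and does not run it.

```python
from typing import List


def dev_run_command(image: str, src_dir: str, module: str,
                    mount_point: str = "/app/src") -> List[str]:
    """Build a `docker run` command that mounts local source into a container.

    Hypothetical helper: it only assembles the argument list.
    """
    return [
        "docker", "run", "--rm",
        # Bind-mount the local checkout so code edits are visible without a rebuild.
        "-v", f"{src_dir}:{mount_point}",
        # Make the mounted package importable inside the container.
        "-e", f"PYTHONPATH={mount_point}",
        image,
        "python", "-m", module,
    ]


if __name__ == "__main__":
    # Example values are placeholders for illustration only.
    print(" ".join(dev_run_command("my-image:dev", "/work/src", "preprocess.scalers")))
```

The image would still need the third-party dependencies installed; only the actively edited application code comes from the mount.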

Is there a better way to develop faster with Kubeflow? It says it's experimentation friendly, but I haven't felt that from working with Kubeflow so far (it is nice that it has experiment management/tracking in the UI though!). I don't feel like I can swap my current experimentation workflow out for Kubeflow.

Maybe a user guide on developing locally could be a good solution? Something equivalent to pip install -e . for Kubeflow components would be great!


davidspek · 2021-01-16T14:02:16Z

@JoshZastrow Just to be clear, are you talking about Kubeflow as a whole or Pipelines specifically? With regard to Pipelines, it is possible to create Python function-based components rather than building a separate image for each component (you do still need a base image that contains the necessary dependencies, such as PyTorch). https://www.kubeflow.org/docs/pipelines/sdk/python-function-components/
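
As a rough illustration of what a function-based component looks like, here is a minimal sketch. The function `scale_features` and its behavior are hypothetical, chosen to echo the feature scaling example above. The wrapping call is shown only as a comment and assumes the KFP v1 SDK; the function itself is plain Python and runs without Kubeflow.

```python
def scale_features(values: list, method: str = "minmax") -> list:
    """Scale a list of numbers; a stand-in for real preprocessing logic."""
    # Function-based components must be self-contained, so imports go
    # inside the function body rather than at module level.
    import statistics

    if not values:
        return []
    if method == "minmax":
        low, high = min(values), max(values)
        span = high - low
        return [0.0 if span == 0 else (v - low) / span for v in values]
    if method == "standard":
        mean = statistics.mean(values)
        std = statistics.pstdev(values)
        return [0.0 if std == 0 else (v - mean) / std for v in values]
    raise ValueError(f"unknown scaling method: {method}")


# Assumption: with the KFP v1 SDK installed, the function could be wrapped as
#   from kfp.components import create_component_from_func
#   scale_op = create_component_from_func(scale_features, base_image="python:3.8")
# The base image name here is a placeholder.

if __name__ == "__main__":
    print(scale_features([0, 5, 10]))
```

This only helps when the logic fits inside the function; anything the function imports from a local package still has to exist on the base image.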

munagekar · 2021-01-18T08:56:40Z

https://github.com/kubeflow-kale/kale this might be useful.

JoshZastrow · 2021-01-19T00:17:10Z

Hi @davidspek , ah yes I should have been more specific--I am talking more about Kubeflow Pipelines.

Seems like even with python based functions, anything that gets imported needs to exist on the image.

For fast development--perhaps the way to go is make every single function in the application a component. This is just a little hard to adopt for an existing python project that already has its own packages, modules, functions and classes.

example:

src
   -preprocess
       -scalers.py
       -encoders.py
   -setup.py
components
   -preprocessing.py
pipeline.py

The pipeline would be built from components, but there's application code in src being actively developed. There could be many existing functions and classes in there that are used in the components. To test a change in src against the pipeline (for say a new experiment), I don't see a way of running the pipeline without building a new image that has a copy of the latest code change, then once it's uploaded to a docker registry, submitting a new pipeline that points to this version (not hard if we go with latest), then executing the pipeline on Kubeflow and seeing what the logs say.

@munagekar ah yeah, I like Kale! This could be a very cool tool (and I'm a big notebook user myself), but the devs on my team actually prefer to develop the pipeline in a .py script and keep logic in local modules. 🤷🏻

davidspek · 2021-01-19T10:29:18Z

/area pipelines
Ping @Bobgy. Seeing as this is related to Pipelines specifically maybe it can be moved to the kubeflow/pipelines repo.

munagekar · 2021-01-20T03:34:45Z

> I don't see a way of running the pipeline without building a new image that has a copy of the latest code change, then once it's uploaded to a docker registry, submitting a new pipeline that points to this version (not hard if we go with latest), then executing the pipeline on Kubeflow and seeing what the logs say.

This is exactly what we implemented in my organization. We use git tags instead of latest.
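
For readers who want to try the same approach, here is a minimal sketch of tagging images from git instead of `latest`. It is not the setup described above, only one possible shape for it: the helper names, registry, and image name are hypothetical, and the functions build the `docker` commands without running them.

```python
import subprocess
from typing import List


def current_git_tag(repo_dir: str = ".") -> str:
    """Return `git describe` output for the checkout (requires git and a repo)."""
    result = subprocess.run(
        ["git", "describe", "--tags", "--always", "--dirty"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def image_ref(registry: str, name: str, tag: str) -> str:
    """Compose a full image reference such as registry/name:tag."""
    return f"{registry}/{name}:{tag}"


def build_and_push_commands(ref: str, context: str = ".") -> List[List[str]]:
    """Return the docker commands to build and push one tagged image."""
    return [
        ["docker", "build", "-t", ref, context],
        ["docker", "push", ref],
    ]


if __name__ == "__main__":
    # Placeholder registry and image name for illustration only.
    ref = image_ref("registry.example.com/team", "preprocess", current_git_tag())
    for cmd in build_and_push_commands(ref):
        print(" ".join(cmd))
```

Pinning the component to an immutable tag makes each pipeline run traceable to a specific commit, which `latest` does not.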

Bobgy · 2021-01-20T11:08:44Z

Some documentation you can refer to: https://cloud.google.com/solutions/machine-learning/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build

A CI/CD pipeline is needed to deploy the CT (continuous training) pipeline that runs in KFP.

bencwallace · 2021-01-20T14:11:35Z

Looks like this open PR would help with this: #4983

stale · 2021-06-03T20:50:37Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale · 2022-04-28T17:59:44Z

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

JoshZastrow changed the title ~~Development friendly Kubeflow experient~~ Development friendly Kubeflow experience Jan 15, 2021

Bobgy transferred this issue from kubeflow/kubeflow Jan 20, 2021

k8s-ci-robot added the kind/feature label Jan 20, 2021

Bobgy added kind/question and removed kind/feature labels Jan 20, 2021

stale bot added the lifecycle/stale label Jun 3, 2021

stale bot closed this as completed Apr 28, 2022
