
Expose Pipelines as CRD and enable to easy migration from Argo workflow #1132

Closed
inc0 opened this issue Apr 10, 2019 · 15 comments

Comments

@inc0

inc0 commented Apr 10, 2019

Currently, to use Pipelines you need to use the Python SDK, and even that only generates an Argo workflow underneath. I think this is very limiting because:

  1. not everyone uses Python
  2. it requires learning a whole new API and DSL
  3. Argo already has a lot of examples; it's a shame we can't tap into this knowledge source
  4. Argo can do much more than data pipelines: you can learn one syntax and use it for data, CI, CD, etc.

I propose creating a new CRD that would effectively be an Argo workflow with additional options.
For example:

apiVersion: kubeflow.org
kind: Pipeline
metadata:
  generateName:  mlapp-
  labels:
    workflow: mlapp
spec:
# Add some useful pipeline specific data
  model_name: foobar
  model_version: 1
# This is just argo workflow spec
  entrypoint: mlapp
  templates:
  - name: mlapp
    dag:
      tasks:
      - name: preprocess
        template: preprocess

      - name: model1
        dependencies: [preprocess]
        template: train
        arguments:
          artifacts:
          - name: dataset
            from: "{{tasks.preprocess.outputs.artifacts.dataset}}"

  - name: preprocess
    container:
      image: myimage:latest
      name: preprocess
      command: ["python", "/src/preprocess.py"]
      env:
        - name: SOMEENV
          value: foobar
    outputs:
      artifacts:
      - name: dataset
        path: /data

  - name: train
    inputs:
      artifacts:
      - name: dataset
        path: /data
    outputs:
      artifacts:
      - name: model
        path: /output
    container:
      image: myimage:latest
      name: trainer
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
      command: ["python", "/src/train.py"]

This would make the transition to Pipelines much easier, as Operators are already a well-known pattern and they handle a lot of things for us, including RBAC, multi-tenancy, API auth, etc.
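As a hedged sketch of what such an operator's reconcile step might do, the function below splits the proposed Pipeline object into a plain Argo Workflow plus the pipeline-specific metadata. The field names model_name and model_version come from the example above; the function itself is hypothetical and not an existing KFP API:

```python
# Hypothetical sketch: convert the proposed Pipeline CRD object into a plain
# Argo Workflow by splitting off the pipeline-specific fields. Field names
# mirror the YAML example above; nothing here is an actual KFP API.
ML_FIELDS = {"model_name", "model_version"}

def pipeline_to_workflow(pipeline: dict) -> tuple[dict, dict]:
    spec = dict(pipeline["spec"])
    # Pull out the ML-specific fields; everything left is a valid Argo spec.
    ml_metadata = {k: spec.pop(k) for k in list(spec) if k in ML_FIELDS}
    workflow = {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": dict(pipeline["metadata"]),
        "spec": spec,
    }
    return workflow, ml_metadata
```

The operator would then submit the resulting Workflow to Argo and record the ML metadata separately.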

@Ark-kun Ark-kun self-assigned this Apr 10, 2019
@Ark-kun
Contributor

Ark-kun commented Apr 10, 2019

I'm not sure this is needed.
Currently KF Pipelines uses the Argo Workflow CRD without changes. Pipelines do not extend it; there are no extra pipeline-specific fields.

If we decide to replace Argo, then we'll create a new CRD.

not everyone uses python
requires to learn whole new API and DSL

I do not think KF Pipelines requires you to do that.
Pipelines Python SDK just allows some people to write

preprocess = load_component(...)
train = load_component(...)

@pipeline
def mlapp():
    train(preprocess(train_set).output)

instead of writing the YAML manually.

@inc0
Author

inc0 commented Apr 11, 2019

So, if I submitted an Argo workflow, would it be picked up by Pipelines immediately? How, for example, would it save metrics?

@vicaire
Contributor

vicaire commented Apr 11, 2019

Hi inc0@, having a CRD for pipelines is being considered. We are planning to implement this in multiple steps:

  • First, we will create a pipeline spec that combines an Argo workflow with additional data needed for ML pipelines.
  • Initially, this spec will be processed by the pipeline API server and turned into an Argo workflow.
  • Later on, we could turn this pipeline spec into a standalone CRD.
  • The long-term expectation is that the pipeline CRD will let us combine multiple orchestration CRDs useful for ML (Argo workflow, HP tuning, etc.) and let users specify additional, optional ML metadata.

@Ark-kun
Contributor

Ark-kun commented Apr 12, 2019

How, for example, will it save metrics?

To provide metrics, the workflow task must have an output artifact called 'mlpipeline-metrics'.
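For illustration, a pipeline step could emit that artifact by writing a JSON file in the KFP metrics format; the metric name and value below are made up, and the output path must match the path declared for the 'mlpipeline-metrics' artifact in the workflow spec:

```python
import json

# Write metrics in the JSON format the KFP UI understands. The step's workflow
# spec must declare this file as an output artifact named 'mlpipeline-metrics'.
# Metric name, value, and path are illustrative.
metrics = {
    "metrics": [
        {"name": "accuracy-score", "numberValue": 0.92, "format": "PERCENTAGE"},
    ]
}
with open("/tmp/mlpipeline-metrics.json", "w") as f:
    json.dump(metrics, f)
```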

So, if I'd submit argo workflow, it will be picked up by pipelines immediatly?

You have to submit the workflow through the Pipelines API.
You can use either the Python client (kfp.Client(...).run_pipeline(...)) or the CLI:
https://github.com/kubeflow/pipelines/tree/master/backend/src/cmd/ml

Note that this is not considered a supported mode of operation. It may break in the future.

@Ark-kun Ark-kun closed this as completed Apr 12, 2019
@vicaire
Contributor

vicaire commented Apr 12, 2019

@Ark-kun, having a CRD for pipeline is something that we are considering. Let's please keep this open.

@vicaire vicaire reopened this Apr 12, 2019
@vicaire vicaire assigned IronPan and unassigned vicaire Jul 16, 2019
@yanniszark
Contributor

Adding to this, a Pipelines CRD would also provide a path to multi-user pipelines, since Kubernetes CRDs get built-in authentication and authorization via the API server, like any other Kubernetes object.
As such, there may be some overlap with #1223.

@stale

stale bot commented Jun 25, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Jun 25, 2020
@Bobgy
Contributor

Bobgy commented Jun 26, 2020

/lifecycle frozen

I think this is something we'd want to consider for the long term.

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen and removed lifecycle/stale The issue / pull request is stale, any activities remove this label. labels Jun 26, 2020
@alexlatchford
Contributor

Chiming in here, more background in this Slack thread.

Our use case at Zillow is to be able to deploy monitoring alongside scheduled pipelines. We use Datadog internally and have created a K8s operator for creating Datadog Monitors (essentially alerts triggered by metrics crossing thresholds); it simply reconciles the state of the resources with the Datadog API.

We would like to be able to use a standard kubectl apply (or better, kubectl apply -k with kustomize) to deploy a ScheduledWorkflow CRD (see these samples) alongside these custom DatadogMonitor CRD resources. This is an extensible pattern: in the future we are planning to build a Datadog Dashboards operator so we could dynamically create dashboards on a per-ScheduledWorkflow basis (useful for defining and monitoring SLOs, for instance).

This would also allow us to unify our CI/CD pipeline with KFServing. We have essentially the same pattern there: we generate a set of resource manifests using kustomize, in that case an InferenceService plus a set of DatadogMonitors. Since our underlying core K8s team already has CI/CD pipelines for running kubectl apply -k internally, this would let us drop the custom CI/CD pipelines we currently maintain atop the kfp CLI/SDK tooling (the public interfaces KFP exposes today) and align wholly with the rest of our company, reducing maintenance overhead!
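A hedged sketch of the deployment layout being described (file names are illustrative, and DatadogMonitor refers to Zillow's internal CRD, not a public API):

```yaml
# kustomization.yaml -- deploy a scheduled pipeline together with its monitors
# (illustrative sketch; the referenced resource files are made up)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - scheduled-workflow.yaml   # the ScheduledWorkflow CRD resource
  - datadog-monitors.yaml     # the DatadogMonitor CRD resources
```

Everything would then go out with a single kubectl apply -k .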

@Bobgy
Contributor

Bobgy commented Feb 3, 2021

@alexlatchford for clarification, does the use case only apply to ScheduledWorkflow?

It sounds to me like one-time pipeline runs do not need a CRD interface.

@alexlatchford
Contributor

I think we'd ideally prefer to use the same CI/CD pipeline regardless, so I imagine we'd use ScheduledWorkflow in this mode just to unify the deployment process.

@rubenaranamorera

Is this something that is still being considered? It would be nice to have pipeline CRDs so we could integrate pipelines with GitOps without losing all the UI capabilities.

@kujon

kujon commented Oct 6, 2021

  • First, we will create a pipeline spec that combines an Argo workflow with additional data needed for ML pipelines.
  • Initially, this spec will be processed by the pipeline API server and turned into an Argo workflow.
  • Later on, we could turn this pipeline spec into a standalone CRD.
  • The long-term expectation is that the pipeline CRD will let us combine multiple orchestration CRDs useful for ML (Argo workflow, HP tuning, etc.) and let users specify additional, optional ML metadata.

@vicaire as I understand it, steps 1 and 2 have been completed; are there still plans to introduce a standalone CRD? Having to rely on the Python SDK and submit files to the Kubeflow API instead of the Kubernetes API makes Kubeflow a really hard sell. In our case, dedicated CI/CD workflows need to be developed, and we can't rely on any of the tooling (e.g. helm-secrets) that works with virtually anything else deployed onto Kubernetes.

@chensun
Member

chensun commented Jan 18, 2023

Currently, there's no plan to make pipelines a CRD. In fact, we are moving to make pipelines platform-agnostic.

@chensun chensun closed this as completed Jan 18, 2023
@laurence-hudson-mindfoundry

I have the same use case as kujon, rubenaranamorera, and alexlatchford. We deploy things using a Flux-based GitOps workflow. The lack of an option to declaratively define Kubeflow pipelines as Kubernetes resource objects that can be kubectl apply'ed is a pain, and seems like a departure from K8s norms. It also seems inconsistent with other KF components like KServe, where you have InferenceService resource objects, etc.
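For contrast, the declarative KServe style referred to here looks roughly like the following (a sketch; the resource name and storage URI are placeholders):

```yaml
# An InferenceService is a plain Kubernetes object that can be applied
# declaratively via kubectl or a GitOps tool. Values are placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model                          # placeholder
spec:
  predictor:
    sklearn:
      storageUri: gs://my-bucket/model    # placeholder
```

There is no equivalent object for a KFP pipeline, which is what this thread is asking for.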

magdalenakuhn17 pushed a commit to magdalenakuhn17/pipelines that referenced this issue Oct 22, 2023
HumairAK pushed a commit to red-hat-data-services/data-science-pipelines that referenced this issue Mar 11, 2024