Expose Pipelines as CRD and enable easy migration from Argo workflow #1132
Comments
I'm not sure this is needed. If we decide to replace Argo, then we'll create a new CRD.
I do not think KF Pipelines requires you to do that.
instead of writing the YAML manually.
So, if I submitted an Argo workflow, would it be picked up by pipelines immediately? How, for example, would it save metrics?
Hi @inc0, having a CRD for pipeline is being considered. We are planning to implement this in multiple steps:
To provide metrics, the workflow task must have an output artifact called 'mlpipeline-metrics'.
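As a concrete illustration of that convention, here is a minimal sketch of an Argo Workflow step that writes such an artifact. Only the `mlpipeline-metrics` artifact name comes from the comment above; the image, command, and metric names/values are illustrative assumptions.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: metrics-demo-
spec:
  entrypoint: train
  templates:
  - name: train
    container:
      image: python:3.9            # illustrative image
      command: [python, -c]
      args:
      - |
        import json
        # Hypothetical metrics payload; the pipelines UI reads run metrics
        # from the artifact written at the path declared below.
        metrics = {"metrics": [
            {"name": "accuracy-score", "numberValue": 0.92, "format": "PERCENTAGE"}
        ]}
        with open("/mlpipeline-metrics.json", "w") as f:
            json.dump(metrics, f)
    outputs:
      artifacts:
      - name: mlpipeline-metrics   # the artifact name the pipelines UI looks for
        path: /mlpipeline-metrics.json
```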
You have to submit the workflow against the pipelines API. Note that it's not considered a supported mode of operation. It may break in the future.
@Ark-kun, having a CRD for pipeline is something that we are considering. Let's please keep this open.
Adding to this, having a Pipelines CRD would also provide a path for multi-user pipelines, as Kubernetes CRDs have built-in authentication and authorization via the API Server, like any other Kubernetes object.
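To make that point concrete, here is a minimal RBAC sketch, assuming a hypothetical `pipelines.kubeflow.org` API group and `pipelines`/`pipelineruns` resources (neither exists today); the namespace and group names are made up:

```yaml
# Namespace-scoped access to a hypothetical Pipeline CRD, enforced by the
# Kubernetes API server the same way as for any other resource.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pipeline-editor
  namespace: team-a
rules:
- apiGroups: ["pipelines.kubeflow.org"]        # hypothetical API group
  resources: ["pipelines", "pipelineruns"]     # hypothetical resources
  verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-pipeline-editors
  namespace: team-a
subjects:
- kind: Group
  name: team-a-developers                      # made-up group
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pipeline-editor
  apiGroup: rbac.authorization.k8s.io
```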
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/lifecycle frozen
I think this is something we'd want to consider for the long term.
Chiming in here, more background in this Slack thread. Our use case at Zillow is to be able to deploy monitoring alongside scheduled pipelines. We use Datadog internally and have created a K8s operator for creating Datadog Monitors (essentially alerts triggered by metrics over thresholds); it just reconciles the state of the resources with the Datadog API. We would like to be able to use a standard … This would also allow us to unify our CI/CD pipeline with KFServing. Essentially we have the same pattern where we generate a set of resource manifests using …
@alexlatchford for clarification, does the use case only apply to …? It sounds to me like one-time pipeline runs do not need a CRD interface.
I think we'd ideally prefer to just use the same CI/CD pipeline regardless, so I'd imagine we'd use the …
Is this something that is still possible? It would be nice to have pipeline CRDs to be able to integrate pipelines with GitOps without losing all UI capabilities.
@vicaire as I understand it, steps 1 + 2 have been completed; are there still plans to introduce a standalone CRD? Having to rely on the Python SDK and submitting files to the Kubeflow API instead of the Kubernetes API makes Kubeflow a really hard sell. In our case, dedicated CI/CD workflows need to be developed, and we can't rely on any of the existing tooling (e.g. …).
Currently, there's no plan to make pipeline a CRD. In fact, we are moving to make pipeline platform-agnostic. |
I have the same use case as kujon, rubenaranamorera & alexlatchford. We deploy things using a Flux-based GitOps workflow. The lack of an option to declaratively define Kubeflow pipelines as Kubernetes resource objects that can be …
Currently, to use pipelines you need to run the Python SDK, which just generates an Argo workflow underneath, etc. I think this is very limiting because:
I propose creating a new CRD that would effectively be an Argo workflow with additional options.
For example:
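A minimal sketch of what such a manifest could look like, assuming a hypothetical `pipelines.kubeflow.org` API group; the kind, fields, and values are purely illustrative, not an existing API:

```yaml
apiVersion: pipelines.kubeflow.org/v1alpha1    # hypothetical group/version
kind: Pipeline                                 # hypothetical kind
metadata:
  name: train-and-evaluate
  namespace: team-a
spec:
  description: Nightly training pipeline
  schedule: "0 2 * * *"          # an "additional option" beyond plain Argo: recurring runs
  workflowSpec:                  # embedded Argo WorkflowSpec, reused as-is
    entrypoint: train
    templates:
    - name: train
      container:
        image: my-registry/train:latest
        command: [python, train.py]
```

A controller watching such a resource could compile the embedded spec into a regular Argo Workflow while handling the pipeline-specific extras.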
This would make the transition to pipelines much easier, as the operator pattern is already well known and handles a lot of things for us, including RBAC, multi-tenancy, API auth, etc.