S3 support in Kubeflow Pipelines #3405
This is deprecated. The supported method is to configure the artifact location in the workflow configmap. There is also an option to use the MinIO gateway to proxy requests to the actual storage system (Google Cloud Storage or Amazon S3). WDYT?
@Ark-kun artifact location in workflow-configmap works for us now.
Here is the Minio doc: https://docs.minio.io/docs/minio-gateway-for-s3.html
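For reference, pointing the Argo artifact repository at S3 via the workflow-controller configmap looks roughly like this. This is a sketch, not taken from the thread: the bucket name and secret names are illustrative, and depending on the Argo version the block may live under a `config: |` key instead of a top-level `artifactRepository` key.

```yaml
# Hypothetical sketch: artifact repository pointed at S3 instead of in-cluster MinIO.
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: kubeflow
data:
  artifactRepository: |
    s3:
      bucket: my-kfp-artifacts        # placeholder bucket name
      keyPrefix: artifacts
      endpoint: s3.amazonaws.com
      insecure: false
      accessKeySecret:
        name: aws-s3-credentials      # placeholder secret
        key: accesskey
      secretKeySecret:
        name: aws-s3-credentials
        key: secretkey
```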
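A minimal sketch of what running MinIO in S3-gateway mode could look like as a Kubernetes Deployment, assuming a secret holding the AWS access keys. The image tag, secret name, and namespace are placeholders, not from this thread:

```yaml
# Hypothetical sketch: MinIO as a gateway proxying requests to Amazon S3.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
  namespace: kubeflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
      - name: minio
        image: minio/minio   # pin a concrete release tag in practice
        args: ["gateway", "s3", "https://s3.amazonaws.com"]
        env:
        - name: MINIO_ACCESS_KEY       # also used to authenticate to S3
          valueFrom:
            secretKeyRef:
              name: aws-s3-credentials  # placeholder secret
              key: accesskey
        - name: MINIO_SECRET_KEY
          valueFrom:
            secretKeyRef:
              name: aws-s3-credentials
              key: secretkey
        ports:
        - containerPort: 9000
```

With this in place, the other KFP components keep talking to the in-cluster `minio` service unchanged, and the gateway forwards object operations to S3.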
Update metadata writer service to support S3 artifacts
See also https://kubeflow.slack.com/archives/CE10KS9M4/p1586408517128600
The frontend translates this to ReadArtifact backend calls. Regarding S3:
@Ark-kun The MinIO gateway seems to be an extra layer on top of MinIO? Users need to run the MinIO service in gateway mode to delegate requests to S3. Is that correct? I think that solves some of the requirements and is fairly easy, but it adds an extra layer.
Yes. For sure, the problem is that it's always MinIO here if users want to replace MinIO with S3. Currently, that doesn't work well.
I'm confused as well. If somebody could outline what steps need to be taken to support s3, I can help implementing them. But reading the comments here leaves me none the wiser. |
https://github.com/e2fyi/kubeflow-aws/tree/master/pipelines As for the GCS backend, it's here:
Yes. It serves as a proxy. The advantage of using it is that all services (UX, Backend, Argo etc.) only need to talk to one kind of service - in-cluster MinIO. And the configuration (credentials etc.) can be specified in one place. @IronPan or @numerology might have more information.
@Ark-kun I checked the gateway and ran some tests. It works fine. I can file a PR against the AWS overlay. The current question is that some users don't want to use MinIO as a gateway - do we want to support direct object store access? The PRs listed above aim to solve this problem. @rmgogogo The e2fyi repo cannot replace MinIO entirely; it still needs some changes.
Isn't minio supposed to be compatible with s3? I don't understand the need for the gateway. How do we authenticate clients connecting to the minio gateway? Does that require an additional layer of authentication? Or is this enforced by istio on a networking level? Not convinced this is better than granting individual components IAM permissions.
@discordianfish The gateway is fairly easy in the current manifest structure. Only the MinIO pod needs S3 permissions, and MinIO becomes a proxy to S3. There's no need to make code changes everywhere to check the protocol - if you look at the PRs above, that approach is fairly painful right now. It also avoids having to provide credentials or attach roles to every KFP component that talks to S3. I agree this won't meet everyone's needs; we definitely need low-level object storage configuration as well.
@Jeffwan What I don't understand is why we need to distinguish between minio and s3 at all, instead of just tracking the bucket name and the endpoint. The minio client would be configured with either auth credentials or fall back to the instance role. That should be the same config for all components, no? But okay, if people want to go down the gateway route: How do all components that need s3/minio access authenticate against the gateway? Not using auth here would be a no go for us. |
I see. It's not consistent across all components. KFP has lots of references with a hard-coded protocol.
The only difference between plain MinIO and the MinIO gateway with S3 is that the user needs to give the MinIO pod credentials with an S3 policy. An instance profile will work as well. I have not checked IRSA or Kube2IAM for fine-grained control yet.
Just want to add a note. One main reason we moved to s3 instead of the minio gateway (which was our original deployment) is that tensorboard doesn't work with minio. It only supports gcs and s3; it was using boto3 instead of the minio client the last time I checked (maybe 6 months ago?). Other than that I think the minio gateway is fine as long as the throughput is fine. But the concern is that you can't have fine-grained rbac control with the minio gateway. And it is confusing to our data scientists when they have to mix s3 and minio (i.e. s3 for our data, minio for data generated within kfp).
Yes, but why? The protocol part in that url is just for the client to know where to connect to. So for minio and s3, I suggest we always use the endpoint, bucket name and path to refer to an object. It shouldn't matter to the client whether this is s3 or minio. Does that make sense?
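The suggestion above - treat the scheme as routing information only and identify objects by bucket and key - can be sketched with a small parser. This is an illustrative example, not code from KFP; the function name is hypothetical:

```python
from urllib.parse import urlparse

def parse_artifact_uri(uri: str) -> tuple:
    """Split an artifact URI into (scheme, bucket, key).

    The scheme ("minio" or "s3") only tells the client which endpoint
    to talk to; bucket and key alone identify the object, so the same
    parsing works regardless of the backing store.
    """
    parsed = urlparse(uri)
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")

# Both URIs point at the same logical object; only the endpoint differs.
print(parse_artifact_uri("minio://mlpipeline/artifacts/run-1/data.tgz"))
# → ('minio', 'mlpipeline', 'artifacts/run-1/data.tgz')
print(parse_artifact_uri("s3://mlpipeline/artifacts/run-1/data.tgz"))
# → ('s3', 'mlpipeline', 'artifacts/run-1/data.tgz')
```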
I understand that, but we need to authenticate access from kubeflow components to the minio gateway. If I understand you correctly, with the minio gw everything that is able to reach minio will have access to all s3 content without auth. That's not acceptable for us.
https://github.com/kubeflow/pipelines/pull/3531/files#diff-4e5a9c52bbed342532f527bae3aae002R15-R20
Right, that's what @eterna2 mentioned as well. Fine-grained control doesn't work with the minio gateway.
Doesn't work yet. The Minio version used by pipelines is currently still sitting at github.com/minio/minio-go v6.0.14+incompatible. I think this needs to be merged: #3549
This issue also needs to be resolved: minio/minio-js#841. As far as I can tell from piecing together the various PRs and issues on this topic, we need both minio-go and minio-js to implement this, and then we need the versions to be bumped on KFP's side.
Thanks for the response. Hopefully there is a resolution to this soon. I know a lot of people would love to use IRSA for S3 to avoid using credentials or third-party K8s IAM providers...
Hi everyone, for KFP v2 compatible mode, we've switched to using https://gocloud.dev/ to access S3. Can someone help me verify whether it supports your use cases, especially IRSA, and how?
Hey @Bobgy, I could help with this. Could you give me some more info on what you would like to verify exactly? I guess this will replace the Minio Go and JS clients that are currently in use? The link bit.ly/kfp-v2-compatible gives me a 404 error. |
Thanks @karlschriek! Yes, we'd like to replace MinIO Go with Go CDK (also KFP tasks in pipelines will use Go CDK to upload/download from storage). So it'll be helpful to verify whether Go CDK (as an alternative client for S3) can support the IRSA you are talking about. Re the 404 error, sorry, I made a mistake; fixed the link.
Ok, it looks promising at least: google/go-cloud#2773. IRSA works by attaching annotations to the ServiceAccounts that get attached to Pods. So all we would need to do is attach the right annotations to the ServiceAccounts for the pipelines api etc, as well as whatever sa is used for the pipeline jobs. I'll provide you with some small examples and also try out the Go SDK. |
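The annotation mechanism described above can be sketched as follows. This is an illustrative fragment, not from the thread: the ServiceAccount name, AWS account id, and role name are placeholders.

```yaml
# Hypothetical sketch: IRSA binds an IAM role to a ServiceAccount via an
# annotation; pods using this ServiceAccount get web-identity credentials
# for the role, with no static access keys in the cluster.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pipeline-runner            # placeholder: whichever SA the pipeline pods use
  namespace: kubeflow
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/kfp-s3-access
```

AWS SDKs that support web-identity token files (including recent minio-go and Go CDK's S3 backend via the AWS SDK) then pick up the projected token automatically.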
Is there already an issue that tracks this? I guess the current issue is probably not the right place to continue this discussion. |
@karlschriek WDYT about #5656?
I'm struggling to access my object storage bucket created using rook-ceph (s3) and would like to use this as my default artifact and metadata store instead of MinIO. Can anyone share which files need to be modified to use MinIO as an s3 interface? I've tried updating
Also replaced accesskey and secretkey in the config. Any clue on this would be really helpful.
@Jeffwan Do you have a PR or an example for using minio as a gateway to s3? I've been standing up Kubeflow on AWS and I'm a bit concerned about the number of places patches need to go in for S3. Instead of patching all those deployments, I'd prefer to just patch the minio components and be done with it.
@vinayan3 Have you checked this: https://docs.min.io/docs/minio-gateway-for-s3.html? You can make changes in the minio manifests to achieve that. The minio gateway works with S3 out of the box.
Hi, is the sts assume-role identity (temporary access) currently supported for the workflow to upload into s3, and are there any reference docs for it? By temporary access I mean with an access key, secret key & security token.
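For context on the question above: STS temporary credentials are the usual static pair plus a session token, and AWS SDKs pick all three up from standard environment variables. A small illustrative helper (the function name is hypothetical) that builds the env entries a pod spec would need:

```python
def temporary_credential_env(access_key: str, secret_key: str, session_token: str) -> list:
    """Build Kubernetes-style env entries for STS temporary credentials.

    AWS SDKs (boto3, aws-sdk-go, etc.) read these standard variable names
    automatically; AWS_SESSION_TOKEN is what marks the pair as temporary.
    Note the credentials expire, so something must refresh them - which is
    why IRSA is usually preferable to injecting them by hand.
    """
    return [
        {"name": "AWS_ACCESS_KEY_ID", "value": access_key},
        {"name": "AWS_SECRET_ACCESS_KEY", "value": secret_key},
        {"name": "AWS_SESSION_TOKEN", "value": session_token},
    ]

env = temporary_credential_env("ASIA-EXAMPLE", "example-secret", "example-token")
for entry in env:
    print(entry["name"])
```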
Is there a timeline for when, and in which release, this feature will be supported? @Jeffwan
Hi everyone, please see the proposal/design for using IRSA/IAM roles to access S3 in #8502 and let us know your thoughts. For anyone looking to use S3 with Kubeflow Pipelines (using IAM user credentials), please refer to the latest AWS distribution of Kubeflow docs - https://awslabs.github.io/kubeflow-manifests/
Hey all, I just wanted to share that deployKF uses S3 directly (no minio gateway), which also means that it comes with support for IRSA! It's going to be documented better soon, but these are the values you need to configure to use S3; happy to help anyone who has trouble getting it working. If you want, you can also read a bit more about deployKF in this comment: kubeflow/manifests#2451 (comment)
Closing this issue. Looks like this was already implemented. /close |
Creating this parent issue to collect all known S3 problems in KFP; I'd also like to discuss further feature requests in this ticket.
Use cases
Replace Minio with S3 for Argo artifact store and KFP Pipeline store.
- [ ] Manifest changes in kubeflow/manifests and the standalone kubeflow/pipelines manifests to easily switch from MinIO to S3
- [ ] IRSA - bump minio-sdk-go version to support IRSA
- [ ] IRSA - bump Argo Workflows version to make sure the runner can persist artifacts to S3
Related issues and PRs
assume-web-identity-role for AWS #3398

UI
- [x] KFP UI should be able to read source files in S3, for example
  `mlpipeline-ui-metadata -> minio://mlpipeline/artifacts/pipeline-A/pipeline-A-run-id/mlpipeline-ui-metadata.tgz`
- [ ] KFP UI should be able to read artifact files in S3
  `mlpipeline-ui-metadata -> s3://mlpipeline/artifacts/pipeline-A/pipeline-A-run-id/mlpipeline-ui-metadata.tgz`
- [ ] IRSA - Bump minio-js version to support IRSA
SDK
- [x] User can declare a custom S3 artifact location inside the pipeline.
- [x] User can apply AWS credentials to pipeline pods to get access to S3.
- [x] User can specify a service account for the pipeline. (IRSA)
More examples
/kind improvement
/area frontend
/area backend
/cc @eterna2 @discordianfish