mlpipeline-ui-metadata html output not displaying on v0.7rc6 (worked on 0.6.2) #2501

Closed
schmidt-jake opened this issue Oct 25, 2019 · 21 comments

schmidt-jake commented Oct 25, 2019

My data validation component generates static HTML for display in the KFP UI. The HTML renders correctly in Kubeflow v0.6.2 but not in v0.7.0rc6, where the run output tab displays only a blank card:
[Screenshot: run output tab showing a blank card]
I have confirmed this is not a TFDV-specific issue; the same thing happens with arbitrary HTML. I'm using Chrome to access the KFP UI. The only hint is that, unlike in v0.6.2, Chrome shows no network traffic fetching the HTML from GCS, and inspecting the page shows an iframe with an empty body where the visualization should be.
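
For reference, a component usually tells the KFP UI about an HTML artifact through a small metadata file. The sketch below assumes the KFP convention of a web-app output listed in /mlpipeline-ui-metadata.json; the helper name write_ui_metadata, the bucket, and the object path are hypothetical, and the demo writes to a temporary directory rather than the container root:

```python
import json
import os
import tempfile


def write_ui_metadata(html_uri, path="/mlpipeline-ui-metadata.json"):
    """Write UI metadata that points the KFP UI at a static HTML file."""
    metadata = {
        "outputs": [
            {
                "type": "web-app",   # HTML output shown on the run output tab
                "storage": "gcs",    # storage backend that holds the file
                "source": html_uri,  # location of the exported HTML
            }
        ]
    }
    with open(path, "w") as f:
        json.dump(metadata, f)
    return metadata


# Hypothetical bucket and object name; a temp dir stands in for "/".
demo_path = os.path.join(tempfile.mkdtemp(), "mlpipeline-ui-metadata.json")
demo = write_ui_metadata("gs://my-bucket/tfdv/stats.html", demo_path)
```

Inside a real component the default path would be used, so the file lands where the pipeline system looks for it.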

jlewi commented Oct 28, 2019

@JakeTheWise transferring to the pipelines repo

/cc @jessiezcc

jlewi transferred this issue from kubeflow/kubeflow on Oct 28, 2019

schmidt-jake commented Oct 28, 2019

@jlewi thanks!

I've confirmed this isn't a TFDV-related issue: I also tried displaying TFMA output and a Bokeh plot (exported as HTML). The files were all written as static HTML to Google Cloud Storage, and the pages display correctly if I download them and open them in Chrome. Again, this issue did not occur in KFP v0.6.2.

schmidt-jake changed the title from "TFDV html output not displaying on v0.7rc6 (worked on 0.6.2)" to "mlpipeline-ui-metadata html output not displaying on v0.7rc6 (worked on 0.6.2)" on Oct 28, 2019

Bobgy commented Oct 31, 2019

/priority p0
/kind bug

Let me troubleshoot this.

Bobgy commented Oct 31, 2019

I tried to deploy Kubeflow 0.7, but I was blocked by this issue: kubeflow/kubeflow#4439

Bobgy added the blocked label on Oct 31, 2019

jlewi commented Oct 31, 2019

@Bobgy I would suggest using IAP so you aren't blocked trying to debug and fix this error.

Bobgy removed the blocked label on Oct 31, 2019

Bobgy commented Oct 31, 2019

@jlewi Thanks! I will use IAP to debug this.

Bobgy commented Oct 31, 2019

I hit a different error when running kfctl apply:

spec.domains in body should be at most 63 chars long

My cluster name is too long.

Trying again with a shorter name.
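
The 63-character limit can be checked before running kfctl apply. A rough sketch, assuming the IAP hostname follows the pattern <name>.endpoints.<project>.cloud.goog (an assumption about how the domain is derived, not something stated in the error); the function names and the example project are hypothetical:

```python
MAX_DOMAIN_LEN = 63  # limit quoted in the kfctl error message


def iap_hostname(name, project):
    """Build the hostname under the assumed naming pattern."""
    return f"{name}.endpoints.{project}.cloud.goog"


def fits_domain_limit(name, project):
    """True if the derived hostname stays within the 63-character limit."""
    return len(iap_hostname(name, project)) <= MAX_DOMAIN_LEN


# Hypothetical deployment names and project ID.
short_ok = fits_domain_limit("kf-test", "my-project")
long_ok = fits_domain_limit("kubeflow-pipelines-html-output-debugging", "my-project")
```

Because the fixed parts of the pattern and the project ID use up part of the budget, the deployment name has to be considerably shorter than 63 characters.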

Bobgy commented Oct 31, 2019

kfctl apply ended with an error:

INFO[0546] Downloading secret user-gcp-sa from namespace kubeflow  filename="gcp/gcp.go:1820"
INFO[0546] Creating secret user-gcp-sa to namespace kubeflow-remote-dev  filename="gcp/gcp.go:1825"
INFO[0546] Generating PodDefault in namespace kubeflow-remote-dev; APIVersion kubeflow.org/v1alpha1  filename="gcp/gcp.go:1755"
E1031 14:14:11.753767   23741 memcache.go:135] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request

However, Kubeflow already seems to be usable, so I will start taking a look.

schmidt-jake commented

@Bobgy you are correct; apparently it's a harmless message, see kubeflow/kubeflow#4414

Bobgy commented Oct 31, 2019

@JakeTheWise Thanks for the info!

Bobgy commented Oct 31, 2019

I've been able to reproduce the issue.

The problem is that minio couldn't find the file. I'm checking why that happened.
Correction: it is the pipeline-ui node server, not minio, that couldn't find the file.

Bobgy commented Oct 31, 2019

I don't quite understand why it worked before in KF 0.6.

My first guess was a permission issue. To test it, I patched the ml-pipeline-ui deployment to use the user-gcp-sa credentials via GOOGLE_APPLICATION_CREDENTIALS. With that patch, ml-pipeline-ui can fetch the file and return it to the frontend, so the HTML file is rendered.

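    spec:
      containers:
      - env:
        # Point Google client libraries at the mounted service account key.
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /etc/credentials/user-gcp-sa.json
        image: gcr.io/ml-pipeline/frontend:0.1.31
        name: ml-pipeline-ui
        volumeMounts:
        # Mount the secret at the directory the env var above points into.
        - mountPath: /etc/credentials
          name: gcp-sa-token
          readOnly: true
      volumes:
      # Secret holding the user-gcp-sa key file.
      - name: gcp-sa-token
        secret:
          defaultMode: 420
          secretName: user-gcp-sa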

I added the above change to the ml-pipeline-ui manifest so that it uses the user-gcp-sa credentials.

Bobgy commented Oct 31, 2019

For comparison, I tested with https://www.kubeflow.org/docs/pipelines/standalone-deployment-gcp/, which was deployed in a cluster with full scope. There the default credentials can also fetch those files, so nothing breaks.

However, if the cluster doesn't have full scope and accesses GCP resources through user-gcp-sa instead, I imagine it would break too.
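
The difference between the two setups comes down to which credentials the Google client library picks up. A simplified sketch of the Application Default Credentials lookup, assuming the documented order of an explicit key file named by GOOGLE_APPLICATION_CREDENTIALS first and the node's metadata server last; the function name credential_source is hypothetical, and the sketch skips the gcloud user-credentials step:

```python
import os


def credential_source(env, key_file_exists=os.path.exists):
    """Return which credential source a Google client would likely use."""
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path:
        if key_file_exists(key_path):
            return "service-account key file: " + key_path
        # An explicit path that does not exist is an error, not a fallback.
        return "error: key file not found at " + key_path
    # No explicit key: fall back to the node's default service account,
    # whose access is limited by the cluster's OAuth scopes.
    return "node default service account (metadata server)"


patched = credential_source(
    {"GOOGLE_APPLICATION_CREDENTIALS": "/etc/credentials/user-gcp-sa.json"},
    key_file_exists=lambda path: True,  # pretend the secret is mounted
)
unpatched = credential_source({})
print(patched)
print(unpatched)
```

Under this model, an unpatched ml-pipeline-ui on a cluster without full scope falls through to the node's default service account, which would explain why the file fetch fails there.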

@IronPan, do you know more context about this?

Bobgy commented Oct 31, 2019

Made a PR: https://github.com/kubeflow/manifests/pull/594/files

I just realized the change is GCP-specific, so it should probably go in an overlay and be combined later. What do you think about this fix? @jlewi @IronPan

jlewi commented Oct 31, 2019

What is the pipelines GCP auth story?

With 0.7 we are moving to use workload identity per #1691. So the ideal solution would be to use a service account that is bound to the correct GCP permissions.

Here are the possible options:

Option 1 - Use an existing KSA already bound to a GSA

In the kubeflow namespace we create these Kubernetes service accounts (KSAs):

  • kf-user
  • kf-admin

So one solution would be for the pipelines UI to run with one of those service accounts, in which case you would need to make sure that any RBAC permissions needed by pipelines are also granted to them.

We use ClusterRole aggregation to aggregate application-specific roles (e.g. roles needed by pipelines) up to those roles.
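
ClusterRole aggregation works by label selection: an aggregating role lists selectors, and the rules of every ClusterRole whose labels match are merged into it. The sketch below mimics that selection logic in plain Python using the Kubernetes field names aggregationRule, clusterRoleSelectors, and matchLabels; the role names, label key, and rules are hypothetical and not taken from the Kubeflow manifests:

```python
def matches(selector, labels):
    """True if every key/value in the selector's matchLabels is present."""
    return all(labels.get(k) == v for k, v in selector["matchLabels"].items())


def aggregate_rules(aggregate_role, cluster_roles):
    """Union the rules of every ClusterRole selected by the aggregationRule."""
    selectors = aggregate_role["aggregationRule"]["clusterRoleSelectors"]
    rules = []
    for role in cluster_roles:
        labels = role["metadata"].get("labels", {})
        # A role is picked up if any one selector matches its labels.
        if any(matches(sel, labels) for sel in selectors):
            rules.extend(role["rules"])
    return rules


# Hypothetical label key, role names, and rules, for illustration only.
LABEL = "example.org/aggregate-to-kf-user"

kf_user_role = {
    "metadata": {"name": "kf-user-role"},
    "aggregationRule": {"clusterRoleSelectors": [{"matchLabels": {LABEL: "true"}}]},
}
pipelines_role = {
    "metadata": {"name": "pipelines-ui", "labels": {LABEL: "true"}},
    "rules": [{"apiGroups": [""], "resources": ["pods/log"], "verbs": ["get"]}],
}
unrelated_role = {
    "metadata": {"name": "other-app"},
    "rules": [{"apiGroups": [""], "resources": ["secrets"], "verbs": ["list"]}],
}

effective = aggregate_rules(kf_user_role, [pipelines_role, unrelated_role])
```

With this approach, pipelines would only need to ship its own labeled ClusterRole; the aggregating role picks the rules up without being edited.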

Option 2 - Use the pipelines KSA and add a GSA workload identity binding

You could continue to use the ml-pipeline-ui service account but add logic, e.g. in kfctl, to add the appropriate workload identity bindings.
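
A workload identity binding has two halves: an IAM policy binding on the GSA and an annotation on the KSA. The sketch below only builds the strings involved, using the standard GKE member format, role, and annotation key; the function name workload_identity_binding, the project ID, and the GSA name are hypothetical:

```python
def workload_identity_binding(project, namespace, ksa, gsa_name):
    """Describe the IAM binding and KSA annotation that link a KSA to a GSA."""
    gsa_email = f"{gsa_name}@{project}.iam.gserviceaccount.com"
    return {
        # IAM member that identifies the Kubernetes service account.
        "member": f"serviceAccount:{project}.svc.id.goog[{namespace}/{ksa}]",
        # Role granted to that member on the GSA.
        "role": "roles/iam.workloadIdentityUser",
        # Annotation placed on the KSA so pods using it act as the GSA.
        "annotation": {"iam.gke.io/gcp-service-account": gsa_email},
    }


binding = workload_identity_binding(
    project="my-project",   # hypothetical project ID
    namespace="kubeflow",
    ksa="ml-pipeline-ui",
    gsa_name="my-kf-user",  # hypothetical GSA name
)
print(binding["member"])
```

Whatever tool ends up applying these, the same two pieces of information are what it has to produce.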

Option 3 - Temporary fix for 0.7

As a quick fix: with 0.7 we are still creating the GCP secrets user-gcp-sa and admin-gcp-sa in the kubeflow namespace, so you could continue to mount one of those secrets.

We want to remove the secrets in the next major release because they are much less secure than workload identity.

So the pipelines team would need to commit to removing the secret mount in the next release and moving to an option based on workload identity.

Bobgy commented Nov 1, 2019

@jlewi Thanks for listing all the options I can take!

I'd go with option 3 for now because I'm not sure what KFP has decided on the auth story.
@gaoning777, do you know more about this?

Bobgy commented Nov 1, 2019

Option 2 sounds like the final state we want to reach. However, why would we need to touch kfctl to add the bindings? It would be awesome if we could express them in configuration instead. Is that the direction we are heading?

jlewi commented Nov 4, 2019

Verified with GCP and IAP using:

kubeflow/kubeflow - v0.7.0-rc.7-3-g8dbde9d8
kubeflow/manifests - v0.7.0-rc.2-19-g317c0d8

The ml-pipeline-ui spec is attached below:

  • GOOGLE_APPLICATION_CREDENTIALS is set

ml-pipeline-ui.yaml.txt

So marking this as fixed.

jlewi closed this as completed on Nov 4, 2019

schmidt-jake commented

Thanks!!

IronPan commented Nov 5, 2019

Regarding GCP auth, I agree with @jlewi that Pipelines should be heading towards workload identity as well.
