Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop GCP credentials env if user-gcp-sa secret is not present #2643

Closed
wants to merge 15 commits into from

Conversation

Bobgy
Copy link
Contributor

@Bobgy Bobgy commented Nov 22, 2019

This is part of #2589

@IronPan @jingzhang36
Can you help me review?

Behavior summary:

  1. When user's workflow uses user-gcp-sa secret
  2. Backend checks if the secret exists
  3. If the secret doesn't -> drop GOOGLE_APPLICATION_CREDENTIALS env and mark user-gcp-sa volume as optional for templates that mount user-gcp-sa.
  4. If the secret does exist -> do nothing
  5. If an error occurred when fetching the secret -> propagate the error above

Verified

  • When cluster's namespace doesn't have user-gcp-sa secret, this can drop env and mark the secret as optional. Therefore, tfx and xgboost samples run successfully.
  • When cluster's namespace have user-gcp-sa, this keeps env variables, and let pipelines run as normal.

TODO: add some unit tests
Done

TODO2: apply manifest changes here to Kubeflow too

/cc @rmgogogo
/assign @IronPan
/area backend


This change is Reviewable

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign ironpan
You can assign the PR to them by writing /assign @ironpan in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

}
if !someStepsHaveUserGcpSa {
// No steps used user-gcp-sa secret, no need to drop GOOGLE_APPLICATION_CREDENTIALS.
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would that possible: no steps/components mounted secret but set the ADC env?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me explain. We only handle the default case of using use_gcp_secret with all default arguments.

If user doesn't have this secret, but have ADC env, they may have mounted secret as a different name. So we shouldn't break them

@@ -13,6 +13,12 @@ rules:
- get
- list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's OK if no test for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so does MINIO.
I think MySQL and MINIO won't hit the createRun to allow you remove it.
https://github.com/kubeflow/pipelines/blob/master/manifests/gcp_marketplace/chart/kubeflow-pipelines/templates/minio.yaml#L46

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing them out. I prefer a separate one because this is complex enough.
Will do that shortly after this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, or feel free to leave it to me if multi-user is busy

@Bobgy
Copy link
Contributor Author

Bobgy commented Nov 25, 2019

/assign @jingzhang36
for early review

@@ -281,6 +399,11 @@ func (r *ResourceManager) CreateRun(apiRun *api.Run) (*model.RunDetail, error) {
}
}

err = r.dropUserGcpSaIfNotConfigured(workflow)
if err != nil {
return nil, util.NewInternalServerError(err, "Failed to verify whether GCP secret is configured")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ask for review. Call out that this is probably something that would be controversial. I discussed with @jingzhang36 about this. We concluded that because only users whose workflow contains use_gcp_secret can possibly hit this error. It's clearer if the error is returned to user, instead of silently swallowed (probably with a warning message that no user will ever see).

someStepHasGcpSecretVolume = true
break
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: here also can break
if someStepHasGcpSecretVolume is true, it also can break in outer "for".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I would prefer writing no breaks here to keep it simple and easy to read. (because this cannot impact performance)

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

*overrideOptional = true
volume.Secret.Optional = overrideOptional
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, only remove the ENV Var is enough, but OK here you also do these clean up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, got your meaning to have more info for debugging

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer leaving the secret optional here to give them a clue what happened, just in case they may hit this unexepectedly.

@rmgogogo
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot removed the lgtm label Nov 25, 2019
@rmgogogo
Copy link
Contributor

/lgtm

not sure whether you tested this CL via CLI deployment. We can do the test via creating a full scope cluster with "gcloud".

@IronPan
Copy link
Member

IronPan commented Nov 26, 2019

Why do we need this change? If you really just want to unblock the marketplace samples from being running, isn't it better just to fork the sample in mkp to not mounting the secret?

This change introduce "magical" behavior to the kfp backend. How do you plan to convey this behavior to the end user?

@IronPan
Copy link
Member

IronPan commented Nov 26, 2019

/hold

@Bobgy
Copy link
Contributor Author

Bobgy commented Nov 27, 2019

Synced with @IronPan, because I don't have bandwidth on finishing workload identity documentation for samples and standalone deployment. He prefers finding someone else to finish those, so that we don't need this PR.

@Bobgy
Copy link
Contributor Author

Bobgy commented Dec 18, 2019

/close
in favor of #2619

@k8s-ci-robot
Copy link
Contributor

@Bobgy: Closed this PR.

In response to this:

/close
in favor of #2619

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants