-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(sdk): Introduce the token credentials interface. Part of #4683 #5287
feat(sdk): Introduce the token credentials interface. Part of #4683 #5287
Conversation
0ae9d39
to
ae48463
Compare
This does not block the release, but we should better start reviewing and reaching a consensus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the idea of depending on serviceaccount projection very much, because the SDK would depend on external configuration for each pod it runs in.
I wonder if you ever considered using TokenRequest API directly, that will make the SDK usable anywhere. As long as the service account has RBAC permission of TokenRequest API, which seems to be a reasonable default for both KFP pipelines and notebooks.
elif credentials: | ||
token = credentials.get_token() | ||
config.refresh_api_key_hook = credentials.refresh_api_key_hook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can move this branch higher, probably only after existing_token. The other branches should be refactored to use the credentials interface afterwards.
We have and we've used it in the past. However, projected volumes are native to K8s and they don't require any API call. Also they are available for
We can definitely implement both and have a heuristic to use one or another by default (e.g.: If |
My understanding is that kubelet make the API calls for us if they are configured. So I don't see a big difference. The kubernetes>=12.0.0 requirement is a good point, maybe we can add an version detection error message in this case. We've already updated dependency range to allow 12.0.0 just recently.
That makes sense to me, can you come up with a plan? What do you like to implement first? nit: my personal opinion is to get rid of |
Thank you so much for implementing this @elikatsis.
/cc @yanniszark |
@@ -25,3 +25,4 @@ | |||
from ._config import * | |||
from ._local_client import LocalClient | |||
from ._runners import * | |||
from ._credentials import * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The credentials seem Client-specific, so I'm not sure we should import them in the root kfp package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Ark-kun, I just saw this. Do you have some specific structure in mind?
We found out that a bug occurs when This happens when the path does not exist as well. It seems that This should be inline with #5287 (comment) |
@elikatsis with Kubeflow 1.3 manifests released, I think we will have more time to finish this. What do you think about separating the PR into 2?
with 1. merged, we can parallely implement what we prefer, e.g. I'd go with #5287 (review), because we want KFP to run as independently as possible. We do not like depending on pod defaults webhook. |
@Bobgy, hi there! Sorry for the late reply, I had full cycles and after that some public holidays. Sure I will split the PR in two per your suggestion and I'll ping you. |
ae48463
to
f99a4b7
Compare
Let's use this PR to introduce the abstract class for credentials. I updated the first message scratching off the items that we don't tackle in this PR. After this, we will PR the credentials using projected tokens (that is, tokens that find themselves in the container's filesystem). We can then discuss heuristics to choose between the implementation you propose (TokenRequest API) and the projected one, for the client to use some credentials by default for the in-cluster case. Feel free to suggest different structure. |
/assign @chensun |
@@ -0,0 +1,37 @@ | |||
# Copyright 2021 Arrikto Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the standard header: "Copyright 2021 The Kubeflow Authors"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, all Google LLC headers have been changed to "The Kubeflow Authors".
Not a blocker, if you can update existing Arrikto Inc headers also to "The Kubeflow Authors" in a separate PR, that would be even better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chensun @Bobgy this is a great initiative and we plan on changing all the copyrights as well.
However I think something is missing from this effort: an AUTHORS
file. I've also left a comment in the issue: #5470 (comment)
Is it ok if we move forward like this and change the headers after we resolve this issue? (Or we have this file somewhere and I've missed it?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM, we can do it separately too
sdk/python/kfp/_credentials.py
Outdated
|
||
class TokenCredentials(object): | ||
|
||
__metaclass__ = abc.ABCMeta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__metaclass__= ...
is the Python2 style, and IIRC, it's actually ignored in Python3.
In Python3, you should use this instead:
class TokenCredentials(metaclass=abc.ABCMeta):
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! Thanks for catching this, it had totally slipped.
We need an abstract class here, not a metaclass. The proper way to define an abstract class, correct me if I'm wrong, is class TokenCredentials(abc.ABC)
, so let's do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, both syntax work the same for defining an abstract class. abc.ABC
is a helper class that is new in Python 3.4, which is fine here.
https://docs.python.org/3/library/abc.html#abc.ABC
sdk/python/kfp/_credentials.py
Outdated
] | ||
|
||
|
||
class TokenCredentials(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: What do you think about naming it TokenCredentialsProvider
or TokenCredentialsBase
or TokenCredentialsMeta
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Providing more options, WDUT about AuthProvider?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the first suggestions TokenCredentialsBase
sounds good.
AuthProvider
is more of a whole backend service that performs authentication, rather than a client-side component that ensures the requests contain the required credentials to authenticate against the provider.
Let's go with TokenCredentialsBase
, but ofc feel free to elaborate!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have strong opinions, TokenCredentialsBase
SGTM
@@ -0,0 +1,37 @@ | |||
# Copyright 2021 Arrikto Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, all Google LLC headers have been changed to "The Kubeflow Authors".
Not a blocker, if you can update existing Arrikto Inc headers also to "The Kubeflow Authors" in a separate PR, that would be even better.
sdk/python/kfp/_credentials.py
Outdated
] | ||
|
||
|
||
class TokenCredentials(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Providing more options, WDUT about AuthProvider?
@abc.abstractmethod | ||
def get_token(self): | ||
raise NotImplementedError() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we refresh api_key every time before sending a request, I feel like this method in interface is not necessary. It's enough having just refresh_api_key_hook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to rename that method, what do you think about "update_auth_token(api_config)"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's enough having just refresh_api_key_hook.
I think removing the get_token()
method [in favor of a different unique method which unifies functionalities] is error prone.
We don't want users to be messing with the refresh_api_key_hook()
method (unless they have a strong reason to do so). What it does (setting the key for the authorization
header) is very specific and important for the functionality.
Users should only be declaring the logic with which the client can retrieve[/get/read/generate] the token. The rest is implementation detail (and following the codebase as described in my issue comment, the K8s client defines it strictly enough).
Plus, we have thought that this name is a good choice because the Configuation
object calls it that way itself (configuration.refresh_api_key_hook
).
Am I explaining it properly? Do the above make any sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want users to be messing with the refresh_api_key_hook() method (unless they have a strong reason to do so). What it does (setting the key for the authorization header) is very specific and important for the functionality.
I'm not fully confident to say so, e.g. https://www.kubeflow.org/docs/distributions/aws/pipeline/#authenticate-kubeflow-pipeline-using-sdk-inside-cluster needs a cookie to authenticate. Should that be avoided or sth potentially needed for other scenarios as well?
Rethinking about this, providing a default refresh_api_key_hook implementation while allowing people to override seems reasonable too, so I think I'm OK with this now.
The configuration object is auto-generated code, so we do not have a naming choice. We might consider change the auto-generator, so it's not necessarily a name we need to stick to.
Anyway, I feel like it's not a bad choice either.
/retest |
As presented in kubeflow#4683, in this commit we introduce a TokenCredentials abstract class which encapsulates the retrieval and refresh of token credentials. The reason we are using a class for the credentials is the fact that usually tokens are short-lived and the client needs to refresh them. All subclasses should define a 'get_token()' method responsible for fetching and refreshing (if needed) a token for authentication.
f99a4b7
to
a7ca540
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
leaving to @chensun for a final approval
@@ -0,0 +1,37 @@ | |||
# Copyright 2021 Arrikto Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM, we can do it separately too
/lgtm Thanks! |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: chensun The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Description of your changes:
The changes of this PR and their rationale are described in detail in #5138 (comment).
In a nutshell:
TokenCredentials
abstract class, as described in sdk/client/auth - refactor GCP auth into a module #4683Introduce theServiceAccountTokenVolumeCredentials
class to authenticate using ServiceAccountTokens, which is part of Add authentication with ServiceAccountToken #5138Client
to (optionally) expect acredentials
objectIf the user provides no credentials and the client is running inside a pod, attempt to useServiceAccountTokenVolumeCredentials
/assign @Bobgy
/assign @elikatsis
/cc @yanniszark
/cc @StefanoFioravanzo
Checklist:
Do you want this pull request (PR) cherry-picked into the current release branch?
Learn more about cherry-picking updates into the release branch.