-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create kfp componet sdk #729
Conversation
# Load argo metadata at start of an OP, as pod might be deleted in case of preemption. | ||
pod = self._load_pod() | ||
if not pod or not pod.metadata or not pod.metadata.labels or not pod.metadata.annotations: | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it OK to silently ignore the error here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the staging states related code and decide to go with stateless container and move to graph component once it's ready.
gs_prefix = 'gs://' | ||
if tmp_location.startswith(gs_prefix): | ||
tmp_location = tmp_location[len(gs_prefix):] | ||
splits = tmp_location.split('/', 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"splits" -> "parts"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is removed.
self._argo_node_name = re.sub(r'\s+\(\d\)', '', argo_node_name) | ||
|
||
def _load_staging_location(self): | ||
tmp_location = os.environ.get('KFP_TMP_LOCATION', None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is tmp_location
?
I think it's better to pass all configuration options through constructor (__init__
) instead of a function talking directly to the operating system.
Not talking to the OS directly makes it easier to mock and test code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is removed.
self.name | ||
)) | ||
|
||
def _load_staging_states(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this function used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is removed.
self.staging_states = json.loads(states_json) | ||
except ValueError as e: | ||
logging.error('Unable to decode staging states: {}. Error: {}.'.format(states_json, e)) | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it OK to ignore this error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is removed.
self._stage_states() | ||
|
||
def _should_cancel(self): | ||
"""Checks argo's execution config deadline and decide whether the operation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Container images should be independent of Argo. Is there a way to be non-Argo specific? For instance, it's fine if the container reacts to a signal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argo doesn't support sending a cancel signal to the container. The only signal it will send is SIGTERM which itself can be triggered from many sources like pod preemption. The signal may be sent in the middle of a retry. The code can still run without argo env, and the cancel feature will be disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG, thanks.
return | ||
|
||
def _load_k8s_client(self): | ||
config.load_incluster_config() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the container needs the cluster config? Does this require privilleges that we may not want to grant to individual containers in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is required before reading the pod metadata. Without the permission, argo sidecar won't work. I don't think we can remove the privileges unless we replace argo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG, thanks.
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vicaire The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
1 similar comment
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vicaire The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Co-authored-by: Scott Lee <scottleehello@gmail.com>
…ubeflow#732) * Setup a kf-ci-dev namespace for manual sync'ing of Tekton pipelines. * This namespace is intended to allow for testing of changes to the pipelines without having to first check in the changes. Revert "Update Dockerfile.py3 (kubeflow#729)" This commit doesn't build. This reverts commit 0de5733. * Fix the entrypoint in Dockerfile.py3 for kubeflow/testing#684 * Rebuild the test worker image * Fix a bug in kf-ready-task; not properly substituting in KFNAME * Rehydrate * Update
Created a KFP component SDK python package which will be used inside component container.
The SDK will include utilities for:
The SDK will support both py 2 and 3 because dataflow SDK requires to use py 2.
This change is