Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create kfp componet sdk #729

Merged
merged 7 commits into from
Jan 29, 2019
Merged

Create kfp componet sdk #729

merged 7 commits into from
Jan 29, 2019

Conversation

hongye-sun
Copy link
Contributor

@hongye-sun hongye-sun commented Jan 23, 2019

Created a KFP component SDK python package which will be used inside component container.

The SDK will include utilities for:

  1. Resume from failure states
  2. Cancellation handler
  3. [Future PR] Metadata writer
  4. [Future PR] Dataflow inline runner wrapper

The SDK will support both py 2 and 3 because dataflow SDK requires to use py 2.


This change is Reviewable

# Load argo metadata at start of an OP, as pod might be deleted in case of preemption.
pod = self._load_pod()
if not pod or not pod.metadata or not pod.metadata.labels or not pod.metadata.annotations:
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it OK to silently ignore the error here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the staging states related code and decide to go with stateless container and move to graph component once it's ready.

gs_prefix = 'gs://'
if tmp_location.startswith(gs_prefix):
tmp_location = tmp_location[len(gs_prefix):]
splits = tmp_location.split('/', 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"splits" -> "parts"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is removed.

self._argo_node_name = re.sub(r'\s+\(\d\)', '', argo_node_name)

def _load_staging_location(self):
tmp_location = os.environ.get('KFP_TMP_LOCATION', None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is tmp_location?
I think it's better to pass all configuration options through constructor (__init__) instead of a function talking directly to the operating system.
Not talking to the OS directly makes it easier to mock and test code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is removed.

self.name
))

def _load_staging_states(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this function used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is removed.

self.staging_states = json.loads(states_json)
except ValueError as e:
logging.error('Unable to decode staging states: {}. Error: {}.'.format(states_json, e))
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it OK to ignore this error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is removed.

self._stage_states()

def _should_cancel(self):
"""Checks argo's execution config deadline and decide whether the operation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Container images should be independent of Argo. Is there a way to be non-Argo specific? For instance, it's fine if the container reacts to a signal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argo doesn't support sending a cancel signal to the container. The only signal it will send is SIGTERM which itself can be triggered from many sources like pod preemption. The signal may be sent in the middle of a retry. The code can still run without argo env, and the cancel feature will be disabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG, thanks.

return

def _load_k8s_client(self):
config.load_incluster_config()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the container needs the cluster config? Does this require privilleges that we may not want to grant to individual containers in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is required before reading the pod metadata. Without the permission, argo sidecar won't work. I don't think we can remove the privileges unless we replace argo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG, thanks.

@hongye-sun
Copy link
Contributor Author

Friendly ping. @Ark-kun and @vicaire, do you have more comments about the PR?

@vicaire
Copy link
Contributor

vicaire commented Jan 29, 2019

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vicaire

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

1 similar comment
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vicaire

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 752256c into kubeflow:master Jan 29, 2019
Linchin pushed a commit to Linchin/pipelines that referenced this pull request Apr 11, 2023
Co-authored-by: Scott Lee <scottleehello@gmail.com>
Linchin pushed a commit to Linchin/pipelines that referenced this pull request Apr 11, 2023
…ubeflow#732)

* Setup a kf-ci-dev namespace for manual sync'ing of Tekton pipelines.

* This namespace is intended to allow for testing of changes to the pipelines
  without having to first check in the changes.

Revert "Update Dockerfile.py3 (kubeflow#729)"
  This commit doesn't build.
This reverts commit 0de5733.

* Fix the entrypoint in Dockerfile.py3 for kubeflow/testing#684
* Rebuild the test worker image

* Fix a bug in kf-ready-task; not properly substituting in KFNAME

* Rehydrate

* Update
magdalenakuhn17 pushed a commit to magdalenakuhn17/pipelines that referenced this pull request Oct 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants