Create kfp componet sdk #729

Merged

k8s-ci-robot merged 7 commits into kubeflow:master from hongye-sun:sdk

Jan 29, 2019

Contributor

hongye-sun commented Jan 23, 2019

Created a KFP component SDK Python package to be used inside component containers.

The SDK will include utilities for:

Resume from failure states
Cancellation handler
[Future PR] Metadata writer
[Future PR] Dataflow inline runner wrapper

The SDK will support both Python 2 and 3 because the Dataflow SDK requires Python 2.


          Create kfp componet sdk

726dba0

hongye-sun requested review from Ark-kun, gaoning777 and qimingj

January 23, 2019 20:01

k8s-ci-robot requested review from vicaire and IronPan

January 23, 2019 20:01

k8s-ci-robot added size/L labels

hongye-sun added 4 commits

January 23, 2019 22:03


          Add doc comments and rename pod name envir

4f51917


          Add copyright header and use module init to expose public interface

01ee331


          move @patch to class to avoid redundant code

641a4eb


          add k8s client

c459969

Ark-kun reviewed

component_sdk/python/kfp_component/_base_op.py Outdated

+                      # Load argo metadata at start of an OP, as pod might be deleted in case of preemption.
+                      pod = self._load_pod()
+                      if not pod or not pod.metadata or not pod.metadata.labels or not pod.metadata.annotations:
+                          return

Contributor

Ark-kun Jan 25, 2019

Is it OK to silently ignore the error here?

Contributor Author

hongye-sun Jan 25, 2019

I removed the staging-state code and decided to go with a stateless container, moving to a graph component once it's ready.

Ark-kun reviewed

component_sdk/python/kfp_component/_base_op.py Outdated

+                      gs_prefix = 'gs://'
+                      if tmp_location.startswith(gs_prefix):
+                          tmp_location = tmp_location[len(gs_prefix):]
+                      splits = tmp_location.split('/', 1)

Contributor

Ark-kun Jan 25, 2019

"splits" -> "parts"?

Contributor Author

hongye-sun Jan 25, 2019

This code is removed.

Ark-kun reviewed

component_sdk/python/kfp_component/_base_op.py Outdated

+                          self._argo_node_name = re.sub(r'\s+\(\d\)', '', argo_node_name)
+                  def _load_staging_location(self):
+                      tmp_location = os.environ.get('KFP_TMP_LOCATION', None)

Contributor

Ark-kun Jan 25, 2019

What is tmp_location?
I think it's better to pass all configuration options through the constructor (__init__) instead of having a function talk directly to the operating system.
Not talking to the OS directly makes it easier to mock and test code.
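
A minimal sketch of the suggestion above, not code from this PR. The `BaseOp` class and its `from_env` factory are hypothetical names used only for illustration. The configuration value arrives through `__init__`, and the only place that reads `KFP_TMP_LOCATION` is a thin factory whose environment mapping can be replaced by a plain dict in tests.

```python
import os


class BaseOp(object):
    """Hypothetical op that receives its configuration explicitly."""

    def __init__(self, name, tmp_location=None):
        self.name = name
        self.tmp_location = tmp_location

    @classmethod
    def from_env(cls, name, environ=None):
        # The environment mapping is injectable, so tests can pass a dict
        # instead of patching os.environ.
        environ = os.environ if environ is None else environ
        return cls(name, tmp_location=environ.get('KFP_TMP_LOCATION'))


# Illustrative values only.
op = BaseOp.from_env('train', environ={'KFP_TMP_LOCATION': 'gs://bucket/tmp'})
print(op.tmp_location)
```

With this shape, unit tests can build `BaseOp('train', tmp_location=...)` directly and never touch the process environment.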

Contributor Author

hongye-sun Jan 25, 2019

This code is removed.

Ark-kun reviewed

component_sdk/python/kfp_component/_base_op.py Outdated

+                          self.name
+                      ))
+                  def _load_staging_states(self):

Contributor

Ark-kun Jan 25, 2019

How is this function used?

Contributor Author

hongye-sun Jan 25, 2019

This code is removed.

Ark-kun reviewed

component_sdk/python/kfp_component/_base_op.py Outdated

+                          self.staging_states = json.loads(states_json)
+                      except ValueError as e:
+                          logging.error('Unable to decode staging states: {}. Error: {}.'.format(states_json, e))
+                          return

Contributor

Ark-kun Jan 25, 2019

Is it OK to ignore this error?
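
One alternative the question points toward, sketched under assumptions: instead of logging and returning, the decode failure is re-raised with context so the caller has to handle it. The function name mirrors the diff, but this standalone signature is hypothetical and is not the code that was in the PR.

```python
import json


def load_staging_states(states_json):
    """Decode staging states, raising instead of silently returning."""
    try:
        return json.loads(states_json)
    except ValueError as e:
        # On Python 3, json.JSONDecodeError is a subclass of ValueError,
        # so this clause covers both Python 2 and 3.
        raise ValueError(
            'Unable to decode staging states: {!r}. Error: {}'.format(
                states_json, e))
```

Whether failing hard is the right behavior depends on whether the op can safely start from an empty state; the sketch only makes that decision explicit.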

Contributor Author

hongye-sun Jan 25, 2019

This code is removed.

vicaire reviewed

component_sdk/python/kfp_component/_base_op.py

+                          self._stage_states()
+                  def _should_cancel(self):
+                      """Checks argo's execution config deadline and decide whether the operation

Contributor

vicaire Jan 25, 2019

Container images should be independent of Argo. Is there a way to be non-Argo specific? For instance, it's fine if the container reacts to a signal.

Contributor Author

hongye-sun Jan 25, 2019

Argo doesn't support sending a cancel signal to the container. The only signal it sends is SIGTERM, which can itself be triggered by many sources, such as pod preemption, and may arrive in the middle of a retry. The code can still run without the Argo environment; in that case the cancel feature is disabled.
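
A rough sketch of the idea described above, not the SDK's implementation. Because SIGTERM alone is ambiguous, the handler consults a deadline before treating the signal as a cancellation. How the deadline is obtained from Argo is not shown in this thread, so it is passed in through a hypothetical `get_deadline` callable; `should_cancel` here and `make_sigterm_handler` are likewise illustrative names.

```python
import datetime


def should_cancel(deadline, now=None):
    """Return True when a deadline is known and has already passed.

    `deadline` is a timezone-aware datetime or None. None means no
    deadline is available (for example, when running outside Argo),
    so cancellation is disabled.
    """
    if deadline is None:
        return False
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return now >= deadline


def make_sigterm_handler(get_deadline, on_cancel):
    """Build a signal handler that cancels only when the deadline passed."""
    def handler(signum, frame):
        # A SIGTERM caused by preemption leaves the deadline in the
        # future (or unset), so the cancel callback is not invoked.
        if should_cancel(get_deadline()):
            on_cancel()
    return handler
```

Registering the handler is a single standard-library call, `signal.signal(signal.SIGTERM, handler)`, made from the main thread.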

Contributor

vicaire Jan 29, 2019

SG, thanks.

component_sdk/python/kfp_component/_base_op.py Outdated

+                          return
+                  def _load_k8s_client(self):
+                      config.load_incluster_config()

Contributor

vicaire Jan 25, 2019

Why does the container need the cluster config? Does this require privileges that we may not want to grant to individual containers in the future?

Contributor Author

hongye-sun Jan 25, 2019

It is required before reading the pod metadata. Without that permission, the Argo sidecar won't work. I don't think we can remove the privileges unless we replace Argo.
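
For context, a minimal sketch of what reading pod metadata with the in-cluster config can look like using the Kubernetes Python client. It is an illustration under assumptions, not the code in this PR: the `load_pod` helper and its `api` parameter are hypothetical, and the pod name and namespace are passed in because the thread does not show how the SDK obtains them.

```python
def load_pod(pod_name, namespace, api=None):
    """Read a pod object; return None when no in-cluster config exists."""
    if api is None:
        # Imported lazily so this module stays importable where the
        # kubernetes package is not installed (e.g. in unit tests that
        # inject a fake `api`).
        from kubernetes import client, config
        try:
            # Uses the service account credentials mounted into the pod.
            config.load_incluster_config()
        except config.ConfigException:
            # Not running inside a cluster.
            return None
        api = client.CoreV1Api()
    return api.read_namespaced_pod(pod_name, namespace)
```

The call succeeds only if the pod's service account is allowed to read pods in that namespace, which is the privilege being discussed here.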

Contributor

vicaire Jan 29, 2019

SG, thanks.

hongye-sun added 2 commits

January 25, 2019 09:43


          remote stage states and tmp location.

ba0cd5d


          return value from execute to make test easier.

c27c641

Contributor Author

hongye-sun commented Jan 28, 2019

Friendly ping. @Ark-kun and @vicaire, do you have more comments about the PR?

hongye-sun mentioned this pull request

ML Engine Component Operations (Part 1) #746

Closed

Contributor

vicaire commented Jan 29, 2019

/lgtm
/approve

k8s-ci-robot assigned vicaire

k8s-ci-robot added the lgtm label

Contributor

k8s-ci-robot commented Jan 29, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vicaire

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [vicaire]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the approved label

k8s-ci-robot merged commit 752256c into kubeflow:master

Linchin pushed a commit to Linchin/pipelines that referenced this pull request


          Update Dockerfile.py3 (kubeflow#729)

0de5733

Co-authored-by: Scott Lee <scottleehello@gmail.com>

Linchin pushed a commit to Linchin/pipelines that referenced this pull request


          Setup a kf-ci-dev namespace for manual sync'ing of Tekton pipelines. (k…

…ubeflow#732)

* Setup a kf-ci-dev namespace for manual sync'ing of Tekton pipelines.

* This namespace is intended to allow for testing of changes to the pipelines
  without having to first check in the changes.

Revert "Update Dockerfile.py3 (kubeflow#729)"
  This commit doesn't build.
This reverts commit 0de5733.

* Fix the entrypoint in Dockerfile.py3 for kubeflow/testing#684
* Rebuild the test worker image

* Fix a bug in kf-ready-task; not properly substituting in KFNAME

* Rehydrate

* Update

magdalenakuhn17 pushed a commit to magdalenakuhn17/pipelines that referenced this pull request


          Fix pytorch gpu image tag (kubeflow#729)

122afef

Reviewers

Ark-kun left review comments

vicaire left review comments

Awaiting requested review from gaoning777

Awaiting requested review from qimingj

Awaiting requested review from IronPan

Labels

approved lgtm size/L