
SDK - Components - Making whole TaskSpec available to the container and graph handlers #3447

Conversation

@Ark-kun Ark-kun commented Apr 6, 2020

This is a refactoring PR.
The main goal of the PR is to make _components._container_task_constructor receive TaskSpec.execution_options, is_enabled, and any options that might be added in the future. Otherwise, the customizations of tasks in a graph component are lost.

This PR partially reverses the previous refactoring, which switched away from TaskSpec usage: from (task_spec) to (component_spec, arguments, component_ref).
The interface for _components._container_task_constructor now changes from (component_spec, arguments, component_ref) to (task_spec, arguments).

The reason is that task_spec has additional attributes (execution_options) that should be passed in.
It may look odd to pass arguments separately (since task_spec can already hold arguments), but the passed arguments may have types that are incompatible with TaskSpec.arguments, so the arguments are passed separately.
The interface is private, so it's fine to make a breaking change here, as we control all implementations.
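
In signature form, the change looks roughly like this (a hedged sketch paraphrased from the PR description; the actual private function lives in sdk/python/kfp/components/_components.py and its bodies are elided here):

```python
# Sketch of the private interface change; not the real implementation.

# Before this PR: the constructor saw only the component, arguments,
# and component reference, so task-level options (execution_options,
# is_enabled) never reached it.
def _container_task_constructor(component_spec, arguments, component_ref=None):
    ...

# After this PR: the whole TaskSpec travels with the call. Arguments
# are still passed separately because they may use framework-native
# reference types that TaskSpec.arguments cannot represent.
def _container_task_constructor(task_spec, arguments):  # redefinition intended in this sketch
    ...
```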

A review thread formed around this passage from the PR:

This function provides a reference implementation for the _default_container_task_constructor that returns TaskSpec objects.
The only reference-type arguments that TaskSpec can hold are TaskOutputArgument and GraphInputArgument.
When bridging with other frameworks, an alternative implementation should be provided that can process reference-type arguments that are native to that framework.

A reviewer commented:

Do you mind elaborating on what the other frameworks might be?

Ark-kun (Contributor Author) replied:

By other frameworks I mean any orchestration framework whose DSL is not built on top of TaskSpec and TaskOutputReference.
The main examples are:

  • KFP: Uses ContainerOp and PipelineParam. Implementation of _container_task_constructor for KFP: link
  • TFX: Uses BaseComponent and Channel. Implementation of _container_task_constructor for TFX: link
  • Airflow: Uses Operator. No implementations.

The reference implementation is used for framework-agnostic testing and also for creation of graph components.
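
To make the bridging idea concrete, here is a hedged sketch of what a third-party bridge might look like. It assumes, as the PR description suggests, that the hook is the module-level attribute _components._container_task_constructor; MyFrameworkTask and the constructor below are hypothetical names, not real kfp or TFX API:

```python
from kfp.components import _components  # private kfp module named in this PR


class MyFrameworkTask:
    """Stand-in for a framework-native task type (e.g. TFX's BaseComponent)."""

    def __init__(self, component_spec, arguments, execution_options=None):
        self.component_spec = component_spec
        self.arguments = arguments
        self.execution_options = execution_options


def _my_framework_task_constructor(task_spec, arguments):
    # Build the framework-native task object, carrying over the
    # task-level options that the old interface used to drop.
    return MyFrameworkTask(
        component_spec=task_spec.component_ref.spec,
        arguments=arguments,
        execution_options=task_spec.execution_options,
    )


# Install the bridge's constructor so that instantiating a component
# produces framework-native tasks instead of reference TaskSpec objects.
_components._container_task_constructor = _my_framework_task_constructor
```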

The reviewer replied:

Thanks.

Just a side note. IIUC, the proposed TFX component authoring user experience is: python function -> ComponentSpec + ExecutorContainerSpec, and then ExecutorContainerSpec is translated to a pod manifest directly in the K8sLauncher/DockerComponentLauncher, right? Are we going to change this path? I would like to see how the task spec is involved there.

Ark-kun (Contributor Author) replied:

> Just a side note. IIUC, the proposed TFX component authoring user experience is: python function -> ComponentSpec + ExecutorContainerSpec, and then ExecutorContainerSpec is translated to a pod manifest directly in the K8sLauncher/DockerComponentLauncher, right? Are we going to change this path? I would like to see how the task spec is involved there.

This was the initial idea. However, I later realized that there are some task configuration options that are not part of the component but need to be delivered to the task-like object and made available to the launcher. For example, we want the Kubernetes options and the caching/retry options to reach ContainerOp in KFP, and we want KubernetesComponentConfig to reach the KubernetesLauncher/DockerComponentLauncher in TFX.
Example KFP issue: #2942

So we change the flow from:
python func --> ComponentSpec --> BaseComponent class with ExecutorContainerSpec -- (pass arguments)--> BaseComponent object with ExecutorContainerSpec
to
python func --> ComponentSpec --> factory function --(pass arguments)--> TaskSpec (includes ComponentSpec) + arguments --> BaseComponent object with ExecutorContainerSpec and KubernetesComponentConfig

In a sense, we're switching from
ComponentSpec --> task-like object
to
TaskSpec --> task-like object
to preserve and pass all options.
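
A self-contained sketch of the new flow, using minimal stand-ins for the real kfp structures (all names below are illustrative, not the actual kfp classes):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


# Minimal stand-ins for the real kfp structures; illustrative only.
@dataclass
class ComponentReference:
    spec: Any = None


@dataclass
class TaskSpec:
    component_ref: ComponentReference
    arguments: Dict[str, Any] = field(default_factory=dict)
    execution_options: Optional[Any] = None
    is_enabled: Optional[Any] = None


def _container_task_constructor(task_spec: TaskSpec, arguments: Dict[str, Any]):
    # Reference behavior: return the TaskSpec itself; a bridge would
    # build a ContainerOp / BaseComponent here instead.
    return task_spec


def make_factory(component_spec: Any):
    """Hypothetical: ComponentSpec -> factory function."""
    def factory(**arguments):
        # The factory builds a TaskSpec first, so every task-level
        # option has a place to live and survives into the task object.
        task_spec = TaskSpec(
            component_ref=ComponentReference(spec=component_spec),
            arguments=arguments,
        )
        return _container_task_constructor(task_spec, arguments)
    return factory
```

A bridge would swap in its own _container_task_constructor; the factory code would stay the same.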

> I would like to see how the task spec is involved there.

Let's go backwards from the end goal:

  1. We want the launcher to receive KubernetesComponentConfig or caching_strategy or retry_strategy.
  2. The launcher launches a task-like object (ContainerOp object or BaseComponent object).
  3. Whatever function creates the task-like objects (_default_container_task_constructor) needs to have access to the task configuration (kubernetes, retry, caching, etc).
  4. A class that stores those options is TaskSpec.
  5. So we have two options:
    a) Have the _default_container_task_constructor function receive a TaskSpec object.
    b) Extend the signature of _default_container_task_constructor to pass all TaskSpec attributes individually.
    Option a) seems to be more future-proof, since we won't need to change the signature again if more options are added to TaskSpec.
    If we went with option b) we'd have to extend the signature again just to add the caching_strategy. Option a) gives us a more stable signature.

What are the main issues you see with this approach?

The reviewer replied:

I am opinion-less regarding the implementation of the component IR discussed here, so I won't block this PR. I just want to make sure go/component-authoring-tfx-2020 gets refreshed and that you have the approval/consensus you need to refactor the corresponding part of TFX, so that this change won't be left dangling.

Two things I wish to call out:

  1. Things like the k8s options, caching, and retry sound very platform-specific and are not closely tied to the underlying business logic defined by either the user-specified Python function or ComponentSpec + ExecutorSpec, so I think this is actually separate work (perhaps something like 'k8s configuration simplification');

  2. python func --> ComponentSpec --> factory function --(pass arguments)--> TaskSpec (includes ComponentSpec) + arguments --> BaseComponent object with ExecutorContainerSpec and KubernetesComponentConfig, followed by a step to translate the TFX objects to ContainerOp in KubeflowDagRunner, is a code path interleaving the TFX and KFP stacks, which is perhaps not ideal. I think 1) we might need to refactor the code path for clarity, and 2) we should make sure we have good test coverage, especially in the TFX DSL bridge you're working on: if we indeed depend on the change introduced in this PR, some head-to-head tests might be needed.

WDYT?

Ark-kun (Contributor Author) replied:

> Just want to make sure go/component-authoring-tfx-2020 gets refreshed

Sounds good, although in practice there should not be visible changes there. Maybe just a mention of the BaseComponentConfig part of BaseComponent.

> Things like the k8s options, caching, and retry sound very platform-specific and are not closely tied to the underlying business logic defined by either the user-specified Python function or ComponentSpec + ExecutorSpec,

I agree, especially for the k8s options. These options must not be part of ComponentSpec, which should be completely platform-agnostic. But they need to exist somewhere in the task-like object. In TFX they exist as the BaseComponentConfig part of BaseComponent. In TaskSpec they live in task_spec.execution_options.

> followed by a step to translate the TFX objects to ContainerOp in KubeflowDagRunner, is a code path interleaving the TFX and KFP stacks, which is perhaps not ideal.

Note that in the case of KubeflowDagRunner, the ContainerOp will correspond not to the user's component container (with its possible Kubernetes options), but rather to the KubernetesLauncher. The BaseComponent object with ExecutorContainerSpec and KubernetesComponentConfig will just be packed into JSON and sent to the launcher, and won't affect the ContainerOp attributes.

> make sure we have good test coverage, especially in the TFX DSL bridge you're working on

Sounds good.

> if we indeed depend on the change introduced in this PR, some head-to-head tests might be needed.

That might be easier given that the bridge CL is not checked in yet and that the KFP SDK can have frequent releases.

I'm a bit torn here.
I do not want to make any more changes to the _default_container_task_constructor signature in the future (since it's a bit like a public internal API). But I also understand that this change brings TaskSpec into the picture, and that would be another thing I'd have to explain to the bridge reviewers.

Another alternative is to make the interface take **kwargs, which sweeps the problem under the carpet...
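
For concreteness, the **kwargs escape hatch might look like this (a hedged sketch, not the actual implementation):

```python
# Hypothetical: accept **kwargs so future task-level options can be
# threaded through without breaking existing constructor signatures.
def _container_task_constructor(component_spec, arguments,
                                component_ref=None, **kwargs):
    # A bridge that does not understand an option can simply ignore it
    # instead of failing with a TypeError on an unexpected keyword.
    execution_options = kwargs.get('execution_options')
    is_enabled = kwargs.get('is_enabled')
    ...
```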

The reviewer replied:

> Another alternative is to make the interface take **kwargs, which sweeps the problem under the carpet...

Actually, **kwargs does not sound too bad to me...

I am wondering whether it would be acceptable, in the short term, to first enable users to author Python-function-based components in the TFX SDK with some default options (though those options could still be specified in other ways), and then solve the configuration simplification part. WDYT?

Ark-kun (Contributor Author) replied:

> Actually, **kwargs does not sound too bad to me...

Yes, this looks like a workable solution. I guess I'll just use **kwargs in the TFX bridge so that it does not expose TaskSpec at first. That makes this PR not urgent.

As for KFP, we can have a chat later about configuring per-task Kubernetes and caching options in graph components.
/hold

(Resolved review thread on sdk/python/kfp/components/_components.py)
Ark-kun commented Apr 7, 2020

/approve

stale bot commented Jul 10, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label on Jul 10, 2020
stale bot commented Jul 17, 2020

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

@stale stale bot closed this Jul 17, 2020
@Ark-kun Ark-kun reopened this Jul 31, 2020
@stale stale bot removed the lifecycle/stale label on Jul 31, 2020
Ark-kun commented Jul 31, 2020

/unhold

Ark-kun commented Jul 31, 2020

This PR has become a blocker for the work to bridge the gap regarding Kubernetes options, retry options, conditional execution, etc. I think we need to get this in and perhaps iterate on top of it in the future.

The issue is mostly a programming one:

  1. Task = component + arguments + other options (Kubernetes options, retry options, conditional predicate, etc)
  2. Currently the internal functions that create task-like objects (_default_container_task_constructor) only receive component + arguments. They never see the other options.
  3. Thus we need to change the signatures of the _default_container_task_constructor functions.

There are two main options for changing the signature:

  1. Add all options to the signature. This is harder to maintain, since every new TaskSpec option needs to be manually added to all task constructors.
  2. Just pass the TaskSpec object containing all options. This way any new option only needs to be added to TaskSpec, and the function signatures won't have to change. This improves forward and backward compatibility.

This PR implements option 2.

@Ark-kun Ark-kun force-pushed the SDK---Components---Making-whole-TaskSpec-available-to-the-container-and-graph-handlers branch from b22a0b2 to e3fe94c Compare September 14, 2020 06:34
@google-cla google-cla bot added the cla: yes label Sep 14, 2020
@k8s-ci-robot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ark-kun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Ark-kun Ark-kun removed the approved label Sep 15, 2020
Ark-kun commented Sep 15, 2020

@numerology Can we please get this refactoring in? There are several features relying on this change (making the whole task spec available to the ContainerOp bridge), for example implementation support for Kubernetes options and conditionals, which are currently supported in the spec but cannot be materialized.

@numerology numerology left a comment


Also, do you mind elaborating on the difference between ComponentSpec and an argument-less TaskSpec?

I am thinking that if, in the end, we only use TaskSpec with arguments populated in the pipeline, it would be better to distinguish these two types of representations.

(Resolved review thread on sdk/python/kfp/components/_components.py)
Ark-kun commented Oct 2, 2020

> Also, do you mind elaborating on the difference between ComponentSpec and an argument-less TaskSpec?

TaskSpec = ComponentRef (which leads to ComponentSpec) + arguments + other customizations and options.
The other customizations and options include: caching, retries, Kubernetes options, and the conditional execution predicate.
These options need to reach the container task object construction function; otherwise there is no way for Kubernetes options or conditionals to work properly. All these options are declared in the TaskSpec class, so it makes sense to pass them all using that class.

Let's think in terms of the container_task_constructor function interface.
I see three options (compared in the sketch after this list):

  1. (task_spec, arguments) - proposed in this PR
    Cons: It's a bit ugly that task_spec has .arguments and we also pass arguments separately.

  2. (component_spec, arguments, cache_options, retry_options, when, kubernetes_options, .....)
    Cons: The signature is not well defined and is hard to maintain. When we add a new option, old functions will break and will need updating.

  3. Change TaskSpec so that it only has 3 members: component_ref, arguments, everything_else (e.g. task_options). Then the container constructor signature can be (component_spec, arguments, task_options).
    Cons: Changes the TaskSpec schema (although an unused part of it).
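
A side-by-side sketch of the three candidate interfaces (signatures paraphrased from the list above; bodies elided, names illustrative):

```python
# 1. Pass the whole TaskSpec (this PR). Stable: new options are added
#    to TaskSpec without touching this signature.
def construct_task_v1(task_spec, arguments): ...

# 2. Enumerate every option. Fragile: each new option changes the
#    signature and breaks existing constructor implementations.
def construct_task_v2(component_spec, arguments, cache_options=None,
                      retry_options=None, when=None,
                      kubernetes_options=None): ...

# 3. Restructure TaskSpec into (component_ref, arguments, task_options)
#    and pass the options bundle as one argument.
def construct_task_v3(component_spec, arguments, task_options=None): ...
```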

stale bot commented Jan 1, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label on Jan 1, 2021
@sm-hawkfish commented:

This PR is referenced by the (now closed) issue "Preserving step config in create_graph_component_from_pipeline_func".

I just ran into something similar: setting "inner pipeline" attributes like train_task.set_display_name("train_awesome_model") and train_task.execution_options.caching_strategy.max_cache_staleness = "P0D" results in tracebacks when trying to use the pipeline as a graph component.

Is this the appropriate place to raise this? I just want to add my support for this feature. Thanks!
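
A sketch of the scenario being described (hypothetical pipeline; the API names are the kfp v1-era ones mentioned in this thread, and exact imports may vary by SDK version):

```python
from kfp import components


# A trivial component standing in for a real training step.
@components.create_component_from_func
def train(data_path: str) -> str:
    return data_path


def inner_pipeline(data_path: str):
    train_task = train(data_path)
    # Per-task customizations inside the would-be graph component.
    # Before this PR, the task constructor never saw the TaskSpec
    # carrying these options, so they were lost or raised errors.
    train_task.set_display_name('train_awesome_model')
    train_task.execution_options.caching_strategy.max_cache_staleness = 'P0D'


# Turning the pipeline into a reusable graph component is where the
# tracebacks reportedly appeared.
graph_component = components.create_graph_component_from_pipeline_func(
    inner_pipeline)
```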

@stale stale bot removed the lifecycle/stale label on Feb 3, 2021
@chensun chensun force-pushed the master branch 2 times, most recently from 7542f0e to 44d22a6 Compare February 12, 2021 09:23
stale bot commented Jun 2, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label on Jun 2, 2021
robscc commented Jan 20, 2022

Hi, any update on this issue? We are still waiting for this feature.

@stale stale bot removed the lifecycle/stale label on Jan 20, 2022
stale bot commented Apr 27, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label on Apr 27, 2022
@rimolive (Member) commented:

Closing this PR. No activity for more than a year.

/close

@stale stale bot removed the lifecycle/stale label on Mar 24, 2024
@google-oss-prow google-oss-prow bot closed this Mar 24, 2024
The google-oss-prow bot commented:

@rimolive: Closed this PR.

In response to this:

> Closing this PR. No activity for more than a year.
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
