
SDK - Components - Making whole TaskSpec available to the container and graph handlers #3447

Conversation

@Ark-kun Ark-kun commented Apr 6, 2020

This is a refactoring PR.
The main goal of the PR is to make _components._container_task_constructor receive TaskSpec.execution_options, is_enabled, and any options that might be added in the future. Otherwise, the customizations of tasks in a graph component are lost.

This PR partially reverses the previous refactoring, which switched away from TaskSpec usage: from (task_spec) to (component_spec, arguments, component_ref).
The interface for _components._container_task_constructor now changes from (component_spec, arguments, component_ref) to (task_spec, arguments).

The reason is that task_spec has additional attributes (execution_options) that should be passed in.
It may look odd to pass arguments separately (since task_spec can already hold arguments), but the passed arguments may have types that are incompatible with TaskSpec.arguments, so the arguments are passed separately.
The interface is private, so it's fine to make a breaking change here, as we control all implementations.
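
In signature form, the change looks roughly like this (a hedged sketch paraphrased from the PR description; the actual private function lives in sdk/python/kfp/components/_components.py and its bodies are elided here):

```python
# Sketch of the private interface change; not the real implementation.

# Before this PR: the constructor saw only the component, arguments,
# and component reference, so task-level options (execution_options,
# is_enabled) never reached it.
def _container_task_constructor(component_spec, arguments, component_ref=None):
    ...

# After this PR: the whole TaskSpec travels with the call. Arguments
# are still passed separately because they may use framework-native
# reference types that TaskSpec.arguments cannot represent.
def _container_task_constructor(task_spec, arguments):  # redefinition intended in this sketch
    ...
```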

A review thread formed around this passage from the PR:

This function provides a reference implementation for the _default_container_task_constructor that returns TaskSpec objects.
The only reference-type arguments that TaskSpec can hold are TaskOutputArgument and GraphInputArgument.
When bridging with other frameworks, an alternative implementation should be provided that can process reference-type arguments that are native to that framework.

A reviewer commented:

Do you mind elaborating on what the other frameworks might be?

Ark-kun (Contributor Author) replied:

By other frameworks I mean any orchestration framework whose DSL is not built on top of TaskSpec and TaskOutputReference.
The main examples are:

  • KFP: Uses ContainerOp and PipelineParam. Implementation of _container_task_constructor for KFP: link
  • TFX: Uses BaseComponent and Channel. Implementation of _container_task_constructor for TFX: link
  • Airflow: Uses Operator. No implementations.

The reference implementation is used for framework-agnostic testing and also for creation of graph components.
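
To make the bridging idea concrete, here is a hedged sketch of what a third-party bridge might look like. It assumes, as the PR description suggests, that the hook is the module-level attribute _components._container_task_constructor; MyFrameworkTask and the constructor below are hypothetical names, not real kfp or TFX API:

```python
from kfp.components import _components  # private kfp module named in this PR


class MyFrameworkTask:
    """Stand-in for a framework-native task type (e.g. TFX's BaseComponent)."""

    def __init__(self, component_spec, arguments, execution_options=None):
        self.component_spec = component_spec
        self.arguments = arguments
        self.execution_options = execution_options


def _my_framework_task_constructor(task_spec, arguments):
    # Build the framework-native task object, carrying over the
    # task-level options that the old interface used to drop.
    return MyFrameworkTask(
        component_spec=task_spec.component_ref.spec,
        arguments=arguments,
        execution_options=task_spec.execution_options,
    )


# Install the bridge's constructor so that instantiating a component
# produces framework-native tasks instead of reference TaskSpec objects.
_components._container_task_constructor = _my_framework_task_constructor
```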

The reviewer replied:

Thanks.

Just a side note. IIUC, the proposed TFX component authoring user experience is: python function -> ComponentSpec + ExecutorContainerSpec, and then ExecutorContainerSpec is translated to a pod manifest directly in the K8sLauncher/DockerComponentLauncher, right? Are we going to change this path? I would like to see how the task spec is involved there.

Ark-kun (Contributor Author) replied:

> Just a side note. IIUC, the proposed TFX component authoring user experience is: python function -> ComponentSpec + ExecutorContainerSpec, and then ExecutorContainerSpec is translated to a pod manifest directly in the K8sLauncher/DockerComponentLauncher, right? Are we going to change this path? I would like to see how the task spec is involved there.

This was the initial idea. However, I later realized that there are some task configuration options that are not part of the component but need to be delivered to the task-like object and made available to the launcher. For example, we want the Kubernetes options and the caching/retry options to reach ContainerOp in KFP, and we want KubernetesComponentConfig to reach the KubernetesLauncher/DockerComponentLauncher in TFX.
Example KFP issue: #2942

So we change the flow from:
python func --> ComponentSpec --> BaseComponent class with ExecutorContainerSpec -- (pass arguments)--> BaseComponent object with ExecutorContainerSpec
to
python func --> ComponentSpec --> factory function --(pass arguments)--> TaskSpec (includes ComponentSpec) + arguments --> BaseComponent object with ExecutorContainerSpec and KubernetesComponentConfig

In a sense, we're switching from
ComponentSpec --> task-like object
to
TaskSpec --> task-like object
to preserve and pass all options.
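
A self-contained sketch of the new flow, using minimal stand-ins for the real kfp structures (all names below are illustrative, not the actual kfp classes):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


# Minimal stand-ins for the real kfp structures; illustrative only.
@dataclass
class ComponentReference:
    spec: Any = None


@dataclass
class TaskSpec:
    component_ref: ComponentReference
    arguments: Dict[str, Any] = field(default_factory=dict)
    execution_options: Optional[Any] = None
    is_enabled: Optional[Any] = None


def _container_task_constructor(task_spec: TaskSpec, arguments: Dict[str, Any]):
    # Reference behavior: return the TaskSpec itself; a bridge would
    # build a ContainerOp / BaseComponent here instead.
    return task_spec


def make_factory(component_spec: Any):
    """Hypothetical: ComponentSpec -> factory function."""
    def factory(**arguments):
        # The factory builds a TaskSpec first, so every task-level
        # option has a place to live and survives into the task object.
        task_spec = TaskSpec(
            component_ref=ComponentReference(spec=component_spec),
            arguments=arguments,
        )
        return _container_task_constructor(task_spec, arguments)
    return factory
```

A bridge would swap in its own _container_task_constructor; the factory code would stay the same.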

> I would like to see how the task spec is involved there.

Let's go backwards from the end goal:

  1. We want the launcher to receive KubernetesComponentConfig or caching_strategy or retry_strategy.
  2. The launcher launches a task-like object (ContainerOp object or BaseComponent object).
  3. Whatever function creates the task-like objects (_default_container_task_constructor) needs to have access to the task configuration (kubernetes, retry, caching, etc).
  4. A class that stores those options is TaskSpec.
  5. So we have two options:
    a) Have the _default_container_task_constructor function receive a TaskSpec object.
    b) Extend the signature of _default_container_task_constructor to pass all TaskSpec attributes individually.
    Option a) seems to be more future-proof, since we won't need to change the signature again if more options are added to TaskSpec.
    If we went with option b) we'd have to extend the signature again just to add the caching_strategy. Option a) gives us a more stable signature.

What are the main issues you see with this approach?

The reviewer replied:

I am opinion-less regarding the implementation of the component IR discussed here, so I won't block this PR. I just want to make sure go/component-authoring-tfx-2020 gets refreshed and that you have the approval/consensus you need to refactor the corresponding part of TFX, so that this change won't be left dangling.

Two things I wish to call out:

  1. Things like the k8s options, caching, and retry sound very platform-specific and are not closely tied to the underlying business logic defined by either the user-specified Python function or ComponentSpec + ExecutorSpec, so I think this is actually separate work (perhaps something like 'k8s configuration simplification');

  2. python func --> ComponentSpec --> factory function --(pass arguments)--> TaskSpec (includes ComponentSpec) + arguments --> BaseComponent object with ExecutorContainerSpec and KubernetesComponentConfig, followed by a step to translate the TFX objects to ContainerOp in KubeflowDagRunner, is a code path interleaving the TFX and KFP stacks, which is perhaps not ideal. I think 1) we might need to refactor the code path for clarity, and 2) we should make sure we have good test coverage, especially in the TFX DSL bridge you're working on: if we indeed depend on the change introduced in this PR, some head-to-head tests might be needed.

WDYT?

Ark-kun (Contributor Author) replied:

> Just want to make sure go/component-authoring-tfx-2020 gets refreshed

Sounds good, although in practice there should not be visible changes there. Maybe just a mention of the BaseComponentConfig part of BaseComponent.

> Things like the k8s options, caching, and retry sound very platform-specific and are not closely tied to the underlying business logic defined by either the user-specified Python function or ComponentSpec + ExecutorSpec,

I agree, especially for the k8s options. These options must not be part of ComponentSpec, which should be completely platform-agnostic. But they need to exist somewhere in the task-like object. In TFX they exist as the BaseComponentConfig part of BaseComponent. In TaskSpec they live in task_spec.execution_options.

> followed by a step to translate the TFX objects to ContainerOp in KubeflowDagRunner, is a code path interleaving the TFX and KFP stacks, which is perhaps not ideal.

Note that in the case of KubeflowDagRunner, the ContainerOp will correspond not to the user's component container (with its possible Kubernetes options), but rather to the KubernetesLauncher. The BaseComponent object with ExecutorContainerSpec and KubernetesComponentConfig will just be packed into JSON and sent to the launcher, and won't affect the ContainerOp attributes.

> make sure we have good test coverage, especially in the TFX DSL bridge you're working on

Sounds good.

> if we indeed depend on the change introduced in this PR, some head-to-head tests might be needed.

That might be easier given that the bridge CL is not checked in yet and that the KFP SDK can have frequent releases.

I'm a bit torn here.
I do not want to make any more changes to the _default_container_task_constructor signature in the future (since it's a bit like a public internal API). But I also understand that this change brings TaskSpec into the picture, and that would be another thing I'd have to explain to the bridge reviewers.

Another alternative is to make the interface take **kwargs, which sweeps the problem under the carpet...
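
For concreteness, the **kwargs escape hatch might look like this (a hedged sketch, not the actual implementation):

```python
# Hypothetical: accept **kwargs so future task-level options can be
# threaded through without breaking existing constructor signatures.
def _container_task_constructor(component_spec, arguments,
                                component_ref=None, **kwargs):
    # A bridge that does not understand an option can simply ignore it
    # instead of failing with a TypeError on an unexpected keyword.
    execution_options = kwargs.get('execution_options')
    is_enabled = kwargs.get('is_enabled')
    ...
```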

The reviewer replied:

> Another alternative is to make the interface take **kwargs, which sweeps the problem under the carpet...

Actually, **kwargs does not sound too bad to me...

I am wondering whether it would be acceptable, in the short term, to first enable users to author Python-function-based components in the TFX SDK with some default options (though those options could still be specified in other ways), and then solve the configuration simplification part. WDYT?

Ark-kun (Contributor Author) replied:

> Actually, **kwargs does not sound too bad to me...

Yes, this looks like a workable solution. I guess I'll just use **kwargs in the TFX bridge so that it does not expose TaskSpec at first. That makes this PR not urgent.

As for KFP, we can have a chat later about configuring per-task Kubernetes and caching options in graph components.
/hold

(Resolved review thread on sdk/python/kfp/components/_components.py)
Ark-kun commented Apr 7, 2020

/approve

stale bot commented Jul 10, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label on Jul 10, 2020
stale bot commented Jul 17, 2020

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

@stale stale bot closed this Jul 17, 2020
@Ark-kun Ark-kun reopened this Jul 31, 2020
@stale stale bot removed the lifecycle/stale label on Jul 31, 2020
Ark-kun commented Jul 31, 2020

/unhold

Ark-kun commented Jul 31, 2020

This PR has become a blocker for the work to bridge the gap regarding Kubernetes options, retry options, conditional execution, etc. I think we need to get this in and perhaps iterate on top of it in the future.

The issue is mostly a programming one:

  1. Task = component + arguments + other options (Kubernetes options, retry options, conditional predicate, etc)
  2. Currently the internal functions that create task-like objects (_default_container_task_constructor) only receive component + arguments. They never see the other options.
  3. Thus we need to change the signatures of the _default_container_task_constructor functions.

There are two main options for changing the signature:

  1. Add all options to the signature. This is harder to maintain, since every new TaskSpec option needs to be manually added to all task constructors.
  2. Just pass the TaskSpec object containing all options. This way any new option only needs to be added to TaskSpec, and the function signatures won't have to change. This improves forward and backward compatibility.

This PR implements option 2.

@Ark-kun Ark-kun force-pushed the SDK---Components---Making-whole-TaskSpec-available-to-the-container-and-graph-handlers branch from b22a0b2 to e3fe94c Compare September 14, 2020 06:34
@google-cla google-cla bot added the cla: yes label Sep 14, 2020
@k8s-ci-robot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ark-kun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Ark-kun Ark-kun removed the approved label Sep 15, 2020
Ark-kun commented Sep 15, 2020

@numerology Can we please get this refactoring in? There are several features relying on this change (making the whole task spec available to the ContainerOp bridge), for example implementation support for Kubernetes options and conditionals, which are currently supported in the spec but cannot be materialized.

@numerology numerology left a comment


Also, do you mind elaborating on the difference between ComponentSpec and an argument-less TaskSpec?

I am thinking that if, in the end, we only use TaskSpec with arguments populated in the pipeline, it would be better to distinguish these two types of representations.

(Resolved review thread on sdk/python/kfp/components/_components.py)
Ark-kun commented Oct 2, 2020

> Also, do you mind elaborating on the difference between ComponentSpec and an argument-less TaskSpec?

TaskSpec = ComponentRef (which leads to ComponentSpec) + arguments + other customizations and options.
The other customizations and options include: caching, retries, Kubernetes options, and the conditional execution predicate.
These options need to reach the container task object construction function; otherwise there is no way for Kubernetes options or conditionals to work properly. All these options are declared in the TaskSpec class, so it makes sense to pass them all using that class.

Let's think in terms of the container_task_constructor function interface.
I see three options (compared in the sketch after this list):

  1. (task_spec, arguments) - proposed in this PR
    Cons: It's a bit ugly that task_spec has .arguments and we also pass arguments separately.

  2. (component_spec, arguments, cache_options, retry_options, when, kubernetes_options, .....)
    Cons: The signature is not well defined and is hard to maintain. When we add a new option, old functions will break and will need updating.

  3. Change TaskSpec so that it only has 3 members: component_ref, arguments, everything_else (e.g. task_options). Then the container constructor signature can be (component_spec, arguments, task_options).
    Cons: Changes the TaskSpec schema (although an unused part of it).
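
A side-by-side sketch of the three candidate interfaces (signatures paraphrased from the list above; bodies elided, names illustrative):

```python
# 1. Pass the whole TaskSpec (this PR). Stable: new options are added
#    to TaskSpec without touching this signature.
def construct_task_v1(task_spec, arguments): ...

# 2. Enumerate every option. Fragile: each new option changes the
#    signature and breaks existing constructor implementations.
def construct_task_v2(component_spec, arguments, cache_options=None,
                      retry_options=None, when=None,
                      kubernetes_options=None): ...

# 3. Restructure TaskSpec into (component_ref, arguments, task_options)
#    and pass the options bundle as one argument.
def construct_task_v3(component_spec, arguments, task_options=None): ...
```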

stale bot commented Jan 1, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label on Jan 1, 2021
@sm-hawkfish commented:

This PR is referenced by the (now closed) issue "Preserving step config in create_graph_component_from_pipeline_func".

I just ran into something similar: setting "inner pipeline" attributes like train_task.set_display_name("train_awesome_model") and train_task.execution_options.caching_strategy.max_cache_staleness = "P0D" results in tracebacks when trying to use the pipeline as a graph component.

Is this the appropriate place to raise this? I just want to add my support for this feature. Thanks!
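
A sketch of the scenario being described (hypothetical pipeline; the API names are the kfp v1-era ones mentioned in this thread, and exact imports may vary by SDK version):

```python
from kfp import components


# A trivial component standing in for a real training step.
@components.create_component_from_func
def train(data_path: str) -> str:
    return data_path


def inner_pipeline(data_path: str):
    train_task = train(data_path)
    # Per-task customizations inside the would-be graph component.
    # Before this PR, the task constructor never saw the TaskSpec
    # carrying these options, so they were lost or raised errors.
    train_task.set_display_name('train_awesome_model')
    train_task.execution_options.caching_strategy.max_cache_staleness = 'P0D'


# Turning the pipeline into a reusable graph component is where the
# tracebacks reportedly appeared.
graph_component = components.create_graph_component_from_pipeline_func(
    inner_pipeline)
```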

@stale stale bot removed the lifecycle/stale label on Feb 3, 2021
@chensun chensun force-pushed the master branch 2 times, most recently from 7542f0e to 44d22a6 Compare February 12, 2021 09:23
stale bot commented Jun 2, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label on Jun 2, 2021
robscc commented Jan 20, 2022

Hi, any update on this issue? We are still waiting for this feature.

@stale stale bot removed the lifecycle/stale label on Jan 20, 2022
stale bot commented Apr 27, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label on Apr 27, 2022
@rimolive (Member) commented:

Closing this PR. No activity for more than a year.

/close

@stale stale bot removed the lifecycle/stale label on Mar 24, 2024
@google-oss-prow google-oss-prow bot closed this Mar 24, 2024
The google-oss-prow bot commented:

@rimolive: Closed this PR.

In response to this:

> Closing this PR. No activity for more than a year.
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
