SDK - Components - Making whole TaskSpec available to the container and graph handlers #3447
Conversation
This function provides a reference implementation for the _default_container_task_constructor that returns TaskSpec objects.
The only reference-type arguments that TaskSpec can hold are TaskOutputArgument and GraphInputArgument.
When bridging with other frameworks, an alternative implementation should be provided that can process reference-type arguments that are native to that framework.
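For illustration, here is a minimal sketch of the contract described in this comment, using the (task_spec, arguments) interface introduced by this PR. The argument classes are simplified stand-ins for the KFP structures, not the real implementation:

```python
from typing import Any, Mapping

class TaskOutputArgument:  # stand-in for the KFP structure of the same name
    pass

class GraphInputArgument:  # stand-in for the KFP structure of the same name
    pass

def default_container_task_constructor(task_spec, arguments: Mapping[str, Any]):
    # Validate the arguments: constants pass through, and only these two
    # reference types are understood by TaskSpec.
    for name, value in arguments.items():
        if not isinstance(value, (str, int, float,
                                  TaskOutputArgument, GraphInputArgument)):
            raise TypeError(
                f'Argument "{name}" has unsupported reference type {type(value)}')
    # The reference implementation just records the arguments on the TaskSpec
    # and returns it as the task-like object.
    task_spec.arguments = dict(arguments)
    return task_spec
```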
Do you mind elaborating on what other frameworks might be?
By other frameworks I mean any orchestration framework whose DSL is not built on top of TaskSpec and TaskOutputReference.
The main examples are:
- KFP: uses ContainerOp and PipelineParam. Implementation of _container_task_constructor for KFP: link
- TFX: uses BaseComponent and Channel. Implementation of _container_task_constructor for TFX: link
- Airflow: uses Operator. No implementation exists yet.
The reference implementation is used for framework-agnostic testing and also for the creation of graph components.
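To make the bridging idea concrete, here is a sketch of how a framework might install its own constructor, assuming the module-level _components._container_task_constructor hook referenced in this thread; the constructor body is just a stub, not a real ContainerOp factory:

```python
from kfp.components import _components  # module exposing the constructor hook

def kfp_container_task_constructor(component_spec, arguments, component_ref=None):
    # A real bridge would build and return a framework-native task object
    # (e.g. a ContainerOp); this stub just records what it was given.
    return {'component': component_spec, 'arguments': dict(arguments)}

# Installing the bridge replaces the TaskSpec-returning reference implementation.
_components._container_task_constructor = kfp_container_task_constructor
```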
Thanks.
Just a side note. IIUC the proposed TFX component authoring user experience is: Python function -> ComponentSpec + ExecutorContainerSpec, then ExecutorContainerSpec will be translated to a pod manifest directly in K8sLauncher/DockerComponentLauncher, right? Are we going to change this path? I would like to see how the task spec is involved there.
> Just a side note. IIUC the proposed TFX component authoring user experience is: Python function -> ComponentSpec + ExecutorContainerSpec, then ExecutorContainerSpec will be translated to a pod manifest directly in K8sLauncher/DockerComponentLauncher, right? Are we going to change this path? I would like to see how the task spec is involved there.
This was the initial idea. However, I later understood that there are some task configuration options that are not part of the component, but need to be delivered to the task-like object and made available to the launcher. For example, we want the Kubernetes options and caching/retry options to reach ContainerOp in KFP, and we want KubernetesComponentConfig to reach the KubernetesLauncher/DockerComponentLauncher in TFX.
Example KFP issue: #2942
So we change the flow from:
python func --> ComponentSpec --> BaseComponent class with ExecutorContainerSpec --(pass arguments)--> BaseComponent object with ExecutorContainerSpec
to
python func --> ComponentSpec --> factory function --(pass arguments)--> TaskSpec (includes ComponentSpec) + arguments --> BaseComponent object with ExecutorContainerSpec and KubernetesComponentConfig
In a sense, we're switching from
ComponentSpec --> task-like object
to
TaskSpec --> task-like object
to preserve and pass all options.
> I would like to see how the task spec is involved there.
Let's go backwards from the end goal:
- We want the launcher to receive KubernetesComponentConfig or caching_strategy or retry_strategy.
- The launcher launches a task-like object (ContainerOp object or BaseComponent object).
- Whatever function creates the task-like objects (_default_container_task_constructor) needs to have access to the task configuration (Kubernetes, retry, caching, etc.).
- A class that stores those options is TaskSpec.
- So we have two options:
  a) Extend the signature of _default_container_task_constructor to pass all TaskSpec attributes.
  b) Have the _default_container_task_constructor function receive a TaskSpec object.
Option b) seems to be more future-proof since we won't need to change the signature again if more options are added to TaskSpec.
If we went with option a) we'd have to extend the signature again to add, for example, the caching_strategy. Option b) gives us a more stable signature.
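Spelled out as signatures (illustrative only, using the names from this thread):

```python
# Option a) TaskSpec attributes are flattened into parameters; every new
# option (e.g. a future caching_strategy) forces another signature change.
def constructor_option_a(component_spec, arguments, component_ref,
                         execution_options=None, is_enabled=None):
    ...

# Option b) the whole TaskSpec travels through, so new TaskSpec fields arrive
# without any signature change.
def constructor_option_b(task_spec, arguments):
    ...
```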
What are the main issues you see with this approach?
I am opinion-less regarding the implementation of the component IR discussed here, so I'll not block this PR. Just want to make sure go/component-authoring-tfx-2020 gets refreshed and you get the approval/consensus you need to refactor the corresponding part in TFX, so that this change won't be left dangling.
Two things I wish to call out:
- Things like k8s options/caching/retry sound very platform-specific and are not closely tied to the underlying business logic defined by either the user-specified Python function or ComponentSpec + ExecutorSpec, so I think it's actually separate work (perhaps something like 'k8s configuration simplification');
- python func --> ComponentSpec --> factory function --(pass arguments)--> TaskSpec (includes ComponentSpec) + arguments --> BaseComponent object with ExecutorContainerSpec and KubernetesComponentConfig, followed by a step to translate TFX stuff to ContainerOp in KubeflowDagRunner, is a codepath interleaving the TFX and KFP stacks, which is perhaps not ideal. I think 1) we might need to refactor the code path for better clarity, and 2) we should make sure we have good test coverage. Especially, in the TFX DSL bridge you're working on, if we indeed depend on the change introduced in this PR, some head-to-head tests might be needed.
WDYT?
> Just want to make sure go/component-authoring-tfx-2020 gets refreshed
Sounds good. Although in practice there should not be visible changes there. Maybe just a mention of the BaseComponentConfig part of BaseComponent.
> Things like k8s options/caching/retry sound very platform-specific and are not closely tied to the underlying business logic defined by either the user-specified Python function or ComponentSpec + ExecutorSpec,
I agree. Especially for the k8s options. These options must not be part of ComponentSpec, which should be completely platform-agnostic. But these options need to exist somewhere in the task-like object. In TFX they exist as the BaseComponentConfig part of BaseComponent. In TaskSpec they live in task_spec.execution_options.
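To make that concrete, a small sketch of how a launcher-side helper might read those options; task_spec.execution_options is the attribute path named above, while the caching_strategy/retry_strategy attribute names are assumptions for illustration:

```python
def options_for_launcher(task_spec):
    # Pull the per-task options off the TaskSpec if they are present.
    opts = getattr(task_spec, 'execution_options', None)
    if opts is None:
        return {}
    return {
        'caching': getattr(opts, 'caching_strategy', None),  # assumed name
        'retry': getattr(opts, 'retry_strategy', None),      # assumed name
    }
```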
> followed by a step to translate TFX stuff to ContainerOp in KubeflowDagRunner, is a codepath interleaving the TFX and KFP stacks, which is perhaps not ideal.
Note that in the case of the KubeflowDagRunner, the ContainerOp will correspond not to the user component container with possible Kubernetes options, but rather to the KubernetesLauncher. The BaseComponent object with ExecutorContainerSpec and KubernetesComponentConfig will just be packed into JSON and sent to the launcher, and won't affect the ContainerOp attributes.
> make sure we have good test coverage. Especially, in the TFX DSL bridge you're working on,
Sounds good.
> if we indeed depend on the change introduced in this PR, some head-to-head tests might be needed.
It might be easier due to the fact that the bridge CL is not checked in yet, and that the KFP SDK can have frequent releases.
I'm a bit torn here.
I do not want to make any more changes to the _default_container_task_constructor signature in the future (since it's a bit like a public internal API). But I also understand that this change brings TaskSpec into the picture, and this would be another thing that I'll have to explain to the bridge reviewers.
Another alternative is to make the interface accept **kwargs, which sweeps the problem under the carpet...
> Another alternative is to make the interface accept **kwargs, which sweeps the problem under the carpet...
Actually, **kwargs does not sound very bad to me...
I am wondering whether it's acceptable that, for the short term, we first enable users to author Python-function-based components in the TFX SDK with some default options (but those options can still be specified in other ways). Then we solve the configuration simplification part. WDYT?
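A sketch of what the **kwargs variant could look like; the option names mirror the ones discussed in this thread, and the task-like object is stubbed as a dict:

```python
def container_task_constructor(component_spec, arguments, component_ref=None,
                               **kwargs):
    task = {'component': component_spec, 'arguments': dict(arguments)}
    # Pick out only the options this framework knows how to apply; unknown
    # options are silently ignored instead of breaking the call.
    for key in ('execution_options', 'is_enabled'):
        if key in kwargs:
            task[key] = kwargs[key]
    return task
```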
> Actually, **kwargs does not sound very bad to me...
This looks like a working solution, yes. I guess I'll just use **kwargs in the TFX bridge so that it does not expose TaskSpec at first. This makes this PR not urgent.
As for KFP, we can have a chat later about configuring per-task Kubernetes and caching options in graph components.
/hold
/approve
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
/unhold
This PR has become a blocker for the work to bridge the gap regarding Kubernetes options, retry options, conditional execution, etc. I think we need to get this in and maybe iterate some more on top of it in the future. The issue is mostly a programming issue:
There are two main options for changing the signature:
This PR implements option 2.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: Ark-kun
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@numerology Can we please get this refactoring in? There are several features relying on this change (making the whole task spec available to the ContainerOp bridge), for example, support for Kubernetes options and conditionals, which are currently supported in the spec but cannot be materialized.
Also, do you mind elaborating on the difference between ComponentSpec and an argument-less TaskSpec?
I am thinking that if, in the end, we're only using TaskSpec with arguments populated in the pipeline, it would be better to distinguish these two types of representations.
TaskSpec = ComponentRef (leads to ComponentSpec) + arguments + other customizations and options. Let's think in terms of the container_task_constructor function interface:
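A sketch of that decomposition with stand-in classes; the attribute names follow this thread rather than the exact KFP SDK definitions:

```python
class ComponentRef:
    def __init__(self, spec):
        self.spec = spec               # "leads to" the ComponentSpec

class TaskSpec:
    def __init__(self, component_ref, arguments=None,
                 execution_options=None, is_enabled=None):
        self.component_ref = component_ref          # -> ComponentSpec
        self.arguments = arguments or {}            # task arguments
        self.execution_options = execution_options  # k8s/caching/retry options
        self.is_enabled = is_enabled                # conditional execution

# The constructor interface then needs only the TaskSpec plus the (possibly
# framework-typed) arguments passed alongside it:
def container_task_constructor(task_spec, arguments):
    ...
```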
This PR is referenced by the (now closed) issue "Preserving step config in create_graph_component_from_pipeline_func". I just ran into something similar, where setting "inner pipeline" attributes gets lost. Is this the appropriate place to raise this? Just want to voice support for the value of this feature. Thanks!
Hi, any update on this issue? We are still waiting for this feature.
Closing this PR. No activity for more than a year.
/close
@rimolive: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This is a refactoring PR.
The main goal of the PR is to make _components._container_task_constructor receive TaskSpec.execution_options, is_enabled, and any options that might be added in the future. Otherwise, the customizations of tasks in a graph component are lost.
This PR partially reverses the previous refactoring, which switched away from TaskSpec usage: from (task_spec) to (component_spec, arguments, component_ref).
The interface for _components._container_task_constructor now changes from (component_spec, arguments, component_ref) to (task_spec, arguments). The reason is that task_spec has additional attributes (execution_options) that should be passed in.
It looks weird to pass arguments separately (as task_spec can already hold arguments), but the reason for this is that the passed arguments may have types that are incompatible with TaskSpec.arguments. So the arguments are passed separately.
The interface is private, so it's fine to make a breaking change here, as we control all implementations.
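To illustrate why the arguments travel separately from the TaskSpec, a minimal sketch; build_task is a hypothetical framework factory, not part of the real SDK:

```python
def build_task(task_spec, resolved_arguments):
    # Hypothetical framework factory; a real bridge would return a ContainerOp,
    # BaseComponent, or similar task-like object here.
    return {'spec': task_spec, 'arguments': resolved_arguments}

def container_task_constructor(task_spec, arguments):
    # `arguments` may contain framework-native placeholders (e.g. KFP's
    # PipelineParam) that TaskSpec.arguments cannot represent, which is why
    # they are passed alongside the TaskSpec instead of inside it.
    return build_task(task_spec, dict(arguments))
```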