[components] google_cloud_pipeline_components JsonArray PipelineParam not resolved/serializable in component #7457

Closed
dctelus opened this issue Mar 23, 2022 · 4 comments
Labels: area/sdk, kind/bug, lifecycle/stale

dctelus commented Mar 23, 2022

Environment

  • KFP version:
    Vertex AI on GCP
  • KFP SDK version:
    kfp = "^1.8.11"
  • All dependencies version:
    google-cloud-pipeline-components = "^1.0.1"

Steps to reproduce

from kfp.v2.dsl import component
from google_cloud_pipeline_components.experimental.dataflow import DataflowPythonJobOp

def pip() -> None:
    @component
    def get_input(input: str) -> str:
        return input
    inp = get_input("test")
    DataflowPythonJobOp(
        python_module_path="gs://whatever/script.py",
        temp_location="gs://whatever/tmp",
        project="whatever",
        args=[
            "--input_table", inp.output,
        ]
    )

This fails with the error: Object of type PipelineParam is not JSON serializable (although lists are supposed to be serializable, see #1945).

inp.output does have a .to_struct() method, and if I use it ("--input_table", inp.output.to_struct()), compilation goes through. However, the component then receives the serialized placeholder rather than the resolved output value. These are the arguments passed to the component:

--project; whatever; --location; us-central1; --python_module_path; gs://whatever/script.py; --temp_location; gs://whatever/tmp; --requirements_file_path; ; --args; ["--input_table", "{{pipelineparam:op=get-input;name=Output}}"]; 
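The error message and the placeholder string in the component arguments above can be reproduced with the standard library alone. The following is a minimal sketch, not KFP code: FakeParam and encode_param are hypothetical stand-ins invented for illustration, and only the placeholder format is taken from the arguments shown in this issue.

```python
import json


class FakeParam:
    """Hypothetical stand-in for a pipeline parameter; not the real kfp class."""

    def __init__(self, op_name: str, name: str) -> None:
        self.op_name = op_name
        self.name = name

    def placeholder(self) -> str:
        # Same textual shape as the placeholder seen in the component arguments.
        return "{{pipelineparam:op=%s;name=%s}}" % (self.op_name, self.name)


def encode_param(obj):
    """Fallback for json.dumps: encode a FakeParam as its placeholder string."""
    if isinstance(obj, FakeParam):
        return obj.placeholder()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


args = ["--input_table", FakeParam("get-input", "Output")]

# Without a fallback, json cannot encode the custom object inside the list.
try:
    json.dumps(args)
    plain_error = None
except TypeError as exc:
    plain_error = str(exc)

# With a fallback, the object is written out as a placeholder string.
serialized = json.dumps(args, default=encode_param)
```

A list is only JSON serializable if every element is, so a single parameter object inside args is enough to trigger the error unless the encoder is told how to handle it.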

JsonArray seems to be an alias for List in the SDK code, which is why I think this might be an SDK issue. It could also be that this specific component is not set up correctly, or that I am misunderstanding how the serialization works. What do you think?

Expected result

The args array is serialized, and --input_table has the value "test" inside the component.
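To make the expected result concrete, here is a toy resolver that swaps placeholders inside a serialized args array for recorded task outputs. It is only an illustration of the desired end state; resolve_args and the outputs mapping are hypothetical, and this is not a description of how KFP or Vertex AI resolves parameters.

```python
import json
import re

# Matches placeholders of the form {{pipelineparam:op=<task>;name=<output>}}.
PLACEHOLDER = re.compile(r"\{\{pipelineparam:op=([^;]*);name=([^}]*)\}\}")


def resolve_args(serialized: str, outputs: dict) -> list:
    """Decode a JSON array and replace placeholders with known task outputs.

    `outputs` maps (task name, output name) to the resolved string value.
    """
    def substitute(match: re.Match) -> str:
        return outputs[(match.group(1), match.group(2))]

    resolved = []
    for item in json.loads(serialized):
        # Only strings can contain placeholders; other values pass through.
        if isinstance(item, str):
            item = PLACEHOLDER.sub(substitute, item)
        resolved.append(item)
    return resolved


resolved_args = resolve_args(
    '["--input_table", "{{pipelineparam:op=get-input;name=Output}}"]',
    {("get-input", "Output"): "test"},
)
```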

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

@dctelus dctelus changed the title [sdk] google_cloud_pipeline_components JsonArray PipelineParam not resolved in component [sdk] google_cloud_pipeline_components JsonArray PipelineParam not resolved/serializable in component Mar 23, 2022

dctelus commented Mar 24, 2022

After further investigation, it seems that kfp.v2 does not support JSON serialization of PipelineParam (as implemented for v1 in #2212). Are there any plans to support this in v2?

@dctelus dctelus changed the title [sdk] google_cloud_pipeline_components JsonArray PipelineParam not resolved/serializable in component [sdk] kfp.v2 does not support PipelineParam JSON serialization in component Mar 24, 2022

dctelus commented Mar 24, 2022

Scratch that, I have no idea what is wrong here.

Here is why I thought so: compiling with the v2 compiler fails (Object of type PipelineParam is not JSON serializable), but compiling with the v1 compiler works:

from kfp.v2 import compiler
from kfp import compiler as compiler_v1
from kfp import components
from kfp.dsl import PipelineParam

def consume_list_p(list_param: list) -> int:
    pass

consume_list = components.create_component_from_func(consume_list_p)

def pip():
    task = consume_list([1, 2, 3, PipelineParam("aaa"), 4, 5, 6])

# The v2 compiler fails here: Object of type PipelineParam is not JSON serializable
# compiler.Compiler().compile(
#    pipeline_func=pip,
#    package_path="test.json",
# )

compiler_v1.Compiler().compile(
    pipeline_func=pip,
    package_path="test.yaml",
)

@dctelus dctelus changed the title [sdk] kfp.v2 does not support PipelineParam JSON serialization in component [sdk] google_cloud_pipeline_components JsonArray PipelineParam not resolved/serializable in component Mar 24, 2022
@Linchin Linchin changed the title [sdk] google_cloud_pipeline_components JsonArray PipelineParam not resolved/serializable in component [components] google_cloud_pipeline_components JsonArray PipelineParam not resolved/serializable in component Mar 24, 2022
RobbeSneyders pushed a commit to ml6team/fondant that referenced this issue May 10, 2023
This PR enables the user to pass `bool`, `list` and `dict` arguments to
a component. Kubeflow typically handles those arguments by serializing
them as strings, so they need to be deserialized again within the
component in order to be handled properly.

This might go away once we move to V2.

References to the issue:
kubeflow/pipelines#7457
kubeflow/pipelines#7719
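The deserialization step described in that commit message could look roughly like the sketch below. It is not the fondant implementation: parse_serialized, parse_component_args and the flag names are hypothetical, and it assumes values arrive either as JSON text or as Python literal text such as "True".

```python
import argparse
import ast
import json


def parse_serialized(value: str):
    """Turn a string-serialized argument back into a bool, list or dict."""
    try:
        # Covers JSON text such as "true", "[1, 2, 3]" or '{"a": 1}'.
        return json.loads(value)
    except json.JSONDecodeError:
        # Covers Python literal text such as "True" or "{'a': 1}".
        # Raises ValueError or SyntaxError for strings that are neither.
        return ast.literal_eval(value)


def parse_component_args(argv):
    """Parse component arguments, deserializing each value on the way in."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--flag", type=parse_serialized)
    parser.add_argument("--items", type=parse_serialized)
    parser.add_argument("--config", type=parse_serialized)
    return parser.parse_args(argv)


parsed = parse_component_args(
    ["--flag", "True", "--items", "[1, 2, 3]", "--config", '{"a": 1}']
)
```

Using ast.literal_eval as the fallback keeps the parser from executing arbitrary code, which a plain eval would not.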

github-actions bot commented May 5, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label May 5, 2024

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
