SDK/Components - Added /data to the generated file paths #663

Ark-kun · 2019-01-10T08:27:26Z

This is needed for the future storage system based on volume mounts:
If outputs were written to files in the same dir (e.g. /outputs/out1.txt and /outputs/out2.txt), then we cannot separate them and mount to the downstream task containers independently.
Writing outputs to /outputs/out1/data and /outputs/out2/data solves this problem.

The PR is just 3 lines, but here is a more detailed explanation on why this improves the storage story:

In future we want to use mounted volumes to pass the artifacts between components.

Each input needs to be placed into its own directory.
Multiple inputs are stored in different locations on the storage volume.
Only one volume can be mounted to a container's local directory.
Thus, to mount multiple inputs from different locations we a directory for each input to mount the corresponding volume.
Each output needs to be placed into its own directory.
Output volume can only be mounted to a directory.
If all outputs are placed in the same local directory, then it would be impossible to mount just one of them to the downstream container. Also, having files from multiple outputs in a same directory requires them to have different names which breaks the next item.
Thus every output should be placed into its own directory.
Output files should have same names in storage (different directories, but same name e.g. "data").
When we pass outputs of some upstream task to the input of downstream task, the downstream program needs to know the full input file path. The input volume mount locations can generated from the downstream component definition alone (e.g. /inputs/Downstream_input_1/). But to generate the full path, the system also needs to know the file name in the mounted volume. If that name is always the same, then the problem is solved. Otherwise, the system needs information that's taken from the upstream task. This complicates the system and can be very hard to do in cases of caching or recursion.
So, when two tasks are connected, the name of the file that's being passed should be predictable by downstream task without knowing anything about the upstream task.
Thus output file names must be same everywhere.

This change is

This is needed for the future storage system based on volume mounts: If outputs were written to files in the same dir (e.g. /outputs/out1.txt and /outputs/out2.txt), then we cannot separate them and mount to the downstream task containers independently.

gaoning777

/lgtm

k8s-ci-robot · 2019-01-16T01:14:25Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ark-kun, gaoning777

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~sdk/OWNERS~~ [Ark-kun,gaoning777]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2019-01-16T01:14:25Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ark-kun, gaoning777

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~sdk/OWNERS~~ [Ark-kun,gaoning777]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

* Remove references to blueprints; this should not have been committed. We use a separate script to cleanup blueprints.

* delete isvc-tensorflow after testing * add predict

Ark-kun added the area/sdk/components label Jan 10, 2019

Ark-kun assigned gaoning777 and qimingj Jan 10, 2019

Ark-kun requested review from gaoning777 and qimingj January 10, 2019 08:27

k8s-ci-robot requested a review from hongye-sun January 10, 2019 08:27

k8s-ci-robot added size/XS labels Jan 10, 2019

Ark-kun force-pushed the SDK/Components---Added-/data-to-the-generated-file-paths branch from 6c0ad33 to 8b07d6a Compare January 15, 2019 23:21

k8s-ci-robot added the approved label Jan 15, 2019

Ark-kun force-pushed the SDK/Components---Added-/data-to-the-generated-file-paths branch from 8b07d6a to 433bb82 Compare January 15, 2019 23:22

k8s-ci-robot added the lgtm label Jan 16, 2019

gaoning777 approved these changes Jan 16, 2019

View reviewed changes

k8s-ci-robot merged commit b12d5d8 into kubeflow:master Jan 16, 2019

Ark-kun deleted the SDK/Components---Added-/data-to-the-generated-file-paths branch January 18, 2019 01:58

Linchin pushed a commit to Linchin/pipelines that referenced this pull request Apr 11, 2023

Fix cleanup_ci.py; remove references to blueprints. (kubeflow#663)

cdb2b39

* Remove references to blueprints; this should not have been committed. We use a separate script to cleanup blueprints.

magdalenakuhn17 pushed a commit to magdalenakuhn17/pipelines that referenced this pull request Oct 22, 2023

Delete isvc-tensorflow inferenceservices after testing (kubeflow#663)

8b393a7

* delete isvc-tensorflow after testing * add predict

HumairAK pushed a commit to red-hat-data-services/data-science-pipelines that referenced this pull request Mar 11, 2024

fix(pipelineloop): restored pipelineloop conditions test. (kubeflow#663)

0ad68be

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SDK/Components - Added /data to the generated file paths #663

SDK/Components - Added /data to the generated file paths #663

Ark-kun commented Jan 10, 2019 •

edited

Loading

gaoning777 left a comment

k8s-ci-robot commented Jan 16, 2019

k8s-ci-robot commented Jan 16, 2019

SDK/Components - Added /data to the generated file paths #663

SDK/Components - Added /data to the generated file paths #663

Conversation

Ark-kun commented Jan 10, 2019 • edited Loading

gaoning777 left a comment

Choose a reason for hiding this comment

k8s-ci-robot commented Jan 16, 2019

k8s-ci-robot commented Jan 16, 2019

Ark-kun commented Jan 10, 2019 •

edited

Loading