Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SDK/Components - Added /data to the generated file paths #663

Conversation

Ark-kun
Copy link
Contributor

@Ark-kun Ark-kun commented Jan 10, 2019

This is needed for the future storage system based on volume mounts:
If outputs were written to files in the same dir (e.g. /outputs/out1.txt and /outputs/out2.txt), then we cannot separate them and mount to the downstream task containers independently.
Writing outputs to /outputs/out1/data and /outputs/out2/data solves this problem.

The PR is just 3 lines, but here is a more detailed explanation on why this improves the storage story:

In future we want to use mounted volumes to pass the artifacts between components.

  1. Each input needs to be placed into its own directory.
    Multiple inputs are stored in different locations on the storage volume.
    Only one volume can be mounted to a container's local directory.
    Thus, to mount multiple inputs from different locations we a directory for each input to mount the corresponding volume.

  2. Each output needs to be placed into its own directory.
    Output volume can only be mounted to a directory.
    If all outputs are placed in the same local directory, then it would be impossible to mount just one of them to the downstream container. Also, having files from multiple outputs in a same directory requires them to have different names which breaks the next item.
    Thus every output should be placed into its own directory.

  3. Output files should have same names in storage (different directories, but same name e.g. "data").
    When we pass outputs of some upstream task to the input of downstream task, the downstream program needs to know the full input file path. The input volume mount locations can generated from the downstream component definition alone (e.g. /inputs/Downstream_input_1/). But to generate the full path, the system also needs to know the file name in the mounted volume. If that name is always the same, then the problem is solved. Otherwise, the system needs information that's taken from the upstream task. This complicates the system and can be very hard to do in cases of caching or recursion.
    So, when two tasks are connected, the name of the file that's being passed should be predictable by downstream task without knowing anything about the upstream task.
    Thus output file names must be same everywhere.


This change is Reviewable

@Ark-kun Ark-kun requested review from gaoning777 and qimingj January 10, 2019 08:27
@Ark-kun Ark-kun force-pushed the SDK/Components---Added-/data-to-the-generated-file-paths branch from 6c0ad33 to 8b07d6a Compare January 15, 2019 23:21
This is needed for the future storage system based on volume mounts:
If outputs were written to files in the same dir (e.g. /outputs/out1.txt and /outputs/out2.txt), then we cannot separate them and mount to the downstream task containers independently.
@Ark-kun Ark-kun force-pushed the SDK/Components---Added-/data-to-the-generated-file-paths branch from 8b07d6a to 433bb82 Compare January 15, 2019 23:22
Copy link
Contributor

@gaoning777 gaoning777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ark-kun, gaoning777

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

1 similar comment
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ark-kun, gaoning777

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit b12d5d8 into kubeflow:master Jan 16, 2019
@Ark-kun Ark-kun deleted the SDK/Components---Added-/data-to-the-generated-file-paths branch January 18, 2019 01:58
Linchin pushed a commit to Linchin/pipelines that referenced this pull request Apr 11, 2023
* Remove references to blueprints; this should not have been committed.
  We use a separate script to cleanup blueprints.
magdalenakuhn17 pushed a commit to magdalenakuhn17/pipelines that referenced this pull request Oct 22, 2023
* delete isvc-tensorflow after testing

* add predict
HumairAK pushed a commit to red-hat-data-services/data-science-pipelines that referenced this pull request Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants