-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SDK/Components - Added /data to the generated file paths #663
SDK/Components - Added /data to the generated file paths #663
Conversation
6c0ad33
to
8b07d6a
Compare
This is needed for the future storage system based on volume mounts: If outputs were written to files in the same dir (e.g. /outputs/out1.txt and /outputs/out2.txt), then we cannot separate them and mount to the downstream task containers independently.
8b07d6a
to
433bb82
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Ark-kun, gaoning777 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
1 similar comment
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Ark-kun, gaoning777 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* Remove references to blueprints; this should not have been committed. We use a separate script to cleanup blueprints.
* delete isvc-tensorflow after testing * add predict
This is needed for the future storage system based on volume mounts:
If outputs were written to files in the same dir (e.g.
/outputs/out1.txt
and/outputs/out2.txt
), then we cannot separate them and mount to the downstream task containers independently.Writing outputs to
/outputs/out1/data
and/outputs/out2/data
solves this problem.The PR is just 3 lines, but here is a more detailed explanation on why this improves the storage story:
In future we want to use mounted volumes to pass the artifacts between components.
Each input needs to be placed into its own directory.
Multiple inputs are stored in different locations on the storage volume.
Only one volume can be mounted to a container's local directory.
Thus, to mount multiple inputs from different locations we a directory for each input to mount the corresponding volume.
Each output needs to be placed into its own directory.
Output volume can only be mounted to a directory.
If all outputs are placed in the same local directory, then it would be impossible to mount just one of them to the downstream container. Also, having files from multiple outputs in a same directory requires them to have different names which breaks the next item.
Thus every output should be placed into its own directory.
Output files should have same names in storage (different directories, but same name e.g. "data").
When we pass outputs of some upstream task to the input of downstream task, the downstream program needs to know the full input file path. The input volume mount locations can generated from the downstream component definition alone (e.g.
/inputs/Downstream_input_1/
). But to generate the full path, the system also needs to know the file name in the mounted volume. If that name is always the same, then the problem is solved. Otherwise, the system needs information that's taken from the upstream task. This complicates the system and can be very hard to do in cases of caching or recursion.So, when two tasks are connected, the name of the file that's being passed should be predictable by downstream task without knowing anything about the upstream task.
Thus output file names must be same everywhere.
This change is