Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubeflow_tfjob_launcher_op to support volumes #1344

Closed
ryandawsonuk opened this issue May 16, 2019 · 9 comments
Closed

kubeflow_tfjob_launcher_op to support volumes #1344

ryandawsonuk opened this issue May 16, 2019 · 9 comments
Assignees
Labels
area/samples kind/bug lifecycle/stale The issue / pull request is stale, any activities remove this label. priority/p2

Comments

@ryandawsonuk
Copy link
Contributor

ryandawsonuk commented May 16, 2019

I want to be able to load a TFJob with a mounted directory so that the output can be written and used in the next step. Much of the job template is configurable as the template is loaded as a dict and modified using the operation parameters. But this doesn't work so neatly for volumes as they also need volume mounts and rather than just replacement we need the option for whole sections to either be there or not.

To accommodate volumes it might work better to inline the whole file definition as a multi-line string. Then sections could be added into the composition of the string. If there are no volumes passed in (ideally using pvolumes) then the section for volumes would be an empty string. This is just a suggestion - perhaps there are better templating approaches available.

@elikatsis
Copy link
Member

/assign @elikatsis

Let's pick it from where we left in #1345.

  1. ResourceOp to support pvolumes #1345 (comment)
  2. ResourceOp to support pvolumes #1345 (comment)

The goal is to determine what should the API cover for a TFJobOp.

In my opinion, the constructor's arguments shouldn't cover the whole TFJob spec. Since TFJobOp is a subclass of ResourceOp, it is always possible to specify a TFJob via a dictionary or a class (using some sort of client if one exists) and pass it as k8s_resource.

The arguments should make the common case fast, exposing those specs which have to be specified most of the times, following the VolumeOp class as an example.

@ryandawsonuk
Copy link
Contributor Author

ryandawsonuk commented May 17, 2019

I was thinking the same parameters as the existing TFJob launcher that works via ContianerOp. There is a comment referencing args but would make sense to use the args in kubeflow_tfjob_launcher_op.py as those seem to be the ones that are used. Extra arguments would be needed for volumes. Effectively we need to add the pvc-claim name and volume mount path (and maybe the volume name? perhaps that could be set to something arbitrary?). So could maybe be like other uses of pvolumes.

I wonder if the best way to handle the conditionals options in generating sections from a template would be some basic jinja2 rather than plain python code with if statements. There are some examples out there and it would provide the flexibility to add environment variables only if needed and mount volumes only if the user wants them and could be expanded in the future. It might be tempting to think of kubectl for templating but in extending ResourceOp we presumably don't have access to that as it would be a tool that would run in a container.

Presumably extending ResourceOp and handling the TFJob as an argo resource step would allow for dependency on the created resource to be captured by hooking into argo's native successCondition to wait for the TFJob to complete - as seen in the argo project's own ML example.

@ryandawsonuk
Copy link
Contributor Author

ryandawsonuk commented May 23, 2019

I have been thinking about whether kustomize could be used to provide the variations on a TFJob instead of jina2. But it's not easy to see how to do that as kustomize works on different principles. Python built-in string functions is another way that could be sufficient for this purpose.

@hougangliu
Copy link
Member

/assign

@hougangliu
Copy link
Member

/unassign

@ryandawsonuk
Copy link
Contributor Author

I notice support for volumes is being added to tfjob_launcher_op in #1494

Looks like it expects volumes to be specified in a dict of k-v pairs for pvc-name (key) and mountPath (value). Is that right @hougangliu ?

Am wondering how it could be used with pvolumes. I guess you could get the name of the pvolume and construct a dict.

Am not saying it has to work with pvolumes directly. Just wondering how you would do it.

@elikatsis
Copy link
Member

@ryandawsonuk
Hey Ryan,

I have added a comment on that PR proposing some refactoring.
If that is OK with @hougangliu, then it seems to me that PipelineVolumes will be supported seamlessly.

@stale
Copy link

stale bot commented Jun 26, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Jun 26, 2020
@stale
Copy link

stale bot commented Jul 3, 2020

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

@stale stale bot closed this as completed Jul 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/samples kind/bug lifecycle/stale The issue / pull request is stale, any activities remove this label. priority/p2
Projects
None yet
Development

No branches or pull requests

4 participants