-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: interpreter
runtime attribute for non-bash tasks
#404
Comments
I think it is a good addition, but very complex. What about just using
Versions and such should be mediated by the container used right? If you set a container a wrong version of python cannot be used. Alternatively python also sets a versioned executable on path i.e. I think the version and interpreter stuff will be very hard to implement given all the edge case and all interpreters out there. Just giving the executable should be much easier. On the other hand if only |
The things that specifying the interpreter details offers is 1) the ability for the engine to provide a container with a matching interpreter if the user doesn't provide their own container, and/or 2) the ability for the engine to check that an appropriate interpreter exists in the runtime environment. Without those details, the engine can still provide a container/check that the interpreter exists, but it can't deal with the situation where a specific minimum/maximum interpreter is required (i.e. some python code written for v3.6+ won't run on a v3.5 interpreter). We could get rid of |
@jdidion thanks for putting this together. I think this would definitely be a use-ability improvement for a lot of tasks that do not depend on the bash interpreter. In the first proposal, I would generally agree with @rhpvorderman that it's quite a complex addition to the task definition in its current iteration (although I would agree, that it would be the most flexible). I wonder if its possible to simplify this even further and strip out all of the version info and offload that to the container. defining I would propose we combine both of your suggestions to allow defining the interpreter in two ways: runtime {
# Simple definition. Expected behaviour would be executing the code with the default args for that interpreter
# according to the engine.
interpreter: "python"
}
runtime {
# Complex definition. modify the default behaviour of the engine
interpreter: {
"name": "python",
"args": [ "...","..."]
}
} A question that this brings up is should we define expected behaviour for a set of standard interpreters in WDL directly (such as the flags that those interpreters should be started with) |
So I had the opportunity to get into using a bit of nextflow. How they solve this problem is pretty simple, they simply use a shebang line! We could emulate that behaviour. The default behaviour would still be to use task a {
command <<<
#!/usr/bin/env python
print("helllo")
>>>
}
|
Isn't this already documented? task a {
command <<<
python <<CODE
print('"hello")
CODE
>>>
} The SPEC is literally full of these sorts of examples. EDIT: Like I said above:
Let the interpreter version be handled by the container you are running. |
@rhpvorderman this is slightly different. That is calling the python command using a heredoc, but the underlyng script is still treated as a bash script. Whereas the shebang is indicating that is should be treated as a python script |
Could you explain why the heredoc syntax would not work in this case? I am not in favour of adding extra syntax and complexity to WDL if there is not a clear benefit. On all these issues I tend to say: make sure you have some sort of container, so you have full control of the interpreter. And either:
If the "only bash" policy forces users to make proper tools instead of making WDLs full of inline lisp scripts, I'd say that is a win for WDL as a language. I come from a snakemake background, and that was truly awful. Users were granted so much freedom that no two snakemake pipelines looked alike. Hence it was impossible to learn. Everyone rolled their own custom solution to the same problems. So I am very conservative when it comes to adding stuff to WDL that even remotely hints at allowing the same freedom. I get @jdidion 's argument that the execution engine can solve which interpreter is needed. But I think that is already handled by containers. If you run your job outside of a container that is bad practice, and there should not be any syntax to make it seem like running outside a container is a viable option. |
I agree with @rhpvorderman 's last comment. In the WDL I've worked on the current behavior hasn't been a big enough obstacle to require a change to the WDL language. I kind of like @patmagee 's suggestion to support a shebang line, but one downside I can see is that it requires the WDL runtime to read the first line of the command block and interpret it, possibly using string interpolation. Placing it outside the block would make it more clearer that the WDL runtime will handle it specially and that it's not really part of the command script. |
Also see discussions here: #408
Currently, it is specified that the task command block is executed by a bash interpreter. Non-bash languages are supported either by heredocs or by putting the commands in an external script and calling that script from the command block.
This proposal is to add the optional
interpreter
attribute to the runtime section. This keyword will be used to specify the interpreter program (name, version, etc) used to execute the command block. This information can be used by the WDL runtime in several ways:Example:
The
interpreter
keyword can take one of the following values:Array[String]
with multiple interpreter variants, in order of preference.Array[String]
with arguments to pass to the interpreter when executing the command. The runtime may specify interpreter arguments that are always used and/or not allowed - any arguments specified by theargs
keyword are merged with the required arguments, and an error is raised if any of the user-specified arguments are disallowed.Array[Object]
with multiple Object-style interpreter variants.Additional rules:
interpreter
keyword is specified, the default value is"bash"
.interpreter
keyword is specified, the host system must provided an interpreter that is compatible with one of the user's specifications, or raise an error if no compatible interpreters are available.container
runtime attribute, then one of the specified interpreters must be available in that container.container
specifies an array of containers, then the runtime may (but is not required to) use the availability of one of the specified interpreters within the container as a criteria in selecting which of the containers to use.container
runtime attribute then the runtime may try either or both of the following, in any order. In either case, no other dependencies are guaranteed to exist aside from the interpreter.Also see discussion here: #408
The text was updated successfully, but these errors were encountered: