Add Runtime Attribute override to Development Spec #301
@illusional this looks great. I would err on the side of caution and explicitly ban expressions in the supplied
My only complaint here is the mixing together of runtime and inputs into a single
Today we can validate an
Another advantage of separate files is easier re-use. With separate files I could take the same
Thanks @patmagee! I've clarified that expressions should not be allowed:
I agree @cjllanwarne that best practice should be to put these in a separate
My $0.02 .....
@geoffjentry @cjllanwarne Do you think there should be a specific type of input called runtime, something like
I'm happy to unwind disallowing expressions. My thought was that if you can determine your runtime attribute based on an expression, you may as well bake it into the WDL. But thinking about it, you may want a specific disk format for GCP, or something else for a different backend.
@illusional Ah, I see what you mean. I don't have a strong opinion. Re: expressions, I don't feel super strongly about this either. Happy to go w/ the flow of public opinion.
Inputs names
I didn't notice that
Add
The more I think about it, the more I like just saying that best practice is to provide it as a separate input file. This reinforces the viewpoint of runtime attrs being special inputs instead of a pure first-class thing. It also means not having to define a new file type. However, that does imply codifying that there can be multiple input JSONs and specifying things like resolution rules.
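The spec does not currently say how several input files would be combined. As a purely hypothetical illustration of "multiple input JSONs plus resolution rules", here is a minimal Python sketch that assumes a last-document-wins rule applied per fully-qualified key. The function name and the precedence rule are assumptions for illustration, not anything the spec or this PR defines.

```python
import json
from typing import Any, Dict, List


def merge_input_files(documents: List[str]) -> Dict[str, Any]:
    """Merge several JSON input documents into a single mapping.

    Hypothetical resolution rule: documents are applied in the order given,
    and a key that appears in a later document replaces the value from an
    earlier one.
    """
    merged: Dict[str, Any] = {}
    for text in documents:
        data = json.loads(text)
        if not isinstance(data, dict):
            raise ValueError("each input document must be a JSON object")
        merged.update(data)  # later documents win on conflicting keys
    return merged
```

Under this rule an environment file, an inputs file and a per-backend runtime file could be passed in that order, so the runtime file always has the final say. An implementation could just as reasonably choose to reject conflicting keys instead; that is exactly the kind of rule the thread suggests would need to be codified.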
@cjllanwarne That's a great point. Is there an easy way I can modify my PR to prohibit this, like potentially modifying the validator regex to not accept "runtime"? Your multiple composable inputs idea is exactly what we're planning to do when we run ours. We can specify essentially static files for specific environments (reference genome etc.), one for our input files and then (if this proposal passes) one for our runtime for each environment. @geoffjentry it seems that the spec doesn't do a great job of specifying the order of resolution rules (except maybe #variable resolution), hence why I added a section to mine (though I think I could clarify the wording). There is an order that has been proposed, specifically #141 (backed up by broadinstitute/cromwell#2446), but it is not implemented yet. I'm also happy to spin up another PR to address this if the community wants that.
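One hypothetical way to express the "validator should not accept runtime" idea is to allow the segment `runtime` only in the position that marks an override. The Python sketch below assumes keys are dot-separated identifiers (a letter followed by letters, digits or underscores) and that overrides take the form `<call>.runtime.<attribute>`; both the key layout and the function name are assumptions, not the PR's actual validator.

```python
import re

# A WDL-style identifier: a letter followed by letters, digits, or underscores.
IDENTIFIER = r"[A-Za-z][A-Za-z0-9_]*"
# A fully-qualified key: two or more identifiers joined by dots, e.g. "wf.task.x".
QUALIFIED_KEY = re.compile(rf"{IDENTIFIER}(?:\.{IDENTIFIER})+")


def classify_key(key: str) -> str:
    """Return "input" or "runtime" for an input key, or raise ValueError.

    Hypothetical rule: the segment "runtime" may only appear as the
    second-to-last segment of a key with at least four segments, marking a
    runtime override such as "wf.task.runtime.memory".
    """
    if not QUALIFIED_KEY.fullmatch(key):
        raise ValueError(f"malformed input key: {key!r}")
    parts = key.split(".")
    positions = [i for i, part in enumerate(parts) if part == "runtime"]
    if not positions:
        return "input"
    if positions == [len(parts) - 2] and len(parts) >= 4:
        return "runtime"
    raise ValueError(f"'runtime' is not allowed here: {key!r}")
```

Keys such as `wf.runtime` or `wf.runtime.x` are rejected, so `runtime` can never be mistaken for an ordinary input name.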
(made simpler if openwdl/wdl/issues/262 and openwdl/wdl/pull/301 are resolved)
@@ -2646,6 +2648,30 @@ In JSON, the inputs to the workflow in the previous section might be:
It's important to note that the type in JSON must be coercible to the WDL type. For example `wf.int_val` expects an integer, but if we specified it in JSON as `"wf.int_val": "three"`, this coercion from string to integer is not valid and would result in a coercion error. See the section on [Type Coercion](#type-coercion) for more details.
## Specifying / Overriding Runtime Attributes in JSON
Workflow runtime attributes may additionally be specified as key/value pairs within a JSON input file. It's recommended that this should be a separate inputs file called `runtime.json`. The mapping from JSON or YAML values to WDL values is codified similarly to the serialization of task inputs section, however with an additional runtime tag to avoid name collisions and allow for unspecified attributes to be set by the runtime input file. Runtime attributes do not need to be specified in the task definition to be overridden or set by the JSON.
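To make the override-or-add behaviour concrete, here is a minimal Python sketch of how an executor might apply such keys. It assumes a hypothetical flat key layout `<call>.runtime.<attribute>` (for example `wf.align.runtime.memory`), in the spirit of the "nested inside the task.runtime object" wording suggested in review; the function name and data shapes are illustrative only.

```python
from copy import deepcopy
from typing import Any, Dict

# Maps a call's fully-qualified name (e.g. "wf.align") to its runtime attributes.
RuntimeTable = Dict[str, Dict[str, Any]]


def apply_runtime_overrides(declared: RuntimeTable,
                            inputs: Dict[str, Any]) -> RuntimeTable:
    """Return the effective runtime attributes for each call.

    Keys of the form "<call>.runtime.<attribute>" in `inputs` replace the
    attribute written in the WDL, or add it if the task never declared it.
    All other keys are treated as ordinary inputs and ignored here.
    """
    effective = deepcopy(declared)  # never mutate the caller's table
    for key, value in inputs.items():
        call, sep, attribute = key.rpartition(".runtime.")
        if not sep or not call or not attribute or "." in attribute:
            continue  # not a runtime override
        effective.setdefault(call, {})[attribute] = value
    return effective
```

Note that the value is copied as-is: this sketch never evaluates or coerces it, which matches the thread's conclusion that supplied runtime values are not treated as expressions.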
It's recommended that this should be a separate inputs file called runtime.json
Does the spec need to provide this information? The spec never specifies that workflow inputs must be specified using a file, or suggest a name for that file. The spec doesn't mention how inputs are actually provided, which seems correct as this is a WDL implementor decision. (The spec could be improved by specifically listing things that a WDL implementor must define as part of their implementation, but that is a different issue.)
I was getting ready to throw down a giant "you're wrong", but you're correct that the spec does not literally say this is a different file. However it's heavily implied and as one of the people who are responsible for all the less than correct things in the spec I can assure you that's what we meant.
I like that here it's a suggestion. We don't need to make a suggestion, but IMO it's not a bad idea to encourage conformist behavior.
however with an additional runtime tag
I don't think this is precise enough. I would say "nested inside the task.runtime object".
I agree with @pshapiro4broad that the spec shouldn't mention input files, since that's implementation specific. If we want to acknowledge input files in the spec it should be a separate PR. However, this PR might be able to say something like "we recommend that the executor supports providing runtime attributes separately from the workflow inputs, for instance as a separate input JSON file"
I'm happy to remove my statement in favour of @TMiguelT's:
We recommend that the executor supports providing runtime attributes separately from the workflow inputs, for instance as a separate inputs (maybe: runtime.json) JSON file.
I'm mixed. I'm very opinionated that you should specify your runtime attributes (such as specific Google Cloud or specific SFS ones) separately from your inputs. I think it promotes a good practice, and will probably allow people to better share optimised workflows. The more I think about it, the less I believe the spec is the right place to put this, but I'm also not sure if there is another place.
Our use case is that we want to run the same workflow with varying input sizes on different backends. I can create specific optimised runtime JSONs for our different targets and give them to other researchers or clinicians who don't need to know about the attributes we decided on.
My wording does encourage the implementations to allow a separate runtime file, but it doesn't require it, because as you say it's not exactly something the spec can require
I'd like to throw in one more "it would be really nice to tidy this up at the same time" - I believe (but it may take some hunting down) that there's a line somewhere that says "you can't provide inputs directly to subworkflows or their tasks". Since we're saying that
I haven't seen this mentioned yet, apologies if it has, but if runtime inputs are to become settable via JSON, they should not be used for call caching. This would eliminate so many headaches where memory is mapped to a regular input and needs to be tweaked for unforeseen reasons.
@dheiman Just keep in mind that call caching isn't a WDL thing, it's a potential feature of implementations.
@geoffjentry fair enough, but isn't an implementation required before the PR can be merged? I'd be happy to move the comment over to the Cromwell PR/issue if this passes.
@dheiman Yeah, that's exactly what I was getting at. Also keep in mind that voting here can't reasonably be done with a disclaimer of "as long as my preferred implementation does X" since the
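As noted above, call caching belongs to implementations rather than to WDL itself. Still, the suggestion is easy to picture: the sketch below hashes only a call's ordinary inputs and skips any key containing `.runtime.`, so tweaking memory would not invalidate a cached result. The key layout and the function name are hypothetical, and a real cache would hash evaluated input values rather than raw JSON keys.

```python
import hashlib
import json
from typing import Any, Dict


def call_cache_key(call: str, inputs: Dict[str, Any]) -> str:
    """Hash a call's ordinary inputs while ignoring runtime overrides.

    Only keys under "<call>." are considered, and any key containing
    ".runtime." is excluded from the hash.
    """
    prefix = call + "."
    relevant = {
        key: value
        for key, value in inputs.items()
        if key.startswith(prefix) and ".runtime." not in key
    }
    # Canonical JSON (sorted keys, no extra whitespace) gives a stable digest.
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

An implementation might reasonably still want to include some attributes, such as the container image, since those can change a task's results.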
versions/development/SPEC.md
} | ||
``` | ||
|
||
As the runtime section consists of key/value pairs, it is the user's responsibility to ensure they provide the correct coercible type for the backend they are targeting. Expressions are disallowed within the `runtime.json`, hence no input will be coerced to an expression. See the section on [Type Coercion](#type-coercion) for more details. |
It's not that expressions are not allowed, it's that they're not evaluated. For instance, you can put in 1+1 as a runtime attribute value; it just won't end up as 2 downstream.
Yeah, I see how my initial wording is a little ambiguous. I've modified it (in 26537d5) to read:
An input value should not be coerced to an expression, hence values provided within the runtime.json should be evaluated.
I think it is clearer to read that what you provide shouldn't be evaluated (like if you provided "my test ${1+1} expression", it would remain), and the language better aligns with the rest of the spec.
Yes, I think you should say they shouldn't be evaluated. But right now your wording says "values provided within the runtime.json should be evaluated", which is incorrect.
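The agreed behaviour (values are data, never evaluated) combines naturally with the coercion rule quoted earlier for `"wf.int_val": "three"`. The Python sketch below uses `str`, `int` and `float` as hypothetical stand-ins for WDL `String`, `Int` and `Float`, and implements only a deliberately strict subset of coercion; it is not the spec's full coercion table.

```python
from typing import Any, Type


def coerce_literal(value: Any, wdl_type: Type) -> Any:
    """Coerce a JSON value to a Python stand-in for a WDL primitive type.

    Strings are never parsed as WDL expressions or interpolated, so
    "1+1" stays "1+1" and "${...}" placeholders are left untouched.
    """
    if wdl_type is str:
        if not isinstance(value, str):
            raise TypeError(f"expected a string, got {value!r}")
        return value  # returned verbatim, no evaluation
    if wdl_type is int:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"cannot coerce {value!r} to Int")
        return value
    if wdl_type is float:
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"cannot coerce {value!r} to Float")
        return float(value)
    raise NotImplementedError(f"unsupported type: {wdl_type!r}")
```

So `"my test ${1+1} expression"` survives unchanged as a string, while `"1+1"` supplied for an integer attribute is a coercion error rather than the number 2.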
... maybe I should remove attention to detail on my resume (fixed).
It seems as though this PR brings up a delicate issue that I think we need to resolve before we can consider voting on it. Should the spec define inputs, the specific input files and their specific format, or should this be an implementation detail? There are actually benefits to both sides. Being opinionated on the matter leads to greater conformance and reusability of workflows, at the potential cost of strong-arming implementers. Additionally, by defining this right in the specification, open standards such as WES become easier to implement at a language level. At the same time, requiring inputs in a specific format (with support for specific files) could be beyond the capabilities of different implementors, and it might make it harder for them to support the full feature set of WDL. What are everyone's thoughts?
My thought is that I hate to use JSON for configuration since YAML is so much better for humans to edit. So I think the spec should define the input files, but not the format. (Unless it is my preferred format YAML of course 😉). But seriously, that bit should be an implementation detail. Who knows what useful formats come along in the future?
@patmagee My $0.02 is to treat them separately, and I don't think a particular order is necessary. We can evaluate this change given the current spec (the one which does specify the input format), and if someone wants to remove or modify that portion of the spec, that can be a separate discussion afterwards. If that implies changes to this or anything else in the spec, it's on that proposer to make the appropriate edits. IOW, I think this has been hanging around long enough to judge it on its own merits and let complaints about inputs being codified be handled afterwards.
@patmagee It's something I definitely didn't consider when spinning this PR up (hence the repeated use of JSON), but I wholeheartedly agree that the spec should not be opinionated about the format of inputs, e.g. no mandating JSON, YAML, XML or others. Happy for whatever route we go, except that for tracking purposes it would be better to create a separate issue / PR to ensure that the community gets to vote on that. I do want the spec to be opinionated about best practices for these specific input files; I think it promotes the wider goal of portability. Side note: these two issues will need to be resolved before this gets implemented, or maybe even voted on. @geoffjentry Is it worth me spinning up a new issue to cover these, or should I put it in with this one (as this feature would require inputs to be passed through to subworkflows to work)?
Not to piggyback on this issue, but adding it as a note here so I don't forget: it would also be great if you could reference a runtime attribute inside the WDL command. For example, many tools require, or allow you to pass a
@illusional Sorry, I'm confused - why do the subworkflow PRs need to come first?
@geoffjentry Otherwise you couldn't pass runtime attributes to tools under a subworkflow. I guess it's not directly a prerequisite, but something I feel should be completed before this feature is useful.
Ok. Yeah, I was wondering why it was a hard requirement and not a nice-to-have. Either way, this is your PR, so if you want to table it until all of this other stuff is hashed out that's fine. As I said above, I don't think these things all need to fit into a particular order, but that's up to you :)
Hi everyone. It occurred to me that it is a common pattern in Broad-authored WDLs (and thus the majority of WDLs out in the wild) to already do what @illusional is suggesting and wire their runtime attributes in as workflow inputs. In effect, the WDL authors have already spoken and this is the model that they would like to see. This PR would clean up that behavior and make it less ad hoc. The subworkflows issue would be nice to have, but as @illusional noted it is not truly a prerequisite. As for the inputs JSON, well, it's already in the spec. We should not be voting against one thing because we don't like another thing already in the spec (and IMO it not only should be in the spec, but should be enhanced). @illusional - I'd like to move this forward. If you want to hold off, let me know and I'll take this over.
Thanks @geoffjentry, happy for it to move forward!
@illusional I realized it was going to be a pain to make commits to this as it's on your fork. Closing and reopening as #313
Could you not have used the "allow edits from maintainers" option?
It was off when I had a look, but it's not an option when you create a PR, just tucked away after you've created one. I turned it on now (not that it's any good).
I also didn't realize that was a thing :)
I raised an issue (#298) around overriding runtime attributes by JSON input. I've taken the comment I left there and turned it into this PR.
I don't believe the JSON runtime override should be able to support expressions, like if someone specifically put in:
I.e. I don't think it should resolve myFileInput and then perform the computation. I don't know if I should explicitly state this, though.