Add Runtime Attribute override to Development Spec #301

illusional · 2019-04-02T03:18:26Z

I raised an issue (#298) around overriding runtime attributes by JSON input, I've taken the comment I left there and turned it into this PR.

I don't believe the JSON runtime override should be able to support expressions, like if someone specifically put in:

{  "wf.my_task.runtime.cpus": "${size(myFileInput) * 4}"  }

Ie: I don't think it should resolve myFileInput then perform the computation. I don't know if I should explicitly state this though.

patmagee · 2019-04-02T10:54:20Z

@illusional this looks great. I would err on the side of caution and explicitly ban expressions in the supplied JSON. This is the type of thing that if not banned people will invariably try to do

cjllanwarne · 2019-04-02T16:38:33Z

My only complaint here is the mixing together of runtime and inputs into a single json file. Ideally (in my opinion), the runtime would be overridable via a separate runtime.json to:

Avoid name collisions and confusion between inputs and runtime attributes
Preserve inputs.json validation
Allow separate reuse of inputs and runtime overrides without having to cherry-pick out the values you want to keep.

Today we can validate an inputs.json by comparing it to the task. If we say "any unrecognized inputs are runtime overrides" then we lose the ability to detect typos early on.

Another advantage of separate files is easier re-use. With separate files I could take the same inputs.json file and swap out the runtime.json to run against a different cloud. Or, I could keep the same runtime.json and swap out the inputs.json to run a different input set on the same cloud. If the files are mixed up together I have to do some tedious cherry-picking of values I want to keep each time.

illusional · 2019-04-02T23:23:57Z

Thanks @patmagee! I've clarified that expressions should not be allowed:

Expressions are disallowed within the runtime.json, hence no input will be coerced to an expression.

I agree @cjllanwarne that best practices should be to put these in a separate runtime.json inputs file, I've more explicitly mentioned this now in my latest commits. Also, my proposal was to add an additional runtime tag to be transparent that the user is overriding runtime attributes. eg: key: wf.my_task.runtime.runtime_attribute.

geoffjentry · 2019-04-02T23:47:33Z

My $0.02 .....

My initial reaction (from the previous issue) to embedding these in the inputs json was dislike but the more I thought about I moved all the way around to preferring it. There are two reasons for this - one being that my experience has been that many members of the target audience loathe dealing with the extra files, and the other being that Im now starting to picture runtime values as just being special inputs (in fact, many Broad based WDLs just use actual inputs to fill in the runtime values)
Likewise my initial reaction to @patmagee 's suggestion was "Right on!" but the more I think about it I wouldn't mind allowing expression strings in there. TBH I'd be surprised if that wasn't a pretty immediate user request
Seeing your edit to allow both a runtime.json and directly embedding in the inputs json is good in the "why not both" manner. I do think that if we're going to go that route that it'll need to be fleshed out a bit more in the current PR, e.g. it's not mentioned in the resolution order. It might also be good to perhaps loop back to inputs json definition and create a section about companion files in general

illusional · 2019-04-02T23:58:01Z

@geoffjentry @cjllanwarne Do you think there should be a specific type of input called runtime, something like -r runtime.json if passing through Cromwell? My intention was just to tell a user that best practice is to separate your inputs into a inputs.json and a runtime.json, and then pass multiple of those to Cromwell (eg: through -i).

I'm happy to unwind disallowing expressions, my thought was if you can determine your runtime attribute based on an expression, you may as well bake it into the wdl. But thinking about it you may want a specific disk format for GCP, or something else for a different backend.

geoffjentry · 2019-04-03T01:46:31Z

@illusional Ah I see what you mean. I don't have a strong opinion.

re expressions I also odn't feel super strongly about this either. Happy to go w/ the flow of public opinion

cjllanwarne · 2019-04-03T14:56:31Z

Inputs names

I didn't notice that .runtime. in the name string before - as long as we disallow runtime as a task or value name that should be fine (and I think that's pretty reasonable!).

Add `runtime.json` vs allow multiple `input.json`s

I have to say that I do like the idea of considering these to just be multiple input files which can be combined together or left separate per user preference. Cromwell already allows this so (selfishly) it would be nice to just make that formal in the spec.

Multiple compose-able input files would allow you to, for example, break down your submission into:

Basic inputs (which rarely change between runs. A reference genome for example)
Run specific inputs (the ones that you're tweaking this time)
Runtime inputs
Other as-yet-unforeseen inputs?

Or, if you prefer, just bundle them all together as a single package (which again, someone could surgically override later if they chose to).

The more I think about it, the more that seems to make the most sense to me!

Expressions in input files

I'd prefer to be conservative with allowing expressions-in-inputs for now. I'm 50:50 on the merits so my only strong arguments for doing it separately would be (a) seeing whether it's really needed, (b) a background/general preference for iterative changes vs "change everything at once" PRs

geoffjentry · 2019-04-03T19:34:30Z

The more I think about it the more I like just saying that best practice is to provide it as a separate input file. This reinforces that view point of runtime attrs being special inputs instead of a pure first class thing. It also means not having to define a new file type.

However, that does imply codifying that there can be multiple input jsons and specifying things like resolution rules.

illusional · 2019-04-03T22:53:07Z

as long as we disallow runtime as a task or value name that should be fine

@cjllanwarne That's a great point, is there an easy way I can modify my PR to prohibit this, like potentially modifying the validator regex to not accept "runtime"?

Your multiple compose-able inputs is exactly what we're planning do when we run ours. We can specify essentially static files for specific environments (reference genome etc), one for our input files and then (if this proposal passes) one for our runtime for each environment.

@geoffjentry it seems that the spec doesn't do a great job of specifying the order of resolution rules (except maybe #variable resolution, hence why I added a section to mine (though I think I could clarify the wording). There is an order that has been proposed, specifically #141 (backed up by broadinstitute/cromwell#2446) but not implemented yet. I'm also happy to spin up another PR to address this if the community wants that.

(made simpler if openwdl/wdl/issues/262 and openwdl/wdl/pull/301 are resolved)

versions/development/SPEC.md

pshapiro4broad · 2019-04-18T19:58:54Z

versions/development/SPEC.md

@@ -2646,6 +2648,30 @@ In JSON, the inputs to the workflow in the previous section might be:

 It's important to note that the type in JSON must be coercible to the WDL type.  For example `wf.int_val` expects an integer, but if we specified it in JSON as `"wf.int_val": "three"`, this coercion from string to integer is not valid and would result in a coercion error.  See the section on [Type Coercion](#type-coercion) for more details.

+## Specifying / Overriding Runtime Attributes in JSON
+
+Workflow runtime attributes may additionally be specified as key/value pairs within a JSON input file. It's recommended that this should be a separate inputs file called `runtime.json`. The mapping from JSON or YAML values to WDL values is codified similar to the serialization of task inputs section, however with an additional runtime tag to avoid name collisions and allow for unspecified attributes to be set by the runtime input file. Runtime attributes do not need to be specified in the task defintion to be overidden or set by the JSON.


It's recommended that this should be a separate inputs file called runtime.json

Does the spec need to provide this information? The spec never specifies that workflow inputs must be specified using a file, or suggest a name for that file. The spec doesn't mention how inputs are actually provided, which seems correct as this is a WDL implementor decision. (The spec could be improved by specifically listing things that a WDL implementor must define as part of their implementation, but that is a different issue.)

I was getting ready to throw down a giant "you're wrong", but you're correct that the spec does not literally say this is a different file. However it's heavily implied and as one of the people who are responsible for all the less than correct things in the spec I can assure you that's what we meant.

I like how here it's a suggestion, we don't need to make a suggestion but IMO it's not a bad idea to encourage conformist behavior

however with an additional runtime tag

I don't think this is precise enough. I would say "nested inside the task.runtime object".

I agree with @pshapiro4broad that the spec shouldn't mention input files, since that's implementation specific. If we want to acknowledge input files in the spec it should be a separate PR. However, this PR might be able to say something like "we recommend that the executor supports providing runtime attributes separately from the workflow inputs, for instance as a separate input JSON file"

I'm happy to remove my statement in favour of @TMiguelT's:

We recommend that the executor supports providing runtime attributes separately from the workflow inputs, for instance as a separate inputs (maybe: runtime.json) JSON file.

I'm mixed, I'm very opinionated that you should specify your runtime (such as specific google cloud, or specific SFS) attributes separately to your inputs. I think it promotes a good practice, and will probably allow people to better share optimised workflows. The more I think about it, the less I believe the spec is the right place to put this, but I'm also not sure if there is another place.

Our use case is, we want to run the same workflow with varying inputs sizes on different backends. I can create specific optimised runtime jsons for our different targets and give them to other researchers or clinicians who don't need to know about the attributes we decided.

My wording does encourage the implementations to allow a separate runtime file, but it doesn't require it, because as you say it's not exactly something the spec can require

cjllanwarne · 2019-04-19T14:09:01Z

I'd like to throw in one more "it would be really nice to tidy this up at the same time" -

I believe (but it may take some hunting down) that there's a line somewhere that says "you can't provide inputs directly to subworkflows or their tasks". Since we're saying that runtime are really just inputs, and since I would hope that runtime could be provided all the way down the call stack, it might be nice to also remove that line (and/or explicitly state somewhere that we can provide any inputs we choose as part of the input set)

dheiman · 2019-04-19T14:13:08Z

I haven't seen this mentioned yet, apologies if it has, but if runtime inputs are to become settable via json, they should not be used for call caching. This would eliminate so many headaches where memory is mapped to a regular input and needs to be tweaked for unforeseen reasons

geoffjentry · 2019-04-19T14:16:34Z

@dheiman Just keep in mind that call caching isn't a WDL thing, it's a potential feature of implementations

dheiman · 2019-04-19T14:26:17Z

@geoffjentry fair enough, but isn't an implementation required before the PR can be merged? I'd be happy to move the comment over to the Cromwell PR/issue if this passes.

geoffjentry · 2019-04-19T15:21:37Z

@dheiman Yeah, that's exactly what I was getting at. Also keep in mind that voting here can't reasonably be done with a disclaimer of "as long as my preferred implementation does X" since the Waiting for implementation phase happens post-voting.

multimeric · 2019-04-29T01:41:13Z

versions/development/SPEC.md

+}
+```
+
+As the runtime section consists of key/value pairs, it is the user's responsibility to ensure they provide the correct coercible type for the backend they are targeting. Expressions are disallowed within the `runtime.json`, hence no input will be coerced to an expression. See the section on [Type Coercion](#type-coercion) for more details.


It's not that expressions are not allowed, it's that they're not evaluated. For instance you can put in 1+1 as a runtime attribute value, it just won't end up as 2 downstream.

Yeah I see how my initial wording is a little ambiguous, I've modified it (in 26537d5) to read:

An input value should not be coerced to an expression, hence values provided within the runtime.json should be evaluated.

I think it is clearer to read that what you provide shouldn't be evaluated (like if you provided "my test ${1+1} expression", it would remain), and the language better aligns with the rest of the spec.

Yes, I think you should say they shouldn't be evaluated. But right now your wording says "values provided within the runtime.json should be evaluated", which is incorrect.

... maybe I should remove attention to detail on my resume (fixed).

Per openwdl#301 (comment)

maybe I shouldn't add attention to detail on my resume anymore

patmagee · 2019-05-08T11:11:09Z

It seems as though this PR brings up a bit of a delicate issue that I think we need to resolve before we can consider voting on this.

Should the Spec define inputs and the specific input files and their Specific format, or should this be an implementation detail

There are actually benefits to both sides of this. being opinionated on the matter leads to greater conformance and reusability of workflows, at the potential cost of strong arming implements. Additionally by defining this right in the specification, open standards such as WES become easier to implement at a language level.

At the same time, requiring inputs in a specific format here (with support for specific files), could be beyond the capabilities of different implementors. Additionally, it might make it more harder for them to support the full feature set of WDL

What are everyone's thoughts?

rhpvorderman · 2019-05-08T14:39:54Z

My thought is that I hate to use JSON for configuration since YAML is so much better for humans to edit. So I think the spec should define the input files, but not the format. (Unless it is my preferred format YAML of course 😉 ). But seriously, that bit should be an implementation detail. Who knows what useful formats come along in the future?
You would not want to be stuck on something like XML for configuration because "the spec says so". Any file that can be parsed to some sort of dictionary should be fine.

geoffjentry · 2019-05-08T21:03:23Z

@patmagee My $0.02 is to treat them separately and I don't think a particular order is necssary. We can evaluate this change given the current spec (the one which does specify input format) and then if someone wants to remove that portion of the spec (or modify it) afterwards that can be a separate discussion afterwards. If that implies changes to this or anything else in the spec, it's on that proposer to make the appropriate edits.

IOW i think this has been hanging around long enough to judge it on its own merits and let complaints about inputs being codified be handled afterwards.

illusional · 2019-05-08T22:18:38Z

@patmagee It's something I definitely didn't consider when spinning this PR up (hence the repeated use of JSON), but I wholeheartedly agree that the spec should not make an opinion on the format of inputs, eg no use of JSON, YAML, XML or others. Happy for whatever route we go, except that for tracking purposes it would be better to create a separate issues / PR to ensure that the community gets to vote on that.

I do want the spec to be opinionated about best practices for these specific input files, I think it promotes the wider goal of portability.

Side note, these two issues will need to be resolved before this gets implemented, or maybe even voted on. @geoffjentry Is it worth me spinning up a new issue to cover these, or put it in with this (as this feature would require pass through of inputs to subworkflows to work.

illusional · 2019-05-15T22:43:03Z

Not to piggy back on this issue, but adding it as a note here so I don't forget. It would also be great if you could reference a runtime attribute inside the WDL command. For example, many tools require, or allow you to pass a threads attribute, where this is intrinsically tied to the runtime.

geoffjentry · 2019-05-17T01:43:49Z

@illusional Sorry, I'm confused - why do the subworkflow PRs need to come first?

illusional · 2019-05-17T01:58:02Z

@geoffjentry Otherwise you couldn't pass runtime attributes to tools under a subworkflow. I guess it's not directly a pre-req, but something I feel should be completed before this feature is useful.

geoffjentry · 2019-05-17T02:00:11Z

Ok. Yeah I was wondering why it was a hard requirement and not a nice to have. Either way, this is your PR so if you want to table it until all of this other stuff is hashed out that's fine. As I said above I don't think these things all need to fit into a particular order, but that's up to you :)

geoffjentry · 2019-05-22T12:56:11Z

Hi everyone. It occurred to me that it is a common pattern in Broad-authored WDLs (and thus the majority of WDLs out in the wild) to already do what @illusional is suggesting and wire their runtime attributes in as workflow inputs.

In effect, the WDL authors have already spoken and this is the model that they would like to see. This PR would clean up that behavior and make it less ad hoc.

The subworkflows issue would be nice to have but as @illusional noted it is not truly a pre-req. The inputs json thing, well, it's already in the spec. We should not be voting against one thing because we dont like another thing already in the spec (and IMO it not only should be in the spec, but enhanced).

@illusional - I'd like to move this forward. If you want to hold off let me know and I'll take this over.

illusional · 2019-05-22T20:34:19Z

Thanks @geoffjentry, happy for it to move forward!

geoffjentry · 2019-05-30T18:25:11Z

@illusional I realized it was going to be a pain to make commits to this as it's on your fork. Closing and reopening as #313

multimeric · 2019-05-30T23:30:06Z

Could you not have used the "allow edits from maintainers" option?

illusional · 2019-05-30T23:41:46Z

It was off when I had a look, but it's not an option when you create a PR, just tucked away after you've created one. I turned it on now (not that it's any good).

geoffjentry · 2019-05-31T01:04:37Z

I also didn't realize that was a thing :)

Add runtime attribute override to dev spec

eca2b36

illusional added 2 commits April 3, 2019 10:15

Clarify runtime attributes + no JSON expressions

b413af0

Remove explicit + forbidden from runtime override

233b663

illusional mentioned this pull request Apr 11, 2019

Allow passthrough of inputs from the inputs.json to a subworkflow #262

Closed

illusional added a commit to PMCC-BioinformaticsCore/janis that referenced this pull request Apr 11, 2019

Add foundation for resource override in WDL

3d963dc

(made simpler if openwdl/wdl/issues/262 and openwdl/wdl/pull/301 are resolved)

pshapiro4broad reviewed Apr 18, 2019

View reviewed changes

versions/development/SPEC.md Outdated Show resolved Hide resolved

Remove typo and clarify recommendation

7d744e3

pshapiro4broad reviewed Apr 18, 2019

View reviewed changes

multimeric reviewed Apr 29, 2019

View reviewed changes

illusional added 3 commits April 29, 2019 12:08

Remove duplicate TOC entry

bc9afaa

Per openwdl#301 (comment)

Update SPEC.md

26537d5

should -> should not

362a46a

maybe I shouldn't add attention to detail on my resume anymore

geoffjentry mentioned this pull request May 8, 2019

Fix: Uncapitalize pair fields in inputs json #222

Merged

geoffjentry mentioned this pull request May 30, 2019

Specify that runtime attributes can be declared as inputs #313

Closed

geoffjentry closed this May 30, 2019

Add Runtime Attribute override to Development Spec #301

Add Runtime Attribute override to Development Spec #301

Conversation

illusional commented Apr 2, 2019

patmagee commented Apr 2, 2019

cjllanwarne commented Apr 2, 2019

illusional commented Apr 2, 2019 • edited Loading

geoffjentry commented Apr 2, 2019

illusional commented Apr 2, 2019

geoffjentry commented Apr 3, 2019

cjllanwarne commented Apr 3, 2019

Inputs names

Add runtime.json vs allow multiple input.jsons

Expressions in input files

geoffjentry commented Apr 3, 2019

illusional commented Apr 3, 2019 • edited Loading

pshapiro4broad Apr 18, 2019

Choose a reason for hiding this comment

geoffjentry Apr 19, 2019

Choose a reason for hiding this comment

multimeric Apr 29, 2019 • edited Loading

Choose a reason for hiding this comment

illusional Apr 29, 2019 • edited Loading

Choose a reason for hiding this comment

multimeric Apr 29, 2019 • edited Loading

Choose a reason for hiding this comment

cjllanwarne commented Apr 19, 2019

dheiman commented Apr 19, 2019

geoffjentry commented Apr 19, 2019

dheiman commented Apr 19, 2019

geoffjentry commented Apr 19, 2019

multimeric Apr 29, 2019

Choose a reason for hiding this comment

illusional Apr 29, 2019

Choose a reason for hiding this comment

multimeric Apr 29, 2019

Choose a reason for hiding this comment

illusional Apr 29, 2019 • edited Loading

Choose a reason for hiding this comment

patmagee commented May 8, 2019

rhpvorderman commented May 8, 2019

geoffjentry commented May 8, 2019

illusional commented May 8, 2019

illusional commented May 15, 2019

geoffjentry commented May 17, 2019

illusional commented May 17, 2019

geoffjentry commented May 17, 2019

geoffjentry commented May 22, 2019

illusional commented May 22, 2019

geoffjentry commented May 30, 2019

multimeric commented May 30, 2019

illusional commented May 30, 2019

geoffjentry commented May 31, 2019

illusional commented Apr 2, 2019 •

edited

Loading

Add `runtime.json` vs allow multiple `input.json`s

illusional commented Apr 3, 2019 •

edited

Loading

multimeric Apr 29, 2019 •

edited

Loading

illusional Apr 29, 2019 •

edited

Loading

multimeric Apr 29, 2019 •

edited

Loading

illusional Apr 29, 2019 •

edited

Loading