Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Runtime Attribute override to Development Spec #301

Closed

Conversation

illusional
Copy link
Contributor

I raised an issue (#298) around overriding runtime attributes by JSON input, I've taken the comment I left there and turned it into this PR.

I don't believe the JSON runtime override should be able to support expressions, like if someone specifically put in:

{  "wf.my_task.runtime.cpus": "${size(myFileInput) * 4}"  }

Ie: I don't think it should resolve myFileInput then perform the computation. I don't know if I should explicitly state this though.

@patmagee
Copy link
Member

patmagee commented Apr 2, 2019

@illusional this looks great. I would err on the side of caution and explicitly ban expressions in the supplied JSON. This is the type of thing that if not banned people will invariably try to do

@cjllanwarne
Copy link
Contributor

My only complaint here is the mixing together of runtime and inputs into a single json file. Ideally (in my opinion), the runtime would be overridable via a separate runtime.json to:

  • Avoid name collisions and confusion between inputs and runtime attributes
  • Preserve inputs.json validation
  • Allow separate reuse of inputs and runtime overrides without having to cherry-pick out the values you want to keep.

Today we can validate an inputs.json by comparing it to the task. If we say "any unrecognized inputs are runtime overrides" then we lose the ability to detect typos early on.

Another advantage of separate files is easier re-use. With separate files I could take the same inputs.json file and swap out the runtime.json to run against a different cloud. Or, I could keep the same runtime.json and swap out the inputs.json to run a different input set on the same cloud. If the files are mixed up together I have to do some tedious cherry-picking of values I want to keep each time.

@illusional
Copy link
Contributor Author

illusional commented Apr 2, 2019

Thanks @patmagee! I've clarified that expressions should not be allowed:

Expressions are disallowed within the runtime.json, hence no input will be coerced to an expression.

I agree @cjllanwarne that best practices should be to put these in a separate runtime.json inputs file, I've more explicitly mentioned this now in my latest commits. Also, my proposal was to add an additional runtime tag to be transparent that the user is overriding runtime attributes. eg: key: wf.my_task.runtime.runtime_attribute.

@geoffjentry
Copy link
Member

My $0.02 .....

  • My initial reaction (from the previous issue) to embedding these in the inputs json was dislike but the more I thought about I moved all the way around to preferring it. There are two reasons for this - one being that my experience has been that many members of the target audience loathe dealing with the extra files, and the other being that Im now starting to picture runtime values as just being special inputs (in fact, many Broad based WDLs just use actual inputs to fill in the runtime values)
  • Likewise my initial reaction to @patmagee 's suggestion was "Right on!" but the more I think about it I wouldn't mind allowing expression strings in there. TBH I'd be surprised if that wasn't a pretty immediate user request
  • Seeing your edit to allow both a runtime.json and directly embedding in the inputs json is good in the "why not both" manner. I do think that if we're going to go that route that it'll need to be fleshed out a bit more in the current PR, e.g. it's not mentioned in the resolution order. It might also be good to perhaps loop back to inputs json definition and create a section about companion files in general

@illusional
Copy link
Contributor Author

@geoffjentry @cjllanwarne Do you think there should be a specific type of input called runtime, something like -r runtime.json if passing through Cromwell? My intention was just to tell a user that best practice is to separate your inputs into a inputs.json and a runtime.json, and then pass multiple of those to Cromwell (eg: through -i).

I'm happy to unwind disallowing expressions, my thought was if you can determine your runtime attribute based on an expression, you may as well bake it into the wdl. But thinking about it you may want a specific disk format for GCP, or something else for a different backend.

@geoffjentry
Copy link
Member

@illusional Ah I see what you mean. I don't have a strong opinion.

re expressions I also odn't feel super strongly about this either. Happy to go w/ the flow of public opinion

@cjllanwarne
Copy link
Contributor

Inputs names

I didn't notice that .runtime. in the name string before - as long as we disallow runtime as a task or value name that should be fine (and I think that's pretty reasonable!).

Add runtime.json vs allow multiple input.jsons

I have to say that I do like the idea of considering these to just be multiple input files which can be combined together or left separate per user preference. Cromwell already allows this so (selfishly) it would be nice to just make that formal in the spec.

Multiple compose-able input files would allow you to, for example, break down your submission into:

  • Basic inputs (which rarely change between runs. A reference genome for example)
  • Run specific inputs (the ones that you're tweaking this time)
  • Runtime inputs
  • Other as-yet-unforeseen inputs?

Or, if you prefer, just bundle them all together as a single package (which again, someone could surgically override later if they chose to).

The more I think about it, the more that seems to make the most sense to me!

Expressions in input files

I'd prefer to be conservative with allowing expressions-in-inputs for now. I'm 50:50 on the merits so my only strong arguments for doing it separately would be (a) seeing whether it's really needed, (b) a background/general preference for iterative changes vs "change everything at once" PRs

@geoffjentry
Copy link
Member

The more I think about it the more I like just saying that best practice is to provide it as a separate input file. This reinforces that view point of runtime attrs being special inputs instead of a pure first class thing. It also means not having to define a new file type.

However, that does imply codifying that there can be multiple input jsons and specifying things like resolution rules.

@illusional
Copy link
Contributor Author

illusional commented Apr 3, 2019

as long as we disallow runtime as a task or value name that should be fine

@cjllanwarne That's a great point, is there an easy way I can modify my PR to prohibit this, like potentially modifying the validator regex to not accept "runtime"?

Your multiple compose-able inputs is exactly what we're planning do when we run ours. We can specify essentially static files for specific environments (reference genome etc), one for our input files and then (if this proposal passes) one for our runtime for each environment.

@geoffjentry it seems that the spec doesn't do a great job of specifying the order of resolution rules (except maybe #variable resolution, hence why I added a section to mine (though I think I could clarify the wording). There is an order that has been proposed, specifically #141 (backed up by broadinstitute/cromwell#2446) but not implemented yet. I'm also happy to spin up another PR to address this if the community wants that.

illusional added a commit to PMCC-BioinformaticsCore/janis that referenced this pull request Apr 11, 2019
versions/development/SPEC.md Outdated Show resolved Hide resolved
@@ -2646,6 +2648,30 @@ In JSON, the inputs to the workflow in the previous section might be:

It's important to note that the type in JSON must be coercible to the WDL type. For example `wf.int_val` expects an integer, but if we specified it in JSON as `"wf.int_val": "three"`, this coercion from string to integer is not valid and would result in a coercion error. See the section on [Type Coercion](#type-coercion) for more details.

## Specifying / Overriding Runtime Attributes in JSON

Workflow runtime attributes may additionally be specified as key/value pairs within a JSON input file. It's recommended that this should be a separate inputs file called `runtime.json`. The mapping from JSON or YAML values to WDL values is codified similar to the serialization of task inputs section, however with an additional runtime tag to avoid name collisions and allow for unspecified attributes to be set by the runtime input file. Runtime attributes do not need to be specified in the task defintion to be overidden or set by the JSON.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's recommended that this should be a separate inputs file called runtime.json

Does the spec need to provide this information? The spec never specifies that workflow inputs must be specified using a file, or suggest a name for that file. The spec doesn't mention how inputs are actually provided, which seems correct as this is a WDL implementor decision. (The spec could be improved by specifically listing things that a WDL implementor must define as part of their implementation, but that is a different issue.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was getting ready to throw down a giant "you're wrong", but you're correct that the spec does not literally say this is a different file. However it's heavily implied and as one of the people who are responsible for all the less than correct things in the spec I can assure you that's what we meant.

I like how here it's a suggestion, we don't need to make a suggestion but IMO it's not a bad idea to encourage conformist behavior

Copy link

@multimeric multimeric Apr 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

however with an additional runtime tag

I don't think this is precise enough. I would say "nested inside the task.runtime object".

I agree with @pshapiro4broad that the spec shouldn't mention input files, since that's implementation specific. If we want to acknowledge input files in the spec it should be a separate PR. However, this PR might be able to say something like "we recommend that the executor supports providing runtime attributes separately from the workflow inputs, for instance as a separate input JSON file"

Copy link
Contributor Author

@illusional illusional Apr 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to remove my statement in favour of @TMiguelT's:

We recommend that the executor supports providing runtime attributes separately from the workflow inputs, for instance as a separate inputs (maybe: runtime.json) JSON file.

I'm mixed, I'm very opinionated that you should specify your runtime (such as specific google cloud, or specific SFS) attributes separately to your inputs. I think it promotes a good practice, and will probably allow people to better share optimised workflows. The more I think about it, the less I believe the spec is the right place to put this, but I'm also not sure if there is another place.

Our use case is, we want to run the same workflow with varying inputs sizes on different backends. I can create specific optimised runtime jsons for our different targets and give them to other researchers or clinicians who don't need to know about the attributes we decided.

Copy link

@multimeric multimeric Apr 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My wording does encourage the implementations to allow a separate runtime file, but it doesn't require it, because as you say it's not exactly something the spec can require

@cjllanwarne
Copy link
Contributor

I'd like to throw in one more "it would be really nice to tidy this up at the same time" -

I believe (but it may take some hunting down) that there's a line somewhere that says "you can't provide inputs directly to subworkflows or their tasks". Since we're saying that runtime are really just inputs, and since I would hope that runtime could be provided all the way down the call stack, it might be nice to also remove that line (and/or explicitly state somewhere that we can provide any inputs we choose as part of the input set)

@dheiman
Copy link

dheiman commented Apr 19, 2019

I haven't seen this mentioned yet, apologies if it has, but if runtime inputs are to become settable via json, they should not be used for call caching. This would eliminate so many headaches where memory is mapped to a regular input and needs to be tweaked for unforeseen reasons

@geoffjentry
Copy link
Member

@dheiman Just keep in mind that call caching isn't a WDL thing, it's a potential feature of implementations

@dheiman
Copy link

dheiman commented Apr 19, 2019

@geoffjentry fair enough, but isn't an implementation required before the PR can be merged? I'd be happy to move the comment over to the Cromwell PR/issue if this passes.

@geoffjentry
Copy link
Member

@dheiman Yeah, that's exactly what I was getting at. Also keep in mind that voting here can't reasonably be done with a disclaimer of "as long as my preferred implementation does X" since the Waiting for implementation phase happens post-voting.

}
```

As the runtime section consists of key/value pairs, it is the user's responsibility to ensure they provide the correct coercible type for the backend they are targeting. Expressions are disallowed within the `runtime.json`, hence no input will be coerced to an expression. See the section on [Type Coercion](#type-coercion) for more details.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not that expressions are not allowed, it's that they're not evaluated. For instance you can put in 1+1 as a runtime attribute value, it just won't end up as 2 downstream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I see how my initial wording is a little ambiguous, I've modified it (in 26537d5) to read:

An input value should not be coerced to an expression, hence values provided within the runtime.json should be evaluated.

I think it is clearer to read that what you provide shouldn't be evaluated (like if you provided "my test ${1+1} expression", it would remain), and the language better aligns with the rest of the spec.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think you should say they shouldn't be evaluated. But right now your wording says "values provided within the runtime.json should be evaluated", which is incorrect.

Copy link
Contributor Author

@illusional illusional Apr 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... maybe I should remove attention to detail on my resume (fixed).

maybe I shouldn't add attention to detail on my resume anymore
@patmagee
Copy link
Member

patmagee commented May 8, 2019

It seems as though this PR brings up a bit of a delicate issue that I think we need to resolve before we can consider voting on this.

Should the Spec define inputs and the specific input files and their Specific format, or should this be an implementation detail

There are actually benefits to both sides of this. being opinionated on the matter leads to greater conformance and reusability of workflows, at the potential cost of strong arming implements. Additionally by defining this right in the specification, open standards such as WES become easier to implement at a language level.

At the same time, requiring inputs in a specific format here (with support for specific files), could be beyond the capabilities of different implementors. Additionally, it might make it more harder for them to support the full feature set of WDL

What are everyone's thoughts?

@rhpvorderman
Copy link
Contributor

My thought is that I hate to use JSON for configuration since YAML is so much better for humans to edit. So I think the spec should define the input files, but not the format. (Unless it is my preferred format YAML of course 😉 ). But seriously, that bit should be an implementation detail. Who knows what useful formats come along in the future?
You would not want to be stuck on something like XML for configuration because "the spec says so". Any file that can be parsed to some sort of dictionary should be fine.

@geoffjentry
Copy link
Member

@patmagee My $0.02 is to treat them separately and I don't think a particular order is necssary. We can evaluate this change given the current spec (the one which does specify input format) and then if someone wants to remove that portion of the spec (or modify it) afterwards that can be a separate discussion afterwards. If that implies changes to this or anything else in the spec, it's on that proposer to make the appropriate edits.

IOW i think this has been hanging around long enough to judge it on its own merits and let complaints about inputs being codified be handled afterwards.

@illusional
Copy link
Contributor Author

@patmagee It's something I definitely didn't consider when spinning this PR up (hence the repeated use of JSON), but I wholeheartedly agree that the spec should not make an opinion on the format of inputs, eg no use of JSON, YAML, XML or others. Happy for whatever route we go, except that for tracking purposes it would be better to create a separate issues / PR to ensure that the community gets to vote on that.

I do want the spec to be opinionated about best practices for these specific input files, I think it promotes the wider goal of portability.


Side note, these two issues will need to be resolved before this gets implemented, or maybe even voted on. @geoffjentry Is it worth me spinning up a new issue to cover these, or put it in with this (as this feature would require pass through of inputs to subworkflows to work.

@illusional
Copy link
Contributor Author

Not to piggy back on this issue, but adding it as a note here so I don't forget. It would also be great if you could reference a runtime attribute inside the WDL command. For example, many tools require, or allow you to pass a threads attribute, where this is intrinsically tied to the runtime.

@geoffjentry
Copy link
Member

@illusional Sorry, I'm confused - why do the subworkflow PRs need to come first?

@illusional
Copy link
Contributor Author

@geoffjentry Otherwise you couldn't pass runtime attributes to tools under a subworkflow. I guess it's not directly a pre-req, but something I feel should be completed before this feature is useful.

@geoffjentry
Copy link
Member

Ok. Yeah I was wondering why it was a hard requirement and not a nice to have. Either way, this is your PR so if you want to table it until all of this other stuff is hashed out that's fine. As I said above I don't think these things all need to fit into a particular order, but that's up to you :)

@geoffjentry
Copy link
Member

Hi everyone. It occurred to me that it is a common pattern in Broad-authored WDLs (and thus the majority of WDLs out in the wild) to already do what @illusional is suggesting and wire their runtime attributes in as workflow inputs.

In effect, the WDL authors have already spoken and this is the model that they would like to see. This PR would clean up that behavior and make it less ad hoc.

The subworkflows issue would be nice to have but as @illusional noted it is not truly a pre-req. The inputs json thing, well, it's already in the spec. We should not be voting against one thing because we dont like another thing already in the spec (and IMO it not only should be in the spec, but enhanced).

@illusional - I'd like to move this forward. If you want to hold off let me know and I'll take this over.

@illusional
Copy link
Contributor Author

Thanks @geoffjentry, happy for it to move forward!

@geoffjentry
Copy link
Member

@illusional I realized it was going to be a pain to make commits to this as it's on your fork. Closing and reopening as #313

@multimeric
Copy link

Could you not have used the "allow edits from maintainers" option?

@illusional
Copy link
Contributor Author

It was off when I had a look, but it's not an option when you create a PR, just tucked away after you've created one. I turned it on now (not that it's any good).

@geoffjentry
Copy link
Member

I also didn't realize that was a thing :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants