Write a specification for unsatisfied task inputs. #359

Closed
wants to merge 18 commits into from
Changes from 9 commits
70 changes: 55 additions & 15 deletions versions/development/SPEC.md
@@ -2600,12 +2600,35 @@ In this example, `i`, and `f` are inputs to this task even though `i` is not dir

## Computing Workflow Inputs

Workflows have inputs that must be satisfied to run them, just like tasks. Inputs to the workflow are provided as a key/value map where the key is of the form `workflow_name.input_name`.

* If a workflow is to be used as a sub-workflow it must ensure that all of the inputs to its calls are satisfied.
* If a workflow will only ever be submitted as a top-level workflow, it may optionally leave its tasks' inputs unsatisfied. This then forces the engine to additionally supply those inputs at run time. In this case, the inputs' names must be qualified in the inputs as `workflow_name.task_name.input_name`.

Any declaration that appears outside the `input` section is considered an intermediate value and **not** a workflow input. Any declaration can always be moved inside the `input` block to make it overridable.
Workflows have inputs that must be satisfied to run them, just like tasks.
Inputs to the workflow are provided as a key/value map where the key is of the
form `workflow_name.input_name`.

* A task has its inputs supplied when called by a workflow.
* Example: `call my_task { input: my_task_input=... }`
* All required task inputs which do not have defaults should be filled by the
calling workflow.
* A workflow is allowed not to specify optional inputs in a task's input block.
In this case, the inputs bubble up to become an input to the workflow instead.
* Example: an unsupplied input might have the fully-qualified name
`my_workflow.my_task.my_task_input.`
Collaborator

Period should go outside the backtick, I think

Contributor Author

Thanks for spotting this!

* If that workflow is used as a subworkflow, the input is allowed to bubble up
again with a further-qualified name.
* Example: my_outer_workflow.my_workflow.my_task.my_task_input.
* There is currently no way to supply a bubbled-up input in an outer workflow's
call block.
@pieterlukasse May 11, 2020

Not sure I understand this point. Does it contradict the previous one? I.e. the previous one seems to introduce an option, but this point seems to state it is not possible. Can you please clarify?

Contributor Author

The first point is about the input JSON (or whatever other input format is used).

The outer workflow's call block is in the WDL code itself, where you call a workflow from inside another workflow. There is currently no way to supply nested inputs there.

* Example: one cannot say call my_workflow as subworkflow
`{ inputs: my_task.my_task_input=... }`
Contributor

Maybe this should be allowed?

Contributor Author

Yes, maybe. But I think we should do that in another PR.

* By default an engine only allows inputs that are specified in the input
section of the top-level workflow.
* An engine may optionally support supplying bubbled-up optional inputs, but
this has to be explicitly enabled on the engine (via configuration, command
line flags or otherwise).
Contributor

I still feel it might be better to add some form of syntax in the WDL file itself. That way one would be able to see that there may be interesting nested inputs in the WDL files and, thus, that it would be good to use an engine which supports them.

In addition, a workflow might be written without the intention for the bubble-up behavior to be used. If someone then uses such a flag or configuration to allow for the bubble-up, they might be able to set optional inputs which will mess up the workflow.

An alternative to the last two points in the list:

  • By default only inputs defined in the top-level workflow are made available to the user.
  • The keyword etc must be added to a workflow's input section to enable the bubble-up behavior described above. Only if this etc keyword is present in the workflow's input section may an engine allow this bubble-up behavior, and an engine may choose not to support the feature at all.

Contributor

The etc keyword makes sense to me. However, why not just allow it in the workflow meta section? This way, we don't need new syntax.

Contributor Author
@rhpvorderman Mar 9, 2020

@DavyCats good points! @orodeh good suggestion! Setting it in the workflow seems indeed better than setting it in the engine, and using the meta section for this seems the most logical thing to do.

Something like:

meta {
    bubbled-up-inputs: true
}

Except that I don't like the term "bubbled-up". It is in the PR, but only for lack of a better term.

How about nested inputs?

meta {
     allow-nested-inputs: true
}

Or some other suggestions:

meta {
    allow-hidden-inputs: true
    allow-subworkflow-inputs: true
    allow-extended-inputs: true
}

Contributor

I'm not sure that the meta section is the appropriate place for this. The meta section is just a set of key-value pairs that hold no relevance to the execution of a workflow. It is intended to be a place where you might store contact information or a link to the documentation. This was also discussed (briefly) in #351.

This feels more like something that would fit in the runtime section, except that there is no runtime section for workflows right now.

Contributor

allow-nested-inputs seems fine to me, though I wonder if using hyphens (`-`) might cause issues with the grammars.

Contributor Author

Pinging @patmagee for the grammar question.

Well, we could consider adding a runtime section for workflows, but that requires another spec/language change (which is why I wanted to set it engine-side). Adding a runtime section could be useful, but there would need to be other relevant things stored there. Having an entire runtime section just for allow-nested-inputs seems a bit ludicrous to me.

I think the meta section is quite appropriate: It states that this workflow has nice nested inputs. A user does not have to use them. I think it is more documentation than runtime information, since an engine does not have to support nested-inputs per se.

Member

These are actually really great examples of something that can be added to the hints section, and we leave it up to the engine to decide what to do. We tried to really trim down the runtime block and make it the bare minimum requirements needed to run a workflow.

We already have the idea of "semi reserved" kw used in the hints section. These are hints that have similar meanings across all engines but do not need to be implemented.

I'd suggest doing something like:

hints {
  allowNestedInputs: true
}

Contributor Author

@patmagee, yes, that seems to be the best section, but in the PR for the hints section I only see it at the task level. Will there also be a hints section at the workflow level?

Contributor
@cjllanwarne Mar 11, 2020

This doesn't feel like it quite matches the semantics of a "hint" to me to be honest.

A hint to me should mean "if an engine completely ignores this, it might miss out on some runtime optimization but the logical evaluation of the workflow should still be possible".

What we're talking about here is:

  1. a warning to users that their workflow might not work on all engines.
  2. a requirement to supporting engines that they should allow bubbled up inputs.
  3. a signal for engines which don't support this feature to quickly stop processing the workflow/task

I'd almost like to add a way to flag plugins or non-standard-features or engine-requirements. I know Ohad has a number of other ideas for features which would be really cool but tricky for Cromwell. A way to flag these high up in the context of a WDL file (perhaps even above the workflow scope) would be really nice.

Contributor
@cjllanwarne Mar 11, 2020

eg something like:

version 2.0

plugins {
  bubbled-up-inputs,
  streamable-input-files,
  ???
}

it's not 100% ideal because it allows engines to diverge, but at least it'd be obvious right at the top of the file whether your workflow will work with any particular engine (which would presumably be able to publish or otherwise indicate which features they support). It would also allow engines to "get out quickly" if they see required features which they didn't have.


Any declaration that appears outside the `input` section is considered an
intermediate value and **not** a workflow input.
Any declaration can always be moved inside the `input` block to make it
overridable.

Consider the following workflow:

@@ -2617,26 +2640,32 @@ task t1 {
}

command {
./script --action=${s} -x${x}
./script --action=~{s} -x~{x}
}
output {
Int count = read_int(stdout())
}
runtime {
docker: "openwdl/examplescript:v1.1.3"
}
}

task t2 {
input {
String s
Int t
Int? t
Int x
}

command {
./script2 --action=${s} -x${x} --other=${t}
./script2 --action=~{s} -x~{x} ~{"--other=" + t}
}
output {
Int count = read_int(stdout())
}
runtime {
docker: "openwdl/examplescript2:v1.0.1"
}
}

task t3 {
@@ -2646,27 +2675,36 @@ task t3 {
}

command {
python -c "print(${y} + 1)"
python -c "print(~{y} + 1)"
}
output {
Int incr = read_int(stdout())
}
runtime {
docker: "python:3.7-slim"
}
}

workflow wf {
input {
Int int_val
Array[Int] my_ints
File ref_file
String t1s
String t2s
}

String not_an_input = "hello"

call t1 {
input: x = int_val
input:
x = int_val,
s = t1s
}
call t2 {
input: x = int_val, t=t1.count
input:
x = t1.count,
s = t2s
}
scatter(i in my_ints) {
call t3 {
@@ -2678,13 +2716,15 @@ workflow wf {

The inputs to `wf` would be:

* `wf.t1.s` as a `String`
* `wf.t2.s` as a `String`
* `wf.t1s` as a `String`
* `wf.t2s` as a `String`
* `wf.int_val` as an `Int`
* `wf.my_ints` as an `Array[Int]`
* `wf.ref_file` as a `File`

Note that because some call inputs are left unsatisfied, this workflow could not be used as a sub-workflow. To fix that, additional workflow inputs could be added to pass-through `t1.s` and `t2.s`.
Note that the optional `t` input for task `t2` is left unsatisfied, this
option could be passed as `wf.t2.t` if the engine has bubbled-up optional
inputs enabled.
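
The naming rule proposed in this diff can be modelled in a few lines of Python. This is only a rough sketch and is not part of the PR or of any engine: `TaskInput`, `Call`, `workflow_inputs` and the `allow_nested` flag are hypothetical names chosen for illustration. The input types of `t1` are inferred from its command and its call, because its `input` block is collapsed in the diff, and `t3` is left out because its call is collapsed as well. Sub-workflow nesting (the further-qualified names) is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskInput:
    name: str
    wdl_type: str
    required: bool = True  # False for optional (`Int?`) inputs or inputs with defaults


@dataclass
class Call:
    task_name: str
    task_inputs: List[TaskInput]
    supplied: List[str]  # input names set in the call's `input:` block


def workflow_inputs(wf_name: str, declared: Dict[str, str], calls: List[Call],
                    allow_nested: bool = False) -> Dict[str, str]:
    """Map fully-qualified input names to WDL types for one workflow."""
    # Declarations in the workflow's own `input` section are always inputs.
    inputs = {f"{wf_name}.{name}": wdl_type for name, wdl_type in declared.items()}
    for call in calls:
        for task_input in call.task_inputs:
            if task_input.name in call.supplied:
                continue  # satisfied by the calling workflow
            if task_input.required:
                # Proposed rule: required inputs without defaults must be supplied.
                raise ValueError(f"{call.task_name}.{task_input.name} is not supplied")
            if allow_nested:
                # Unsupplied optional inputs bubble up under a qualified name.
                key = f"{wf_name}.{call.task_name}.{task_input.name}"
                inputs[key] = task_input.wdl_type
    return inputs


# The `wf` example from the diff (assumption: t1 takes `String s` and `Int x`).
declared = {"int_val": "Int", "my_ints": "Array[Int]", "ref_file": "File",
            "t1s": "String", "t2s": "String"}
calls = [
    Call("t1", [TaskInput("s", "String"), TaskInput("x", "Int")], ["x", "s"]),
    Call("t2", [TaskInput("s", "String"), TaskInput("t", "Int?", required=False),
                TaskInput("x", "Int")], ["x", "s"]),
]

default_inputs = workflow_inputs("wf", declared, calls)
nested_inputs = workflow_inputs("wf", declared, calls, allow_nested=True)
print(sorted(default_inputs))
print(sorted(set(nested_inputs) - set(default_inputs)))
```

With the flag left off, the sketch yields the five `wf.*` names listed above. With it on, the only extra name is `wf.t2.t`, which matches the note about the optional `t` input.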

## Specifying Workflow Inputs
