Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support Struct #9

Closed
3 tasks
mlin opened this issue Nov 2, 2018 · 13 comments
Closed
3 tasks

support Struct #9

mlin opened this issue Nov 2, 2018 · 13 comments
Labels
interop Bears on spec compatibility

Comments

@mlin
Copy link
Collaborator

mlin commented Nov 2, 2018

Remaining burndown for full support in miniwdl check:

  • object literal syntax
  • new literal syntax
  • Value

https://github.com/openwdl/wdl/blob/master/versions/1.0/SPEC.md#struct-definition
openwdl/wdl#280

  • parser: a type can now be essentially any CNAME. are we going to run into complications with recognition of keywords: command output runtime meta parameter_meta scatter if
  • Type: tree transformer needs to detect whether CNAME from parser is a builtin type or otherwise a presumed Struct, and initialize appropriate object. responsibility for checking that type parameters [] make sense will get pushed here as well
  • typechecking will need to refer to some document-level registry of declared & imported structs to go along with the type environment. Structs can contain other structs! (will need cycle detection...)
    • when we call an imported task/workflow which outputs a struct, we may have a different name for it than the callee. Then that struct might itself contain structs imported from some other document, which we may not have a name for at all, or there might be a name collision between one of those and one of ours.
    • so we better give each struct definition some globally unique id. it would be nice if this id were 'reproducible' across separate instantiations of the document from the same source...can use a counter assuming load/eval order is deterministic...or, Name<doc-crc32c>
  • Value.StructInstance (members dict) and JSON I/O
  • member access
    • when we have a struct instance s, we can reference s itself or a member s.m or s.m.n
    • does it need to be valid (if useless) to access a member of an immediate literal, Person(name="Calvin", age=6).age
    • how to use type environment vs code path in Expr.Ident
    • subsume pair .left/.right access into this?
    • Pair[Person,Person] p2 ... p2.left.age
    • Array[Person] ar ... ar[42].age
    • some name collisions between values, calls, imports will become "hard"...possibly this is already a problem with pair left/right access
    • general issue: the syntactic structure name1.name2.name3 should be parsed differently depending on the type environment. name1 could refer to a value and .name2.name3 struct member accesses within. Or name1.name2 could be the namespaced value, and name3 a struct member.
    • typechecker possibly should transform AST after these are sorted out. otherwise all downstream AST users have to handle that
  • literal / coercion (pending revision: Struct literal - Revist openwdl/wdl#286)
  • import & aliasing
    • " If two structs are named the same it will be necessary to resolve the conflicting names. "
    • "Its important to note, that when importing from file 2, all structs from file 2's global namespace will be imported. This Includes structs from another imported WDL within file 2, even if they are aliased. If a struct is aliased in file 2, it will be imported into file 1 under its aliased name."
    • pay attention to what type name is used in type mismatch errors on imported structs (especially call inputs or ident references to call outputs)
    • diamond imports
    • UnusedImport update, you might import a doc just to use its structs
  • NameCollision lint
@mlin mlin added the interop Bears on spec compatibility label Nov 2, 2018
@infispiel
Copy link

We would like to use miniwdl for checking our pipelines at the Broad Institute, but the lack of struct support is preventing us from using it on most of our workflows.

@mlin
Copy link
Collaborator Author

mlin commented Feb 12, 2019

Hi, cool, I was wondering are any of these workflows open-sourced at this point? In the integration tests we embed numerous public repositories of workflows to ensure they're always parsed successfully. It will be great to add more exercising these novel areas of the spec.

@infispiel
Copy link

At this current point in time, no. The workflows we're running miniwdl on are all private. We are planning to get a public version out to you soon to use for testing.

@mlin
Copy link
Collaborator Author

mlin commented Mar 5, 2019

@infispiel it might transpire pretty soon that miniwdl supports structs and the soon-to-be struct literal syntax without supporting the JSON-like Object literal syntax. I was wondering could you comment on what if any kind of burden this would cause you, i.e. how do you use struct literals now? Do you pass structs through the input JSON or are they just constructed in WDL from primitive inputs?

Basically since (purely due to timing) we've never supported Object in miniwdl, that construct is on its way out, and the open-source workflows we crawl don't use it, I've been hoping to get away with not implementing at all. But this is subject to what kind of compatibility problems arise of course.

@mlin
Copy link
Collaborator Author

mlin commented Mar 26, 2019

@infispiel ping...the struct literal syntax continues to flux a little bit, so I'd really value your guidance on how to stage this. I recently completed most of the internal plumbing work for structs but obviously the literals are a key bit to make it usable.

@pshapiro4broad
Copy link

pshapiro4broad commented Mar 27, 2019

The usages we have look like this.

Declarations:

struct SampleAndUnmappedBams {
  String base_file_name
  String final_gvcf_base_name
  Array[File] flowcell_unmapped_bams
  String sample_name
  String unmapped_bam_suffix
}

Creation:

  SampleAndUnmappedBams sample_and_unmapped_bams = object {
     sample_name: sample_name,
     base_file_name: base_file_name,
     flowcell_unmapped_bams: CramToUnmappedBams.unmapped_bams,
     final_gvcf_base_name: final_gvcf_base_name,
     unmapped_bam_suffix: unmapped_bam_suffix
  }

Which is what the current cromwell 1.0 WDL supports.

We can only use the syntax that cromwell currently supports, so supporting the next, as-yet-unimplemented, struct literal syntax in miniwdl wouldn't work for us.

@pshapiro4broad
Copy link

Regarding WDL 1 syntax vs the (newer) draft syntax, I think you'd want to support both. Even after the draft spec is approved and WDL 2 is out, there will still be existing WDL 1 workflows which miniwdl would want to support, since WDL 1 will continue to be supported by WDL runtimes.

@pshapiro4broad
Copy link

pshapiro4broad commented Apr 1, 2019

I didn't realize this before, but there are public versions of our WDL in github that can be used for testing miniwdl. The repo is https://github.com/gatk-workflows and an example of a workflow with structs is https://github.com/gatk-workflows/five-dollar-genome-analysis-pipeline

@mlin
Copy link
Collaborator Author

mlin commented Apr 2, 2019

@pshapiro4broad great stuff, thanks! It appears that in this particular version, the structs are all expected as JSON inputs, and there are no literal constructions in the WDL. Just an observation, it's coming along in any case.

@mlin
Copy link
Collaborator Author

mlin commented Apr 2, 2019

@pshapiro4broad also, a typo in one of the task WDLs makes the whole thing un-loadable :( filed this issue gatk-workflows/five-dollar-genome-analysis-pipeline#17 ...if you know the maintainer of that repo appreciate if you can bring it to their attention, thanks!

@pshapiro4broad
Copy link

Thanks for the bug report, looks like they're already on it.

@mlin
Copy link
Collaborator Author

mlin commented Apr 8, 2019

@pshapiro4broad @infispiel miniwdl v0.1.5 possibly might support structs to the extent you need as discussed above (including the object-style literals). Please file any issues you hit with it and/or we'll always welcome any additional public workflows to crawl. Thanks!

@mlin mlin closed this as completed Apr 8, 2019
@dinvlad
Copy link

dinvlad commented Jun 28, 2019

Thanks for the discussion today! We currently pass an opaque Object? context through workflow inputs. This object holds important "context" for the workflow run that's serialized (along with workflow outputs) into JSON at the very last step in the workflow, and triggers post-workflow activities.

Since there're no plans for Object to be recognized by MiniWDL, what would be a reasonable substitute in this case? I tried defining struct Context {} and Context? context, but then MiniWDL validation fails silently with an AssertionError. We wouldn't want to define the shape of Context {} inside WDL, because its shape is irrelevant for the purposes of the workflow (and would make it too coupled to the data model we use outside of the workflow).

EDIT: as a workaround, if I define struct Context { Boolean? test }, then MiniWDL validates successfully. So it looks like it has trouble validating empty structs (should I create a new issue for that?).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interop Bears on spec compatibility
Projects
None yet
Development

No branches or pull requests

4 participants