Output schema documentation as Markdown #1381

krokicki · 2022-01-11T20:13:00Z

Per discussion in Slack and partially addressing #741, I added a nf-core schema docs command which takes any nextflow_schema.json and generates Markdown documentation for it.

PR checklist

This comment contains a description of changes (with reason)
CHANGELOG.md is updated
If you've fixed a bug or added code that should be tested, add tests!
Documentation in docs is updated

Example Output (eager pipeline)

nf-core/eager pipeline parameters

A fully reproducible and state-of-the-art ancient DNA analysis pipeline

Input/output options

Define where the pipeline should find input data, and additional metadata.

Parameter	Description	Type	Default
`input`	Either paths or URLs to FASTQ/BAM data (must be surrounded with quotes). For paired end data, the path must use '{1,2}' notation to specify read pairs. Alternatively, a path to a TSV file (ending .tsv) containing file paths and sequencing/sample metadata. Allows for merging of multiple lanes/libraries/samples. Please see documentation for template.	`string`
`udg_type`	Specifies whether you have UDG treated libraries. Set to 'half' for partial treatment, or 'full' for UDG. If not set, libraries are assumed to have no UDG treatment ('none'). Not required for TSV input.	`string`	none
`single_stranded`	Specifies that libraries are single stranded. Always affects MALTExtract but will be ignored by pileupCaller with TSV input. Not required for TSV input.	`boolean`
`single_end`	Specifies that the input is single end reads. Not required for TSV input.	`boolean`
`colour_chemistry`	Specifies which Illumina sequencing chemistry was used. Used to inform whether to poly-G trim if turned on (see below). Not required for TSV input. Options: 2, 4.	`integer`	4
`bam`	Specifies that the input is in BAM format. Not required for TSV input.	`boolean`

codecov · 2022-01-11T20:20:01Z

Codecov Report

Merging #1381 (d313939) into dev (389729c) will increase coverage by 0.30%.
The diff coverage is 64.28%.

❗ Current head d313939 differs from pull request most recent head 051d134. Consider uploading reports for the commit 051d134 to get more accurate results

@@            Coverage Diff             @@
##              dev    #1381      +/-   ##
==========================================
+ Coverage   66.72%   67.02%   +0.30%     
==========================================
  Files          50       50              
  Lines        5692     5735      +43     
==========================================
+ Hits         3798     3844      +46     
+ Misses       1894     1891       -3

Impacted Files	Coverage Δ
nf_core/__main__.py	`54.82% <33.33%> (+2.09%)`	⬆️
nf_core/schema.py	`81.36% <81.48%> (+<0.01%)`	⬆️
nf_core/modules/lint/module_version.py	`80.76% <0.00%> (-3.85%)`	⬇️
nf_core/modules/lint/__init__.py	`80.82% <0.00%> (-1.37%)`	⬇️
nf_core/modules/create.py	`80.10% <0.00%> (-0.43%)`	⬇️
nf_core/modules/bump_versions.py	`67.04% <0.00%> (+0.37%)`	⬆️
nf_core/modules/module_utils.py	`39.44% <0.00%> (+0.92%)`	⬆️
nf_core/download.py	`55.44% <0.00%> (+0.96%)`	⬆️
nf_core/utils.py	`84.50% <0.00%> (+1.00%)`	⬆️
nf_core/modules/lint/main_nf.py	`81.57% <0.00%> (+1.46%)`	⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b32ed1...051d134. Read the comment docs.

ewels

Love it! Just tested locally and worked a treat.

Couple of minor requests:

Can we get the resulting markdown to pass markdownlint if possible? Or at least these two rules:
- Headings should be surrounded by blank lines
- There are some trailing spaces after the final column title: Hidden
Might be nice to use inline code backticks a bit more liberally around parameter names and types?

krokicki · 2022-01-11T20:35:45Z

Great suggestions, @ewels! The latest commit should fix those.

mashehu · 2022-01-11T20:40:36Z

Nice! 😍
Haven't tested it yet, but to make it closer to the the other command names, could we maybe use nf-core schema export?

ewels · 2022-01-11T22:16:50Z

Haven't tested it yet, but to make it closer to the the other command names, could we maybe use nf-core schema export?

But it's not really exporting the schema? It's generating prose documentation from it. So I quite like the current command personally..

drpatelh · 2022-01-11T22:25:59Z

I guess nf-core schema docs could also be read as generating docs for the schema and not from it. I prefer nf-core schema export because it's quite a common operation performed on other file types/formats when converting from one type to another.

ewels

Great! Tried again and working really nicely now.

Two remaining things I've noticed on this round:

The current code will ignore any parameters not within a definition.
- For nf-core pipelines, everything should be here as we lint for it (I think?). But ungrouped parameters from nf-core schema build will just be in the top level object and need to be handled separately.
- You can see this logic in various bits of code that work with the schema, for example the website docs code.
We're missing some schema fields
- Most notable is help_text. I appreciate that this is difficult to render in the markdown. Maybe there could be a section at the bottom of the document with all of the help text blocks and the table parameter name could link to the correct heading?
- Also missing some of the more specialised fields, such as pattern, enum, min, max. Again might make the table too verbose, so not sure what the best thing is to do about these.

Minor things I wondered:

Would be nice if we could specify the name / path to a pipeline as well as / instead of a path to a schema file. eg. nf-core schema docs rnaseq, nf-core schema docs nf-core/rnaseq and so on. Some other commands work in this way I think so can hopefully reuse that code.
Might be nice to have a built-in functionality to output HTML snippets as well. It's pretty simple, see for example the pipeline markdown_to_html.py scripts. Would then need an additional flag for --format (which can default to markdown).

The direction I was thinking of going in was to have the option to build HTML outputs using a Jinja2 template (with a default supplied or a user-supplied one optionally). Then we could get quite fancy with styling the output and essentially replicate a white-label version of the nf-core website if we want. But this starts to get substantially more complicated and may not be worth the effort. So I'm certainly happy with merge without any of that, can always add it at a later date.

mashehu · 2022-01-12T09:42:25Z

To make it easier to work out of the box and make it again a bit more similar to the other nf-core commands, could we make nextflow_schema.json as the default value <pipeline schema>? If somebody would like a different schema they can still specify it, but this is the main schema file name for all nf-core pipelines.

For the help_text field: Maybe adding it inside in a details tag, so it is collapsed on default would be an option? Eager is my go-to pipeline with a lot of markdown formatting inside the help text, to test the markdown inside markdown rendering.

Coming back to nf-core schema export: I like that this would make it quite verbal if we have more output formats, e.g. nf-core schema export --markdown nf-core schema export --html (plus I still find it too inconsistent to have a noun for a sub-command when all the other sub-commands for schema and modules are verbs (or verb_noun)).

krokicki · 2022-01-12T22:17:47Z

@ewels @mashehu Those suggestions sound good but I'm probably not the right person to attempt them right now, given that I haven't actually worked with nf-core pipelines before and I'm not familiar with any of those special constructs.

I'll circle back to this after I have more experience, if no one else has picked it up -- I'm going to attempt to convert some of our Nextflow image analysis pipelines to nf-core at some point soon.

ewels · 2022-01-21T10:07:59Z

Just made a start on some of the simpler bits here:

Minor code refactoring
Added check for existing file, plus --force flag
Renamed --markdown to more generic --output
Added option to output HTML as well as markdown, new --format flag

I haven't attempted any of the more difficult / important bits yet 😅 I will do if I get time.

ewels · 2022-03-11T11:11:05Z

Ok, I think that this is good to merge now. Just added a couple more bits:

Render ungrouped parameters
Use nextflow_schema.json as a default file path (not as good as taking eg. nf-core/rnaseq but it'll do for now).

There's certainly more that we can do with this, but for now I think it's fine to merge it in and let people play with it. Can add new features to it if there's demand..

Many thanks @krokicki!

Phil

ewels · 2022-03-11T11:11:24Z

Failing tests are unrelated to this PR, see #1436 👉🏻 merging anyway.

krokicki added 4 commits January 11, 2022 15:02

implemented Markdown-formatted parameter doc generation

d716aad

added unit test for schema docs command

b8e4be6

updated changelog with new schema docs command

926e579

code formatted with black

4d37b26

ewels requested changes Jan 11, 2022

View reviewed changes

modifications to pass markdownlint and improve formatting of schema docs

d313939

ewels requested changes Jan 11, 2022

View reviewed changes

ewels added 3 commits January 21, 2022 10:31

Minor code refactor, add '--force'

6c9dd05

Remove unused try/except

5aea442

Add option to generate HTML instead of markdown

1f28973

ewels added 4 commits January 21, 2022 11:13

Add help_text in a <details> tag

4c29eda

Merge branch 'dev' into dev

b8176b7

Schema docs: default to nextflow_schema.json if path not provided.

83a1d72

Schema docs: Support ungrouped parameters

051d134

ewels merged commit ece5cbf into nf-core:dev Mar 11, 2022

krokicki deleted the dev branch March 23, 2022 15:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Output schema documentation as Markdown #1381

Output schema documentation as Markdown #1381

krokicki commented Jan 11, 2022 •

edited

Loading

codecov bot commented Jan 11, 2022 •

edited

Loading

ewels left a comment

krokicki commented Jan 11, 2022

mashehu commented Jan 11, 2022

ewels commented Jan 11, 2022

drpatelh commented Jan 11, 2022 •

edited

Loading

ewels left a comment

mashehu commented Jan 12, 2022

krokicki commented Jan 12, 2022

ewels commented Jan 21, 2022

ewels commented Mar 11, 2022

ewels commented Mar 11, 2022

Output schema documentation as Markdown #1381

Output schema documentation as Markdown #1381

Conversation

krokicki commented Jan 11, 2022 • edited Loading

PR checklist

Example Output (eager pipeline)

nf-core/eager pipeline parameters

Input/output options

codecov bot commented Jan 11, 2022 • edited Loading

Codecov Report

ewels left a comment

Choose a reason for hiding this comment

krokicki commented Jan 11, 2022

mashehu commented Jan 11, 2022

ewels commented Jan 11, 2022

drpatelh commented Jan 11, 2022 • edited Loading

ewels left a comment

Choose a reason for hiding this comment

mashehu commented Jan 12, 2022

krokicki commented Jan 12, 2022

ewels commented Jan 21, 2022

ewels commented Mar 11, 2022

ewels commented Mar 11, 2022

krokicki commented Jan 11, 2022 •

edited

Loading

codecov bot commented Jan 11, 2022 •

edited

Loading

drpatelh commented Jan 11, 2022 •

edited

Loading