Add quartonotebook module (nf-core#4876)
* Add `main.nf` for quartonotebook

* Add environment files

* Add `meta.yml`

* Temporarily change `test_data_base` for testing

* Add bare-bones nf-test test

* Abort when running with Conda profile on ARM64

* Add stub test; snapshot all outputs

* Add python, rmd and ipynb tests

* Add notebook parametrization

* Update Conda environment

* Add parametrization tests

* Add note about the container

* Add missing `papermill` mention in tools section

* Fix function names according to nf-core convention

* Add missing `${args}` to Quarto render command

* Change output to `${prefix}.html`

* Also allow PDF output; add PDF tests

* Fix nf-test configs

* Do not specify AMD64 for docker test profile

Do not pin the docker test profile to the AMD64 architecture, as this
causes problems with Pandoc (which Quarto uses) when AMD64 is emulated
on ARM64 systems. The docker images for this module can be built on both
architectures, so specifying either one is unnecessary.

* Revert "Also allow PDF output; add PDF tests"

This reverts commit 839cae0.

Getting PDF output to work turned out to be problematic due to (1)
differences between TinyTeX installations on AMD64 and ARM64
architectures; and (2) difficulties getting Pandoc to work properly
inside the docker containers. This might be solvable with a lot more
work and troubleshooting, but PDF functionality is removed for now:
HTML is the output type that the vast majority of the envisioned module
audience is expected to want, and the related RMARKDOWNNOTEBOOK and
JUPYTERNOTEBOOK modules currently only support HTML output.

Another possible solution is to use the new `typst` typesetting system
introduced in Quarto 1.4 instead of *TeX, but this would preclude using
Conda (which currently doesn't have Quarto 1.4).

* Update snapshot

* Disallow using the Conda/Mamba profile

Disallow using the Conda or Mamba profiles for the QUARTONOTEBOOK
module, as the environment created differs from that created with
containers. The Conda version of Quarto does not work on ARM64
architectures due to Pandoc-related issues, but installing outside Conda
works in a container-context. It is thus impossible to get the same
environment in a container image and using Conda, if compatibility with
both AMD64 and ARM64 architectures is desired (which it is). Hopefully
the issues with Conda will be solved in the future.
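
The profile check behind this restriction can be sketched in plain Groovy.
This is a minimal illustration, assuming the active profiles arrive as a
comma-separated string (as `workflow.profile` does in the module); the
helper name `usesCondaOrMamba` is hypothetical and not part of the module:

```groovy
// Hypothetical helper mirroring the check used in main.nf:
// split the comma-separated profile string and look for conda/mamba.
def usesCondaOrMamba(String profiles) {
    return profiles.tokenize(',').intersect(['conda', 'mamba']).size() >= 1
}

// Container-based profiles pass; any conda/mamba profile is flagged
assert !usesCondaOrMamba('test,docker')
assert usesCondaOrMamba('test,conda')
assert usesCondaOrMamba('mamba')
```

Matching on whole tokens rather than substrings means a profile named
e.g. `condalike` would not trigger the abort.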

* Use `nf-core/test-datasets` for test data

* Move XDG variable definition to `main.nf`

* Add `extensions` input for Quarto templates

Add the `extensions` module input that pipelines can use for Quarto
templates. This can be achieved e.g. by adding the `_extensions/`
directory with whatever extensions are desired into a pipeline's
`assets/` directory and creating a value channel like so:
`extensions = Channel.fromPath("[...]/_extensions").collect()`.
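
A rough sketch of how a pipeline might wire this up is given here. The
include path, `params.notebook` and `params.extensions_dir` are
assumptions made for illustration only; the empty map and empty list stand
in for "no extra parameters" and "no input files":

```groovy
// Sketch only: the include path and both params.* names are hypothetical
include { QUARTONOTEBOOK } from './modules/nf-core/quartonotebook/main'

workflow {
    // Notebook to render, paired with a minimal meta map
    ch_notebook = Channel.of([ [ id: 'report' ], file(params.notebook) ])

    // Value channel holding the `_extensions/` directory
    ch_extensions = Channel
        .fromPath("${params.extensions_dir}/_extensions")
        .collect()

    QUARTONOTEBOOK(
        ch_notebook,   // tuple val(meta), path(notebook)
        [:],           // parameters: no extra notebook parameters
        [],            // input_files: none
        ch_extensions  // extensions
    )
}
```

Using `.collect()` turns the queue channel into a value channel, so the
same extensions directory can be reused for every notebook.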

* Also output the original report

* Update snapshot

* Add note regarding disallowing the Conda profile
fasterius authored and jennylsmith committed Mar 20, 2024
1 parent 331601d commit e7122a9
Showing 12 changed files with 941 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/test.yml
@@ -542,6 +542,8 @@ jobs:
            tags: merquryfk/merquryfk
          - profile: conda
            tags: merquryfk/ploidyplot
          - profile: conda
            tags: quartonotebook
          - profile: conda
            tags: sentieon/bwaindex
          - profile: conda
38 changes: 38 additions & 0 deletions modules/nf-core/quartonotebook/Dockerfile
@@ -0,0 +1,38 @@
#
# First stage: Quarto installation
#
FROM ubuntu:20.04 as quarto
ARG QUARTO_VERSION=1.3.433
ARG TARGETARCH
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        ca-certificates \
        curl \
    && apt-get clean

RUN mkdir -p /opt/quarto \
    && curl -o quarto.tar.gz -L "https://github.com/quarto-dev/quarto-cli/releases/download/v${QUARTO_VERSION}/quarto-${QUARTO_VERSION}-linux-${TARGETARCH}.tar.gz" \
    && tar -zxvf quarto.tar.gz -C /opt/quarto/ --strip-components=1 \
    && rm quarto.tar.gz

#
# Second stage: Conda environment
#
FROM condaforge/mambaforge:23.11.0-0
COPY --from=quarto /opt/quarto /opt/quarto
ENV PATH="${PATH}:/opt/quarto/bin"

# Install packages using Mamba; also remove static libraries, python bytecode
# files and javascript source maps that are not required for execution
COPY environment.yml ./
RUN mamba env update --name base --file environment.yml \
    && mamba clean --all --force-pkgs-dirs --yes \
    && find /opt/conda -follow -type f -name '*.a' -delete \
    && find /opt/conda -follow -type f -name '*.pyc' -delete \
    && find /opt/conda -follow -type f -name '*.js.map' -delete

CMD /bin/bash

LABEL \
    authors="Erik Fasterius" \
    description="Dockerfile for the quartonotebook nf-core module"
12 changes: 12 additions & 0 deletions modules/nf-core/quartonotebook/environment.yml
@@ -0,0 +1,12 @@
name: quartonotebook

channels:
  - conda-forge
  - bioconda
  - defaults

dependencies:
  - conda-forge::jupyter=1.0.0
  - conda-forge::matplotlib=3.4.3
  - conda-forge::papermill=2.4.0
  - conda-forge::r-rmarkdown=2.25
107 changes: 107 additions & 0 deletions modules/nf-core/quartonotebook/main.nf
@@ -0,0 +1,107 @@
include { dumpParamsYaml; indentCodeBlock } from "./parametrize"

process QUARTONOTEBOOK {
    tag "$meta.id"
    label 'process_low'

    // NB: You'll likely want to override this with a container containing all
    // required dependencies for your analyses. You'll at least need Quarto
    // itself, Papermill and whatever language you are running your analyses on;
    // you can see an example in this module's Dockerfile.
    container "docker.io/erikfas/quartonotebook"

    input:
    tuple val(meta), path(notebook)
    val parameters
    path input_files
    path extensions

    output:
    tuple val(meta), path("*.html")     , emit: html
    tuple val(meta), path("${notebook}"), emit: notebook
    tuple val(meta), path("artifacts/*"), emit: artifacts, optional: true
    tuple val(meta), path("params.yml") , emit: params_yaml, optional: true
    tuple val(meta), path("_extensions"), emit: extensions, optional: true
    path "versions.yml"                 , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    // Exit if running this module with -profile conda / -profile mamba
    // This is because of issues with getting a homogenous environment across
    // both AMD64 and ARM64 architectures; please find more information at
    // https://github.com/nf-core/modules/pull/4876#discussion_r1483541037.
    if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
        exit 1, "The QUARTONOTEBOOK module does not support Conda/Mamba, please use Docker / Singularity / Podman instead."
    }
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    def parametrize = (task.ext.parametrize == null) ? true : task.ext.parametrize
    def implicit_params = (task.ext.implicit_params == null) ? true : task.ext.implicit_params
    def meta_params = (task.ext.meta_params == null) ? true : task.ext.meta_params

    // Dump parameters to yaml file.
    // Using a YAML file over using the CLI params because
    //  - No issue with escaping
    //  - Allows passing nested maps instead of just single values
    //  - Allows running with the language-agnostic `--execute-params`
    def params_cmd = ""
    def render_args = ""
    if (parametrize) {
        nb_params = [:]
        if (implicit_params) {
            nb_params["cpus"] = task.cpus
            nb_params["artifact_dir"] = "artifacts"
            nb_params["input_dir"] = "./"
        }
        if (meta_params) {
            nb_params["meta"] = meta
        }
        nb_params += parameters
        params_cmd = dumpParamsYaml(nb_params)
        render_args = "--execute-params params.yml"
    }
    """
    # Dump .params.yml heredoc (section will be empty if parametrization is disabled)
    ${indentCodeBlock(params_cmd, 4)}
    # Create output directory
    mkdir artifacts
    # Set environment variables needed for Quarto rendering
    export XDG_CACHE_HOME="./.xdg_cache_home"
    export XDG_DATA_HOME="./.xdg_data_home"
    # Set parallelism for BLAS/MKL etc. to avoid over-booking of resources
    export MKL_NUM_THREADS="$task.cpus"
    export OPENBLAS_NUM_THREADS="$task.cpus"
    export OMP_NUM_THREADS="$task.cpus"
    export NUMBA_NUM_THREADS="$task.cpus"
    # Render notebook
    quarto render \\
        ${notebook} \\
        ${render_args} \\
        ${args} \\
        --output ${prefix}.html
    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        quarto: \$(quarto -v)
        papermill: \$(papermill --version | cut -f1 -d' ')
    END_VERSIONS
    """

    stub:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    touch ${prefix}.html
    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        quarto: \$(quarto -v)
    END_VERSIONS
    """
}
83 changes: 83 additions & 0 deletions modules/nf-core/quartonotebook/meta.yml
@@ -0,0 +1,83 @@
name: "quartonotebook"
description: Render a Quarto notebook, including parametrization.
keywords:
  - quarto
  - notebook
  - reports
  - python
  - r
tools:
  - quartonotebook:
      description: An open-source scientific and technical publishing system.
      homepage: https://quarto.org/
      documentation: https://quarto.org/docs/reference/
      tool_dev_url: https://github.com/quarto-dev/quarto-cli
      licence: ["MIT"]
  - papermill:
      description: Parameterize, execute, and analyze notebooks
      homepage: https://github.com/nteract/papermill
      documentation: http://papermill.readthedocs.io/en/latest/
      tool_dev_url: https://github.com/nteract/papermill
      licence: ["BSD 3-clause"]

input:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. `[ id:'sample1', single_end:false ]`.
  - notebook:
      type: file
      description: The Quarto notebook to be rendered.
      pattern: "*.{qmd}"
  - parameters:
      type: map
      description: |
        Groovy map with notebook parameters which will be passed to Quarto to
        generate parametrized reports.
  - input_files:
      type: file
      description: One or multiple files serving as input data for the notebook.
      pattern: "*"
  - extensions:
      type: file
      description: |
        A quarto `_extensions` directory with custom template(s) to be
        available for rendering.
      pattern: "*"

output:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. `[ id:'sample1', single_end:false ]`.
  - html:
      type: file
      description: HTML report generated by Quarto.
      pattern: "*.html"
  - notebook:
      type: file
      description: The original, un-rendered notebook.
      pattern: "*.{qmd,ipynb,rmd}"
  - artifacts:
      type: file
      description: Artifacts generated during report rendering.
      pattern: "*"
  - params_yaml:
      type: file
      description: Parameters used during report rendering.
      pattern: "*"
  - extensions:
      type: file
      description: Quarto extensions used during report rendering.
      pattern: "*"
  - versions:
      type: file
      description: File containing software versions.
      pattern: "versions.yml"

authors:
  - "@fasterius"
maintainers:
  - "@fasterius"
36 changes: 36 additions & 0 deletions modules/nf-core/quartonotebook/parametrize.nf
@@ -0,0 +1,36 @@
import org.yaml.snakeyaml.Yaml
import org.yaml.snakeyaml.DumperOptions


/**
 * Multiline code blocks need to have the same indentation level
 * as the `script:` section. This function re-indents code to the specified level.
 */
def indentCodeBlock(code, n_spaces) {
    def indent_str = " ".multiply(n_spaces)
    return code.stripIndent().split("\n").join("\n" + indent_str)
}

/**
 * Create a config YAML file from a groovy map
 *
 * @param params Groovy map with the notebook parameters to dump
 * @returns a line to be inserted in the bash script.
 */
def dumpParamsYaml(params) {
    DumperOptions options = new DumperOptions();
    options.setDefaultFlowStyle(DumperOptions.FlowStyle.BLOCK);
    def yaml = new Yaml(options)
    def yaml_str = yaml.dump(params)

    // Writing the params.yml file directly as follows does not work.
    // It only works in 'exec:', but not if there is a `script:` section:
    // task.workDir.resolve('params.yml').text = yaml_str

    // Therefore, we inject it into the bash script:
    return """\
        cat <<"END_PARAMS_SECTION" > ./params.yml
        ${indentCodeBlock(yaml_str, 8)}
        END_PARAMS_SECTION
    """
}
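
As a quick illustration of the re-indentation helper, the following plain
Groovy sketch copies `indentCodeBlock` from the diff and applies it to a
small, made-up YAML string (the input is hypothetical). Note that only the
lines after the first are prefixed, since the first line inherits the
indentation of the place where the result is interpolated:

```groovy
// Copied from parametrize.nf for a standalone demonstration
def indentCodeBlock(code, n_spaces) {
    def indent_str = " ".multiply(n_spaces)
    return code.stripIndent().split("\n").join("\n" + indent_str)
}

// Hypothetical two-line YAML input
def result = indentCodeBlock('cpus: 2\ninput_dir: ./', 4)

// The first line is untouched; later lines gain four spaces
assert result == 'cpus: 2\n    input_dir: ./'
```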