forked from nf-core/modules
Commit
Add quartonotebook module (nf-core#4876)
* Add `main.nf` for quartonotebook
* Add environment files
* Add `meta.yml`
* Temporarily change `test_data_base` for testing
* Add bare-bones nf-test test
* Abort when running with Conda profile on ARM64
* Add stub test; snapshot all outputs
* Add python, rmd and ipynb tests
* Add notebook parametrization
* Update Conda environment
* Add parametrization tests
* Add note about the container
* Add missing `papermill` mention in tools section
* Fix function names according to nf-core convention
* Add missing `${args}` to Quarto render command
* Change output to `${prefix}.html`
* Also allow PDF output; add PDF tests
* Fix nf-test configs
* Do not specify AMD64 for docker test profile

  Do not specify the AMD64 architecture for the docker test profile, as this leads to problems with Pandoc (which Quarto uses) when emulating AMD64 on ARM64 systems. The docker images for this module can be built on both architectures, so always specifying one or the other is not necessary.

* Revert "Also allow PDF output; add PDF tests"

  This reverts commit 839cae0. Getting PDF output to work turned out to be problematic due to issues with (1) differences between TinyTeX installations on AMD64/ARM64 architectures; and (2) getting Pandoc to work properly inside the docker containers. This might be solvable with a lot more work and troubleshooting, but the PDF functionality is removed for now, since HTML is the output type expected to be desired by the vast majority of the envisioned module audience; the related RMARKDOWNNOTEBOOK and JUPYTERNOTEBOOK modules currently only support HTML output. Another possible solution is to use the new `typst` typesetting system introduced in Quarto 1.4 instead of *TeX, but this would preclude being able to use Conda (which currently doesn't have Quarto 1.4).

* Update snapshot
* Disallow using the Conda/Mamba profile

  Disallow using the Conda or Mamba profiles for the QUARTONOTEBOOK module, as the environment created differs from that created with containers. The Conda version of Quarto does not work on ARM64 architectures due to Pandoc-related issues, but installing outside Conda works in a container context. It is thus impossible to get the same environment in a container image and using Conda, if compatibility with both AMD64 and ARM64 architectures is desired (which it is). Hopefully the issues with Conda will be solved in the future.

* Use `nf-core/test-datasets` for test data
* Move XDG variable definition to `main.nf`
* Add `extensions` input for Quarto templates

  Add the `extensions` module input that pipelines can use for Quarto templates. This can be achieved e.g. by adding the `_extensions/` directory with whatever extensions are desired into a pipeline's `assets/` directory and creating a value channel like so: `extensions = Channel.fromPath("[...]/_extensions").collect()`.

* Also output the original report
* Update snapshot
* Add note regarding disallowing the Conda profile
1 parent 331601d, commit e7122a9
Showing 12 changed files with 941 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
# | ||
# First stage: Quarto installation | ||
# | ||
FROM ubuntu:20.04 AS quarto | ||
ARG QUARTO_VERSION=1.3.433 | ||
ARG TARGETARCH | ||
RUN apt-get update \ | ||
&& apt-get install -y --no-install-recommends \ | ||
ca-certificates \ | ||
curl \ | ||
&& apt-get clean | ||
|
||
RUN mkdir -p /opt/quarto \ | ||
&& curl -o quarto.tar.gz -L "https://github.com/quarto-dev/quarto-cli/releases/download/v${QUARTO_VERSION}/quarto-${QUARTO_VERSION}-linux-${TARGETARCH}.tar.gz" \ | ||
&& tar -zxvf quarto.tar.gz -C /opt/quarto/ --strip-components=1 \ | ||
&& rm quarto.tar.gz | ||
|
||
# | ||
# Second stage: Conda environment | ||
# | ||
FROM condaforge/mambaforge:23.11.0-0 | ||
COPY --from=quarto /opt/quarto /opt/quarto | ||
ENV PATH="${PATH}:/opt/quarto/bin" | ||
|
||
# Install packages using Mamba; also remove static libraries, python bytecode | ||
# files and javascript source maps that are not required for execution | ||
COPY environment.yml ./ | ||
RUN mamba env update --name base --file environment.yml \ | ||
&& mamba clean --all --force-pkgs-dirs --yes \ | ||
&& find /opt/conda -follow -type f -name '*.a' -delete \ | ||
&& find /opt/conda -follow -type f -name '*.pyc' -delete \ | ||
&& find /opt/conda -follow -type f -name '*.js.map' -delete | ||
|
||
CMD ["/bin/bash"] | ||
|
||
LABEL \ | ||
authors="Erik Fasterius" \ | ||
description="Dockerfile for the quartonotebook nf-core module" |
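
The first build stage selects the Quarto release archive from the `QUARTO_VERSION` and `TARGETARCH` build arguments. As a minimal sketch of that interpolation, the snippet below builds the same URL outside Docker; hard-coding `TARGETARCH` to `arm64` is an assumption made only for illustration, since BuildKit normally sets it to the target platform's architecture (for example `amd64` or `arm64`).

```shell
set -eu

# Default taken from the Dockerfile's ARG line
QUARTO_VERSION="1.3.433"
# Assumption for illustration: BuildKit would normally provide this value
TARGETARCH="arm64"

# Same URL template as the curl command in the first build stage
url="https://github.com/quarto-dev/quarto-cli/releases/download/v${QUARTO_VERSION}/quarto-${QUARTO_VERSION}-linux-${TARGETARCH}.tar.gz"
echo "${url}"
```

Because the architecture is a build argument rather than a fixed string, one Dockerfile can serve both platforms, for instance through a multi-platform build such as `docker buildx build --platform linux/amd64,linux/arm64 .` (shown here only as a possible invocation, not as the command the author used).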
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
name: quartonotebook | ||
|
||
channels: | ||
- conda-forge | ||
- bioconda | ||
- defaults | ||
|
||
dependencies: | ||
- conda-forge::jupyter=1.0.0 | ||
- conda-forge::matplotlib=3.4.3 | ||
- conda-forge::papermill=2.4.0 | ||
- conda-forge::r-rmarkdown=2.25 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
include { dumpParamsYaml; indentCodeBlock } from "./parametrize" | ||
|
||
process QUARTONOTEBOOK { | ||
tag "$meta.id" | ||
label 'process_low' | ||
|
||
// NB: You'll likely want to override this with a container containing all | ||
// required dependencies for your analyses. At a minimum you'll need Quarto | ||
// itself, Papermill and the runtime of whatever language your analyses use; | ||
// you can see an example in this module's Dockerfile. | ||
container "docker.io/erikfas/quartonotebook" | ||
|
||
input: | ||
tuple val(meta), path(notebook) | ||
val parameters | ||
path input_files | ||
path extensions | ||
|
||
output: | ||
tuple val(meta), path("*.html") , emit: html | ||
tuple val(meta), path("${notebook}"), emit: notebook | ||
tuple val(meta), path("artifacts/*"), emit: artifacts, optional: true | ||
tuple val(meta), path("params.yml") , emit: params_yaml, optional: true | ||
tuple val(meta), path("_extensions"), emit: extensions, optional: true | ||
path "versions.yml" , emit: versions | ||
|
||
when: | ||
task.ext.when == null || task.ext.when | ||
|
||
script: | ||
// Exit if running this module with -profile conda / -profile mamba | ||
// This is because of issues with getting a homogeneous environment across | ||
// both AMD64 and ARM64 architectures; please find more information at | ||
// https://github.com/nf-core/modules/pull/4876#discussion_r1483541037. | ||
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) { | ||
exit 1, "The QUARTONOTEBOOK module does not support Conda/Mamba, please use Docker / Singularity / Podman instead." | ||
} | ||
def args = task.ext.args ?: '' | ||
def prefix = task.ext.prefix ?: "${meta.id}" | ||
def parametrize = (task.ext.parametrize == null) ? true : task.ext.parametrize | ||
def implicit_params = (task.ext.implicit_params == null) ? true : task.ext.implicit_params | ||
def meta_params = (task.ext.meta_params == null) ? true : task.ext.meta_params | ||
|
||
// Dump parameters to yaml file. | ||
// Using a YAML file over using the CLI params because | ||
// - No issue with escaping | ||
// - Allows passing nested maps instead of just single values | ||
// - Allows running with the language-agnostic `--execute-params` | ||
def params_cmd = "" | ||
def render_args = "" | ||
if (parametrize) { | ||
def nb_params = [:] | ||
if (implicit_params) { | ||
nb_params["cpus"] = task.cpus | ||
nb_params["artifact_dir"] = "artifacts" | ||
nb_params["input_dir"] = "./" | ||
} | ||
if (meta_params) { | ||
nb_params["meta"] = meta | ||
} | ||
nb_params += parameters | ||
params_cmd = dumpParamsYaml(nb_params) | ||
render_args = "--execute-params params.yml" | ||
} | ||
""" | ||
# Dump params.yml heredoc (section will be empty if parametrization is disabled) | ||
${indentCodeBlock(params_cmd, 4)} | ||
# Create output directory | ||
mkdir artifacts | ||
# Set environment variables needed for Quarto rendering | ||
export XDG_CACHE_HOME="./.xdg_cache_home" | ||
export XDG_DATA_HOME="./.xdg_data_home" | ||
# Set parallelism for BLAS/MKL etc. to avoid over-booking of resources | ||
export MKL_NUM_THREADS="$task.cpus" | ||
export OPENBLAS_NUM_THREADS="$task.cpus" | ||
export OMP_NUM_THREADS="$task.cpus" | ||
export NUMBA_NUM_THREADS="$task.cpus" | ||
# Render notebook | ||
quarto render \\ | ||
${notebook} \\ | ||
${render_args} \\ | ||
${args} \\ | ||
--output ${prefix}.html | ||
cat <<-END_VERSIONS > versions.yml | ||
"${task.process}": | ||
quarto: \$(quarto -v) | ||
papermill: \$(papermill --version | cut -f1 -d' ') | ||
END_VERSIONS | ||
""" | ||
|
||
stub: | ||
def args = task.ext.args ?: '' | ||
def prefix = task.ext.prefix ?: "${meta.id}" | ||
""" | ||
touch ${prefix}.html | ||
cat <<-END_VERSIONS > versions.yml | ||
"${task.process}": | ||
quarto: \$(quarto -v) | ||
END_VERSIONS | ||
""" | ||
} |
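
To make the `script:` block easier to follow, here is a rough sketch of how the render command is put together once Nextflow has interpolated its variables. The notebook name, prefix and CPU count are hypothetical stand-ins, and the command is printed rather than executed so that the sketch runs without Quarto installed.

```shell
set -eu

# Hypothetical stand-ins for values that Nextflow would interpolate
notebook="report.qmd"   # the staged notebook file
prefix="sample1"        # task.ext.prefix, defaulting to meta.id
cpus=2                  # task.cpus
parametrize=true        # task.ext.parametrize (enabled by default)

# Cap numerical thread pools at the allotted CPUs, as the module does
export MKL_NUM_THREADS="${cpus}"
export OPENBLAS_NUM_THREADS="${cpus}"
export OMP_NUM_THREADS="${cpus}"
export NUMBA_NUM_THREADS="${cpus}"

# Assemble the command; the params flag is only added when parametrizing
set -- quarto render "${notebook}"
if [ "${parametrize}" = true ]; then
    set -- "$@" --execute-params params.yml
fi
set -- "$@" --output "${prefix}.html"

# Print instead of executing
echo "$*"
```

With parametrization disabled, the `--execute-params` flag is left out and the notebook is rendered with whatever defaults it defines itself.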
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,83 @@ | ||
name: "quartonotebook" | ||
description: Render a Quarto notebook, including parametrization. | ||
keywords: | ||
- quarto | ||
- notebook | ||
- reports | ||
- python | ||
- r | ||
tools: | ||
- quartonotebook: | ||
description: An open-source scientific and technical publishing system. | ||
homepage: https://quarto.org/ | ||
documentation: https://quarto.org/docs/reference/ | ||
tool_dev_url: https://github.com/quarto-dev/quarto-cli | ||
licence: ["MIT"] | ||
- papermill: | ||
description: Parameterize, execute, and analyze notebooks | ||
homepage: https://github.com/nteract/papermill | ||
documentation: http://papermill.readthedocs.io/en/latest/ | ||
tool_dev_url: https://github.com/nteract/papermill | ||
licence: ["BSD 3-clause"] | ||
|
||
input: | ||
- meta: | ||
type: map | ||
description: | | ||
Groovy Map containing sample information | ||
e.g. `[ id:'sample1', single_end:false ]`. | ||
- notebook: | ||
type: file | ||
description: The Quarto notebook to be rendered. | ||
pattern: "*.{qmd,ipynb,rmd}" | ||
- parameters: | ||
type: map | ||
description: | | ||
Groovy map with notebook parameters which will be passed to Quarto to | ||
generate parametrized reports. | ||
- input_files: | ||
type: file | ||
description: One or multiple files serving as input data for the notebook. | ||
pattern: "*" | ||
- extensions: | ||
type: file | ||
description: | | ||
A quarto `_extensions` directory with custom template(s) to be | ||
available for rendering. | ||
pattern: "*" | ||
|
||
output: | ||
- meta: | ||
type: map | ||
description: | | ||
Groovy Map containing sample information | ||
e.g. `[ id:'sample1', single_end:false ]`. | ||
- html: | ||
type: file | ||
description: HTML report generated by Quarto. | ||
pattern: "*.html" | ||
- notebook: | ||
type: file | ||
description: The original, un-rendered notebook. | ||
pattern: "*.{qmd,ipynb,rmd}" | ||
- artifacts: | ||
type: file | ||
description: Artifacts generated during report rendering. | ||
pattern: "*" | ||
- params_yaml: | ||
type: file | ||
description: Parameters used during report rendering. | ||
pattern: "*" | ||
- extensions: | ||
type: file | ||
description: Quarto extensions used during report rendering. | ||
pattern: "*" | ||
- versions: | ||
type: file | ||
description: File containing software versions. | ||
pattern: "versions.yml" | ||
|
||
authors: | ||
- "@fasterius" | ||
maintainers: | ||
- "@fasterius" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
import org.yaml.snakeyaml.Yaml | ||
import org.yaml.snakeyaml.DumperOptions | ||
|
||
|
||
/** | ||
* Multiline code blocks need to have the same indentation level | ||
* as the `script:` section. This function re-indents code to the specified level. | ||
*/ | ||
def indentCodeBlock(code, n_spaces) { | ||
def indent_str = " ".multiply(n_spaces) | ||
return code.stripIndent().split("\n").join("\n" + indent_str) | ||
} | ||
|
||
/** | ||
* Create a config YAML file from a Groovy map | ||
* | ||
* @param params The Groovy map of notebook parameters to dump | ||
* @return A bash heredoc snippet, to be inserted in the script, that writes `params.yml` | ||
*/ | ||
def dumpParamsYaml(params) { | ||
DumperOptions options = new DumperOptions(); | ||
options.setDefaultFlowStyle(DumperOptions.FlowStyle.BLOCK); | ||
def yaml = new Yaml(options) | ||
def yaml_str = yaml.dump(params) | ||
|
||
// Writing the params.yml file directly as follows does not work. | ||
// It only works in 'exec:', but not if there is a `script:` section: | ||
// task.workDir.resolve('params.yml').text = yaml_str | ||
|
||
// Therefore, we inject it into the bash script: | ||
return """\ | ||
cat <<"END_PARAMS_SECTION" > ./params.yml | ||
${indentCodeBlock(yaml_str, 8)} | ||
END_PARAMS_SECTION | ||
""" | ||
} |
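
The snippet returned by `dumpParamsYaml` is a heredoc with a quoted delimiter, which is what supports the "no issue with escaping" claim in `main.nf`: the shell does not expand anything inside it. The following sketch shows the kind of shell code that ends up in the task script; the YAML content is a hypothetical example (including the `title` entry, added to demonstrate a literal `$`), not actual output of the module.

```shell
set -eu

# Work in a scratch directory so the example does not touch other files
workdir="$(mktemp -d)"
cd "${workdir}"

# The quoted delimiter disables variable and command expansion in the body,
# so the text "$HOME" below is written to the file literally.
cat <<"END_PARAMS_SECTION" > ./params.yml
cpus: 2
artifact_dir: artifacts
input_dir: ./
meta:
  id: sample1
title: Cost in $HOME units
END_PARAMS_SECTION

cat ./params.yml
```

With an unquoted delimiter, the shell would substitute `$HOME` before writing the file, which is the class of escaping problem the YAML-file approach avoids.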