Download: Get DSL2 singularity containers #832

Merged: 33 commits (Feb 8, 2021)

Commits
70215b1
download: Scrape DSL2 style container addresses.
ewels Jan 20, 2021
36dbda7
Fix download filenames, bugtesting
ewels Jan 20, 2021
35317cf
Code refactor into smaller functions.
ewels Jan 21, 2021
616d913
Added overall progress bar for container downloads.
ewels Jan 21, 2021
1a7795f
Download progress bar tweaks
ewels Jan 21, 2021
5192e83
Download: fix tests
ewels Jan 21, 2021
4251df7
Download: image filenames cleaned à la Nextflow
ewels Jan 21, 2021
de071fd
Tidy up some logging
ewels Jan 21, 2021
d306bf2
Download: Add --force flag
ewels Jan 21, 2021
9610ef8
Few more log statements, simplify progress bar code a little
ewels Jan 22, 2021
f1de9ce
Download: Code refactor again.
ewels Jan 25, 2021
78d5b0d
Download singularity images in parallel
ewels Jan 25, 2021
b5c6ddc
Download: Use '.partial' extensions whilst downloading.
ewels Jan 25, 2021
625db05
Download: Fix tests
ewels Jan 25, 2021
7dd36f0
Download: Make ctrl-c work with multithreading
ewels Jan 31, 2021
a9e9258
Revert wait to as_completed
ewels Jan 31, 2021
ad4bc6b
Download: New --use_singularity_cache option
ewels Jan 31, 2021
aae3388
Download: add 'singularity.cacheDir' to workflow config for image paths.
ewels Feb 1, 2021
e469b94
Fix error with singularity_pull_image()
ewels Feb 1, 2021
d0e7cbc
Make singularity image directories if they don't already exist instea…
ewels Feb 1, 2021
04362e3
- not _ in cli flags, makedirs bugfix
ewels Feb 1, 2021
4a95a5b
Tidy up download cli options a little
ewels Feb 1, 2021
5bd1766
Download: Nicer logging for exceptions in download threads
ewels Feb 1, 2021
acdf5ba
Bugfix: Creating missing directories moved upstream
ewels Feb 1, 2021
c5173d4
Download: New docs and changelog
ewels Feb 1, 2021
503d75b
Changelog - link to PR
ewels Feb 1, 2021
6e943dc
Try to simplify exception handling for threaded downloads again
ewels Feb 1, 2021
154822b
Revert the download exception handling stuff as it was working before…
ewels Feb 1, 2021
0316fa0
Better verbose debug logging
ewels Feb 1, 2021
3e719a1
Minor refactor
ewels Feb 1, 2021
e28efa4
Download: Make logging for singularity-pull as pretty as for downloads
ewels Feb 1, 2021
62a37a7
fix tests
ewels Feb 1, 2021
e4a397e
Use Popen arg name for Python 3.6
ewels Feb 1, 2021
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,10 @@
### Tools helper code

* Fixed some bugs in the command line interface for `nf-core launch` and improved formatting [[#829](https://github.com/nf-core/tools/pull/829)]
* New functionality for `nf-core download` to make it compatible with DSL2 pipelines [[#832](https://github.com/nf-core/tools/pull/832)]
* Singularity images in module files are now discovered and fetched
* Direct downloads of Singularity images in python allowed (much faster than running `singularity pull`)
* Downloads now work with `$NXF_SINGULARITY_CACHEDIR` so that pipelines sharing containers have efficient downloads

### Linting

78 changes: 58 additions & 20 deletions README.md
@@ -276,9 +276,11 @@ Do you want to run this command now? [y/N]: n

## Downloading pipelines for offline use

Sometimes you may need to run an nf-core pipeline on a server or HPC system that has no internet connection. In this case you will need to fetch the pipeline files first, then manually transfer them to your system.
Sometimes you may need to run an nf-core pipeline on a server or HPC system that has no internet connection.
In this case you will need to fetch the pipeline files first, then manually transfer them to your system.

To make this process easier and ensure accurate retrieval of correctly versioned code and software containers, we have written a download helper tool. Simply specify the name of the nf-core pipeline and it will be downloaded to your current working directory.
To make this process easier and ensure accurate retrieval of correctly versioned code and software containers, we have written a download helper tool.
Simply specify the name of the nf-core pipeline and it will be downloaded to your current working directory.

By default, the pipeline will download the pipeline code and the [institutional nf-core/configs](https://github.com/nf-core/configs) files.
If you specify the flag `--singularity`, it will also download any singularity image files that are required.
@@ -297,9 +299,9 @@ $ nf-core download methylseq -r 1.4 --singularity
nf-core/tools version 1.10

INFO Saving methylseq
Pipeline release: 1.4
Pull singularity containers: No
Output file: nf-core-methylseq-1.4.tar.gz
Pipeline release: '1.4'
Pull singularity containers: 'No'
Output file: 'nf-core-methylseq-1.4.tar.gz'
INFO Downloading workflow files from GitHub
INFO Downloading centralised configs from GitHub
INFO Compressing download..
@@ -311,7 +313,7 @@ The tool automatically compresses all of the resulting file in to a `.tar.gz` ar
You can choose other formats (`.tar.bz2`, `zip`) or to not compress (`none`) with the `-c`/`--compress` flag.
The console output provides the command you need to extract the files.

Once uncompressed, you will see the following file structure for the downloaded pipeline:
Once uncompressed, you will see something like the following file structure for the downloaded pipeline:

```console
$ tree -L 2 nf-core-methylseq-1.4/
@@ -326,8 +328,6 @@ nf-core-methylseq-1.4
│   ├── nextflow.config
│   ├── nfcore_custom.config
│   └── README.md
├── singularity-images
│   └── nf-core-methylseq-1.4.simg
└── workflow
├── assets
├── bin
@@ -342,25 +342,63 @@ nf-core-methylseq-1.4
├── nextflow.config
├── nextflow_schema.json
└── README.md

10 directories, 15 files
```

The pipeline files are automatically updated so that the local copy of institutional configs are available when running the pipeline.
The pipeline files are automatically updated (`params.custom_config_base` is set to `../configs`), so that the local copy of institutional configs is available when running the pipeline.
So using `-profile <NAME>` should work if available within [nf-core/configs](https://github.com/nf-core/configs).

You can run the pipeline by simply providing the directory path for the `workflow` folder.
Note that if using Singularity, you will also need to provide the path to the Singularity image.
For example:
You can run the pipeline by simply providing the directory path for the `workflow` folder to your `nextflow run` command.

```bash
nextflow run /path/to/nf-core-methylseq-1.4/workflow/ \
-profile singularity \
-with-singularity /path/to/nf-core-methylseq-1.4/singularity-images/nf-core-methylseq-1.4.simg \
# .. other normal pipeline parameters from here on..
--input '*_R{1,2}.fastq.gz' --genome GRCh38
By default, the download will not run if a target directory or archive already exists. Use the `--force` flag to overwrite / delete any existing download files _(not including those in the Singularity cache directory, see below)_.

### Downloading singularity containers

If you're using Singularity, the `nf-core download` command can also fetch the required Singularity container images for you.
To do this, specify the `--singularity` option.
Your archive / target output directory will then include three folders: `workflow`, `configs` and also `singularity-images`.

The downloaded workflow files are again edited to add the following line to the end of the pipeline's `nextflow.config` file:

```nextflow
singularity.cacheDir = "${projectDir}/../singularity-images/"
```

This tells Nextflow to use the `singularity-images` directory, relative to the workflow, as the Singularity image cache directory.
All images should be downloaded there, so Nextflow will use them instead of trying to pull from the internet.

### Singularity cache directory

We highly recommend setting the `$NXF_SINGULARITY_CACHEDIR` environment variable on your system, even if that is a different system to where you will be running Nextflow.

If found, the tool will fetch the Singularity images to this directory first before copying to the target output archive / directory.
Any images previously fetched will be found there and copied directly; this includes images that may be shared with other pipelines, or left over from previous pipeline version downloads or download attempts.

If you are running the download on the same system where you will be running the pipeline (eg. a shared filesystem where Nextflow won't have an internet connection at a later date), you can specify `--singularity-cache`.
This instructs `nf-core download` to fetch all Singularity images to the `$NXF_SINGULARITY_CACHEDIR` directory but does _not_ copy them to the workflow archive / directory.
The workflow config file is _not_ edited. This means that when you later run the workflow, Nextflow will just use the cache folder directly.
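
As a rough illustration of this behaviour, the sketch below shows where a given image would end up under the different cache settings. This is a hypothetical helper written for this explanation, not the actual nf-core implementation, and the function and parameter names are invented:

```python
import os

def image_destinations(image_name, outdir, cache_dir=None, singularity_cache_only=False):
    """Sketch: list the paths where `nf-core download` would place a
    Singularity image, depending on $NXF_SINGULARITY_CACHEDIR and the
    --singularity-cache flag. Hypothetical, not the real nf-core code."""
    destinations = []
    if cache_dir:
        # If a cache directory is set, images are always fetched there first
        destinations.append(os.path.join(cache_dir, image_name))
    if not singularity_cache_only:
        # Unless --singularity-cache was given, a copy also goes into the
        # 'singularity-images' folder of the download output directory
        destinations.append(os.path.join(outdir, "singularity-images", image_name))
    return destinations

print(image_destinations("nf-core-methylseq-1.4.simg", "download", cache_dir="/cache"))
```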

### How the Singularity image downloads work

The Singularity image download finds containers using two methods:

1. It runs `nextflow config` on the downloaded workflow to look for a `process.container` statement for the whole pipeline.
This is the typical method used for DSL1 pipelines.
2. It scrapes any files it finds with a `.nf` file extension in the workflow `modules` directory for lines
that look like `container = "xxx"`. This is the typical method for DSL2 pipelines, which have one container per process.

Some DSL2 modules have container addresses for docker (eg. `quay.io/biocontainers/fastqc:0.11.9--0`) and also URLs for direct downloads of a Singularity container (eg. `https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0`).
Where both are found, the download URL is preferred.
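
The module-scraping step (method 2) and the URL preference can be sketched roughly as follows. The regex and function here are illustrative assumptions, not the actual nf-core scraping code:

```python
import re

# Match lines that look like: container = "xxx"  (or container "xxx")
CONTAINER_RE = re.compile(r'container\s*=?\s*"([^"]+)"')

def find_containers(module_text):
    """Sketch: return container addresses found in a .nf module file,
    preferring a direct-download URL over a Docker address."""
    matches = CONTAINER_RE.findall(module_text)
    urls = [m for m in matches if m.startswith("http")]
    # Where both a URL and a Docker address are found, the URL is preferred
    return urls if urls else matches

module = '''
process FASTQC {
    container = "https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0"
    container = "quay.io/biocontainers/fastqc:0.11.9--0"
}
'''
print(find_containers(module))
# → ['https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0']
```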

Once a full list of containers is found, they are processed in the following order:

1. If the target image already exists, nothing is done (eg. with `$NXF_SINGULARITY_CACHEDIR` and `--singularity-cache` specified)
2. If found in `$NXF_SINGULARITY_CACHEDIR` and `--singularity-cache` is _not_ specified, they are copied to the output directory
3. If they start with `http` they are downloaded directly within Python (default 4 at a time, you can customise this with `--parallel-downloads`)
4. If they look like a Docker image name, they are fetched using a `singularity pull` command
* This requires Singularity to be installed on the system and is substantially slower
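
The decision order above can be sketched as a small pure function. This is an invented helper for illustration only, with hypothetical names; it is not the actual nf-core downloader logic:

```python
def plan_fetch(target_exists, in_cache, cache_only, address):
    """Sketch of the per-image decision order described above
    (hypothetical, not the real nf-core implementation)."""
    if target_exists:
        return "skip"      # 1. image already at the target location
    if in_cache and not cache_only:
        return "copy"      # 2. copy from $NXF_SINGULARITY_CACHEDIR to the output dir
    if address.startswith("http"):
        return "download"  # 3. direct download within Python (default 4 in parallel)
    return "pull"          # 4. fall back to 'singularity pull' (needs Singularity, slower)

print(plan_fetch(False, False, False, "quay.io/biocontainers/fastqc:0.11.9--0"))
# → pull
```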

Note that compressing many GBs of binary files can be slow, so specifying `--compress none` is recommended when downloading Singularity images.

## Pipeline software licences

Sometimes it's useful to see the software licences of the tools used in a pipeline. You can use the `licences` subcommand to fetch and print the software licence from each conda / PyPI package used in an nf-core pipeline.
25 changes: 18 additions & 7 deletions nf_core/__main__.py
@@ -202,23 +202,34 @@ def launch(pipeline, id, revision, command_only, params_in, params_out, save_all
@nf_core_cli.command(help_priority=3)
@click.argument("pipeline", required=True, metavar="<pipeline name>")
@click.option("-r", "--release", type=str, help="Pipeline release")
@click.option("-s", "--singularity", is_flag=True, default=False, help="Download singularity containers")
@click.option("-o", "--outdir", type=str, help="Output directory")
@click.option(
"-c",
"--compress",
type=click.Choice(["tar.gz", "tar.bz2", "zip", "none"]),
default="tar.gz",
help="Compression type",
help="Archive compression type",
)
def download(pipeline, release, singularity, outdir, compress):
@click.option("-f", "--force", is_flag=True, default=False, help="Overwrite existing files")
@click.option("-s", "--singularity", is_flag=True, default=False, help="Download singularity images")
@click.option(
"-c",
"--singularity-cache",
is_flag=True,
default=False,
help="Don't copy images to the output directory, don't set 'singularity.cacheDir' in workflow",
)
@click.option("-p", "--parallel-downloads", type=int, default=4, help="Number of parallel image downloads")
def download(pipeline, release, outdir, compress, force, singularity, singularity_cache, parallel_downloads):
"""
Download a pipeline, configs and singularity container.
Download a pipeline, nf-core/configs and pipeline singularity images.

Collects all workflow files and shared configs from nf-core/configs.
Configures the downloaded workflow to use the relative path to the configs.
Collects all files in a single archive and configures the downloaded
workflow to use relative paths to the configs and singularity images.
"""
dl = nf_core.download.DownloadWorkflow(pipeline, release, singularity, outdir, compress)
dl = nf_core.download.DownloadWorkflow(
pipeline, release, outdir, compress, force, singularity, singularity_cache, parallel_downloads
)
dl.download_workflow()

