diff --git a/CHANGELOG.md b/CHANGELOG.md
index c18349c31c..1bdd54ff58 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,10 @@
 ### Tools helper code

 * Fixed some bugs in the command line interface for `nf-core launch` and improved formatting [[#829](https://github.com/nf-core/tools/pull/829)]
+* New functionality for `nf-core download` to make it compatible with DSL2 pipelines [[#832](https://github.com/nf-core/tools/pull/832)]
+  * Singularity images in module files are now discovered and fetched
+  * Singularity images can now be downloaded directly within Python (much faster than running `singularity pull`)
+  * Downloads now work with `$NXF_SINGULARITY_CACHEDIR`, so pipelines that share containers can reuse existing image downloads

 ### Linting

diff --git a/README.md b/README.md
index d40e05c4e3..27abfff59b 100644
--- a/README.md
+++ b/README.md
@@ -276,9 +276,11 @@ Do you want to run this command now? [y/N]: n

 ## Downloading pipelines for offline use

-Sometimes you may need to run an nf-core pipeline on a server or HPC system that has no internet connection. In this case you will need to fetch the pipeline files first, then manually transfer them to your system.
+Sometimes you may need to run an nf-core pipeline on a server or HPC system that has no internet connection.
+In this case you will need to fetch the pipeline files first, then manually transfer them to your system.

-To make this process easier and ensure accurate retrieval of correctly versioned code and software containers, we have written a download helper tool. Simply specify the name of the nf-core pipeline and it will be downloaded to your current working directory.
+To make this process easier and ensure accurate retrieval of correctly versioned code and software containers, we have written a download helper tool.
+Simply specify the name of the nf-core pipeline and it will be downloaded to your current working directory.

 By default, the tool will download the pipeline code and the [institutional nf-core/configs](https://github.com/nf-core/configs) files.
 If you specify the flag `--singularity`, it will also download any singularity image files that are required.

@@ -297,9 +299,9 @@ $ nf-core download methylseq -r 1.4 --singularity
 nf-core/tools version 1.10

 INFO     Saving methylseq
-          Pipeline release: 1.4
-          Pull singularity containers: No
-          Output file: nf-core-methylseq-1.4.tar.gz
+          Pipeline release: '1.4'
+          Pull singularity containers: 'No'
+          Output file: 'nf-core-methylseq-1.4.tar.gz'
 INFO     Downloading workflow files from GitHub
 INFO     Downloading centralised configs from GitHub
 INFO     Compressing download..
@@ -311,7 +313,7 @@ The tool automatically compresses all of the resulting files into a `.tar.gz` archive.
 You can choose other formats (`.tar.bz2`, `zip`) or to not compress (`none`) with the `-c`/`--compress` flag.
 The console output provides the command you need to extract the files.

-Once uncompressed, you will see the following file structure for the downloaded pipeline:
+Once uncompressed, you will see something like the following file structure for the downloaded pipeline:

 ```console
 $ tree -L 2 nf-core-methylseq-1.4/
@@ -326,8 +328,6 @@ nf-core-methylseq-1.4
 │   ├── nextflow.config
 │   ├── nfcore_custom.config
 │   └── README.md
-├── singularity-images
-│   └── nf-core-methylseq-1.4.simg
 └── workflow
     ├── assets
     ├── bin
@@ -342,25 +342,63 @@ nf-core-methylseq-1.4
     ├── nextflow.config
     ├── nextflow_schema.json
     └── README.md
-
-10 directories, 15 files
 ```

-The pipeline files are automatically updated so that the local copy of institutional configs are available when running the pipeline.
+The pipeline files are automatically updated (`params.custom_config_base` is set to `../configs`), so that the local copy of institutional configs is available when running the pipeline.
 So using `-profile <institute>` should work if available within [nf-core/configs](https://github.com/nf-core/configs).

-You can run the pipeline by simply providing the directory path for the `workflow` folder.
-Note that if using Singularity, you will also need to provide the path to the Singularity image.
-For example:
+You can run the pipeline by simply providing the directory path for the `workflow` folder to your `nextflow run` command.

-```bash
-nextflow run /path/to/nf-core-methylseq-1.4/workflow/ \
-    -profile singularity \
-    -with-singularity /path/to/nf-core-methylseq-1.4/singularity-images/nf-core-methylseq-1.4.simg \
-    # .. other normal pipeline parameters from here on..
-    --input '*_R{1,2}.fastq.gz' --genome GRCh38
+By default, the download will not run if a target directory or archive already exists. Use the `--force` flag to overwrite / delete any existing download files _(not including those in the Singularity cache directory, see below)_.
+
+### Downloading singularity containers
+
+If you're using Singularity, the `nf-core download` command can also fetch the required Singularity container images for you.
+To do this, specify the `--singularity` option.
+Your archive / target output directory will then include three folders: `workflow`, `configs` and also `singularity-images`.
+
+The downloaded workflow files are again edited to add the following line to the end of the pipeline's `nextflow.config` file:
+
+```nextflow
+singularity.cacheDir = "${projectDir}/../singularity-images/"
 ```

+This tells Nextflow to use the `singularity-images` directory relative to the workflow for the singularity image cache directory.
+All images should be downloaded there, so Nextflow will use them instead of trying to pull from the internet.
+
+### Singularity cache directory
+
+We highly recommend setting the `$NXF_SINGULARITY_CACHEDIR` environment variable on your system, even if that is a different system to where you will be running Nextflow.
+
+If found, the tool will fetch the Singularity images to this directory first before copying to the target output archive / directory.
+Any images previously fetched will be found there and copied directly - this includes images that may be shared with other pipelines or previous pipeline version downloads or download attempts.
+
+If you are running the download on the same system where you will be running the pipeline (eg. a shared filesystem where Nextflow won't have an internet connection at a later date), you can choose to specify `--singularity-cache`.
+This instructs `nf-core download` to fetch all Singularity images to the `$NXF_SINGULARITY_CACHEDIR` directory but does _not_ copy them to the workflow archive / directory.
+The workflow config file is _not_ edited. This means that when you later run the workflow, Nextflow will just use the cache folder directly.
+
+### How the Singularity image downloads work
+
+The Singularity image download finds containers using two methods:
+
+1. It runs `nextflow config` on the downloaded workflow to look for a `process.container` statement for the whole pipeline.
+   This is the typical method used for DSL1 pipelines.
+2. It scrapes any files it finds with a `.nf` file extension in the workflow `modules` directory for lines
+   that look like `container "xxx"`. This is the typical method for DSL2 pipelines, which have one container per process.
+
+Some DSL2 modules have container addresses for Docker (eg. `quay.io/biocontainers/fastqc:0.11.9--0`) and also URLs for direct downloads of a Singularity container (eg. `https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0`).
+Where both are found, the download URL is preferred.
+
+Once a full list of containers is found, they are processed in the following order:
+
+1. If the target image already exists, nothing is done (eg. with `$NXF_SINGULARITY_CACHEDIR` and `--singularity-cache` specified)
+2. If found in `$NXF_SINGULARITY_CACHEDIR` and `--singularity-cache` is _not_ specified, they are copied to the output directory
+3. If they start with `http` they are downloaded directly within Python (default 4 at a time, you can customise this with `--parallel-downloads`)
+4. If they look like a Docker image name, they are fetched using a `singularity pull` command
+   * This requires Singularity to be installed on the system and is substantially slower
+
+Note that compressing many GBs of binary files can be slow, so specifying `--compress none` is recommended when downloading Singularity images.
+
 ## Pipeline software licences

 Sometimes it's useful to see the software licences of the tools used in a pipeline. You can use the `licences` subcommand to fetch and print the software licence from each conda / PyPI package used in an nf-core pipeline.
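The module-scraping described above (discovery method 2) can be exercised in isolation. Below is a minimal sketch using the same regular expression as the new `find_container_images()` code in this diff; the module text is made up for illustration:

```python
import re

# Made-up DSL2 module snippet, for illustration only
module_text = '''
process FASTQC {
    container "https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0"
}
'''

containers = []
for line in module_text.splitlines():
    # Same pattern that nf_core/download.py applies to each .nf file
    match = re.match(r"\s*container\s+[\"']([^\"']+)[\"']", line)
    if match:
        containers.append(match.group(1))

print(containers)
# ['https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0']
```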
diff --git a/nf_core/__main__.py b/nf_core/__main__.py
index c4d7d8642f..c60d0334d1 100755
--- a/nf_core/__main__.py
+++ b/nf_core/__main__.py
@@ -202,23 +202,34 @@ def launch(pipeline, id, revision, command_only, params_in, params_out, save_all
 @nf_core_cli.command(help_priority=3)
 @click.argument("pipeline", required=True, metavar="<pipeline name>")
 @click.option("-r", "--release", type=str, help="Pipeline release")
-@click.option("-s", "--singularity", is_flag=True, default=False, help="Download singularity containers")
 @click.option("-o", "--outdir", type=str, help="Output directory")
 @click.option(
     "-c",
     "--compress",
     type=click.Choice(["tar.gz", "tar.bz2", "zip", "none"]),
     default="tar.gz",
-    help="Compression type",
+    help="Archive compression type",
 )
-def download(pipeline, release, singularity, outdir, compress):
+@click.option("-f", "--force", is_flag=True, default=False, help="Overwrite existing files")
+@click.option("-s", "--singularity", is_flag=True, default=False, help="Download singularity images")
+@click.option(
+    "--singularity-cache",
+    is_flag=True,
+    default=False,
+    help="Don't copy images to the output directory, don't set 'singularity.cacheDir' in workflow",
+)
+@click.option("-p", "--parallel-downloads", type=int, default=4, help="Number of parallel image downloads")
+def download(pipeline, release, outdir, compress, force, singularity, singularity_cache, parallel_downloads):
     """
-    Download a pipeline, configs and singularity container.
+    Download a pipeline, nf-core/configs and pipeline singularity images.

-    Collects all workflow files and shared configs from nf-core/configs.
-    Configures the downloaded workflow to use the relative path to the configs.
+    Collects all files in a single archive and configures the downloaded
+    workflow to use relative paths to the configs and singularity images.
     """
-    dl = nf_core.download.DownloadWorkflow(pipeline, release, singularity, outdir, compress)
+    dl = nf_core.download.DownloadWorkflow(
+        pipeline, release, outdir, compress, force, singularity, singularity_cache, parallel_downloads
+    )
     dl.download_workflow()
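The positional call above relies on the new argument order of `DownloadWorkflow`. A hedged sketch of the equivalent call with keyword arguments (the values shown are examples only):

```python
import nf_core.download

# Keyword form of the constructor call wired up above; example values only
dl = nf_core.download.DownloadWorkflow(
    pipeline="methylseq",
    release="1.4",
    outdir=None,  # left as None here, assuming the tool fills in a default name
    compress_type="none",  # compressing many GB of images is slow
    force=False,
    singularity=True,
    singularity_cache_only=False,
    parallel_downloads=4,
)
dl.download_workflow()
```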
+ """ + + def get_renderables(self): + for task in self.tasks: + if task.fields.get("progress_type") == "summary": + self.columns = ( + "[magenta]{task.description}", + BarColumn(bar_width=None), + "[progress.percentage]{task.percentage:>3.0f}%", + "•", + "[green]{task.completed}/{task.total} completed", + ) + if task.fields.get("progress_type") == "download": + self.columns = ( + "[blue]{task.description}", + BarColumn(bar_width=None), + "[progress.percentage]{task.percentage:>3.1f}%", + "•", + DownloadColumn(), + "•", + TransferSpeedColumn(), + ) + if task.fields.get("progress_type") == "singularity_pull": + self.columns = ( + "[magenta]{task.description}", + "[blue]{task.fields[current_log]}", + BarColumn(bar_width=None), + ) + yield self.make_tasks_table([task]) + + class DownloadWorkflow(object): """Downloads a nf-core workflow from GitHub to the local file system. @@ -33,15 +72,33 @@ class DownloadWorkflow(object): outdir (str): Path to the local download directory. Defaults to None. """ - def __init__(self, pipeline, release=None, singularity=False, outdir=None, compress_type="tar.gz"): + def __init__( + self, + pipeline, + release=None, + outdir=None, + compress_type="tar.gz", + force=False, + singularity=False, + singularity_cache_only=False, + parallel_downloads=4, + ): self.pipeline = pipeline self.release = release - self.singularity = singularity self.outdir = outdir self.output_filename = None self.compress_type = compress_type if self.compress_type == "none": self.compress_type = None + self.force = force + self.singularity = singularity + self.singularity_cache_only = singularity_cache_only + self.parallel_downloads = parallel_downloads + + # Sanity checks + if self.singularity_cache_only and not self.singularity: + log.error("Command has '--singularity-cache' set, but not '--singularity'") + sys.exit(1) self.wf_name = None self.wf_sha = None @@ -57,29 +114,38 @@ def download_workflow(self): except LookupError: sys.exit(1) - output_logmsg = "Output directory: {}".format(self.outdir) + summary_log = [ + "Pipeline release: '{}'".format(self.release), + "Pull singularity containers: '{}'".format("Yes" if self.singularity else "No"), + ] + if self.singularity and os.environ.get("NXF_SINGULARITY_CACHEDIR"): + summary_log.append("Using '$NXF_SINGULARITY_CACHEDIR': {}".format(os.environ["NXF_SINGULARITY_CACHEDIR"])) # Set an output filename now that we have the outdir if self.compress_type is not None: - self.output_filename = "{}.{}".format(self.outdir, self.compress_type) - output_logmsg = "Output file: {}".format(self.output_filename) + self.output_filename = f"{self.outdir}.{self.compress_type}" + summary_log.append(f"Output file: '{self.output_filename}'") + else: + summary_log.append(f"Output directory: '{self.outdir}'") # Check that the outdir doesn't already exist if os.path.exists(self.outdir): - log.error("Output directory '{}' already exists".format(self.outdir)) - sys.exit(1) + if not self.force: + log.error(f"Output directory '{self.outdir}' already exists (use [red]--force[/] to overwrite)") + sys.exit(1) + log.warning(f"Deleting existing output directory: '{self.outdir}'") + shutil.rmtree(self.outdir) # Check that compressed output file doesn't already exist if self.output_filename and os.path.exists(self.output_filename): - log.error("Output file '{}' already exists".format(self.output_filename)) - sys.exit(1) + if not self.force: + log.error(f"Output file '{self.output_filename}' already exists (use [red]--force[/] to overwrite)") + sys.exit(1) + 
log.warning(f"Deleting existing output file: '{self.output_filename}'") + os.remove(self.output_filename) - log.info( - "Saving {}".format(self.pipeline) - + "\n Pipeline release: {}".format(self.release) - + "\n Pull singularity containers: {}".format("Yes" if self.singularity else "No") - + "\n {}".format(output_logmsg) - ) + # Summary log + log.info("Saving {}\n {}".format(self.pipeline, "\n ".join(summary_log))) # Download the pipeline files log.info("Downloading workflow files from GitHub") @@ -92,25 +158,8 @@ def download_workflow(self): # Download the singularity images if self.singularity: - log.debug("Fetching container names for workflow") self.find_container_images() - if len(self.containers) == 0: - log.info("No container names found in workflow") - else: - os.mkdir(os.path.join(self.outdir, "singularity-images")) - log.info( - "Downloading {} singularity container{}".format( - len(self.containers), "s" if len(self.containers) > 1 else "" - ) - ) - for container in self.containers: - try: - # Download from Docker Hub in all cases - self.pull_singularity_image(container) - except RuntimeWarning as r: - # Raise exception if this is not possible - log.error("Not able to pull image. Service might be down or internet connection is dead.") - raise r + self.get_singularity_images() # Compress into an archive if self.compress_type is not None: @@ -238,7 +287,7 @@ def wf_use_local_configs(self): nfconfig_fn = os.path.join(self.outdir, "workflow", "nextflow.config") find_str = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" repl_str = "../configs/" - log.debug("Editing params.custom_config_base in {}".format(nfconfig_fn)) + log.debug("Editing 'params.custom_config_base' in '{}'".format(nfconfig_fn)) # Load the nextflow.config file into memory with open(nfconfig_fn, "r") as nfconfig_fh: @@ -247,12 +296,29 @@ def wf_use_local_configs(self): # Replace the target string nfconfig = nfconfig.replace(find_str, repl_str) + # Append the singularity.cacheDir to the end if we need it + if self.singularity and not self.singularity_cache_only: + nfconfig += ( + f"\n\n// Added by `nf-core download` v{nf_core.__version__} //\n" + + 'singularity.cacheDir = "${projectDir}/../singularity-images/"' + + "\n///////////////////////////////////////" + ) + # Write the file out again with open(nfconfig_fn, "w") as nfconfig_fh: nfconfig_fh.write(nfconfig) def find_container_images(self): - """ Find container image names for workflow """ + """Find container image names for workflow. + + Starts by using `nextflow config` to pull out any process.container + declarations. This works for DSL1. + + Second, we look for DSL2 containers. These can't be found with + `nextflow config` at the time of writing, so we scrape the pipeline files. + """ + + log.info("Fetching container names for workflow") # Use linting code to parse the pipeline nextflow config self.nf_config = nf_core.utils.fetch_wf_config(os.path.join(self.outdir, "workflow")) @@ -262,26 +328,310 @@ def find_container_images(self): if k.startswith("process.") and k.endswith(".container"): self.containers.append(v.strip('"').strip("'")) - def pull_singularity_image(self, container): - """Uses a local installation of singularity to pull an image from Docker Hub. + # Recursive search through any DSL2 module files for container spec lines. 
+                        matches = []
+                        for line in fh:
+                            match = re.match(r"\s*container\s+[\"']([^\"']+)[\"']", line)
+                            if match:
+                                matches.append(match.group(1))

+                        # If we have matches, save the first one that starts with http
+                        for m in matches:
+                            if m.startswith("http"):
+                                self.containers.append(m.strip('"').strip("'"))
+                                break
+                        # If we get here then we didn't call break - just save the first match
+                        else:
+                            if len(matches) > 0:
+                                self.containers.append(matches[0].strip('"').strip("'"))
+
+        # Remove duplicates and sort
+        self.containers = sorted(list(set(self.containers)))
+
+        log.info("Found {} container{}".format(len(self.containers), "s" if len(self.containers) > 1 else ""))
+
+    def get_singularity_images(self):
+        """Loop through container names and download Singularity images"""
+
+        if len(self.containers) == 0:
+            log.info("No container names found in workflow")
+        else:
+            if not os.environ.get("NXF_SINGULARITY_CACHEDIR"):
+                log.info(
+                    "[magenta]Tip: Set env var $NXF_SINGULARITY_CACHEDIR to use a central cache for container downloads"
+                )
+
+            with DownloadProgress() as progress:
+                task = progress.add_task("all_containers", total=len(self.containers), progress_type="summary")
+
+                # Organise containers based on what we need to do with them
+                containers_exist = []
+                containers_cache = []
+                containers_download = []
+                containers_pull = []
+                for container in self.containers:
+
+                    # Fetch the output and cached filenames for this container
+                    out_path, cache_path = self.singularity_image_filenames(container)
+
+                    # Check that the directories exist
+                    out_path_dir = os.path.dirname(out_path)
+                    if not os.path.isdir(out_path_dir):
+                        log.debug(f"Output directory not found, creating: {out_path_dir}")
+                        os.makedirs(out_path_dir)
+                    if cache_path:
+                        cache_path_dir = os.path.dirname(cache_path)
+                        if not os.path.isdir(cache_path_dir):
+                            log.debug(f"Cache directory not found, creating: {cache_path_dir}")
+                            os.makedirs(cache_path_dir)
+
+                    # We already have the target file in place, skip it
+                    if os.path.exists(out_path):
+                        containers_exist.append(container)
+                        continue
+
+                    # We have a copy of this in the NXF_SINGULARITY_CACHE dir
+                    if cache_path and os.path.exists(cache_path):
+                        containers_cache.append([container, out_path, cache_path])
+                        continue
+
+                    # Direct download within Python
+                    if container.startswith("http"):
+                        containers_download.append([container, out_path, cache_path])
+                        continue
+
+                    # Pull using singularity
+                    containers_pull.append([container, out_path, cache_path])
+
+                # Go through each method of fetching containers in order
+                for container in containers_exist:
+                    progress.update(task, description="Image file exists")
+                    progress.update(task, advance=1)
+
+                for container in containers_cache:
+                    progress.update(task, description="Copying singularity images from cache")
+                    self.singularity_copy_cache_image(*container)
+                    progress.update(task, advance=1)
+
+                with concurrent.futures.ThreadPoolExecutor(max_workers=self.parallel_downloads) as pool:
+                    progress.update(task, description="Downloading singularity images")
+
+                    # Kick off concurrent downloads
+                    future_downloads = [
+                        pool.submit(self.singularity_download_image, *container, progress)
+                        for container in containers_download
+                    ]
+
+                    # Make ctrl-c work with multi-threading
+                    self.kill_with_fire = False
+
+                    try:
+                        # Iterate over each threaded download, waiting for them to finish
+                        for future in concurrent.futures.as_completed(future_downloads):
+                            try:
+                                future.result()
+                            except Exception:
+                                raise
+                            else:
+                                try:
+                                    progress.update(task, advance=1)
+                                except Exception as e:
+                                    log.error(f"Error updating progress bar: {e}")
+
+                    except KeyboardInterrupt:
+                        # Cancel the future threads that haven't started yet
+                        for future in future_downloads:
+                            future.cancel()
+                        # Set the variable that the threaded function looks for
+                        # Will trigger an exception from each thread
+                        self.kill_with_fire = True
+                        # Re-raise exception on the main thread
+                        raise
+
+                for container in containers_pull:
+                    progress.update(task, description="Pulling singularity images")
+                    try:
+                        self.singularity_pull_image(*container, progress)
+                    except RuntimeWarning as r:
+                        # Raise exception if this is not possible
+                        log.error("Not able to pull image. Service might be down or internet connection is dead.")
+                        raise r
+                    progress.update(task, advance=1)
+
+    def singularity_image_filenames(self, container):
+        """Work out the target output path and cache path for a container image.
+
+        Args:
+            container (str): A pipeline's container name. Can be direct download URL
+                or a Docker Hub repository ID.
+
+        Returns:
+            (str, str): Tuple of the target output path and the cache path.
+                The cache path is None if $NXF_SINGULARITY_CACHEDIR is not set,
+                or if --singularity-cache makes the cache the target itself.
+        """
+
+        # Generate file paths
+        # Based on simpleName() function in Nextflow code:
+        # https://github.com/nextflow-io/nextflow/blob/671ae6d85df44f906747c16f6d73208dbc402d49/modules/nextflow/src/main/groovy/nextflow/container/SingularityCache.groovy#L69-L94
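+        # Illustrative examples of the mangling below:
+        #   nfcore/methylseq:1.4 -> nf-core-methylseq-1.4.img
+        #   https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0
+        #     -> depot.galaxyproject.org-singularity-fastqc-0.11.9--0.img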
+        out_name = container
+        # Strip URI prefix
+        out_name = re.sub(r"^.*:\/\/", "", out_name)
+        # Detect file extension
+        extension = ".img"
+        if ".sif:" in out_name:
+            extension = ".sif"
+            out_name = out_name.replace(".sif:", "-")
+        elif out_name.endswith(".sif"):
+            extension = ".sif"
+            out_name = out_name[:-4]
+        # Strip : and / characters
+        out_name = out_name.replace("/", "-").replace(":", "-")
+        # Stupid Docker Hub not allowing hyphens
+        out_name = out_name.replace("nfcore", "nf-core")
+        # Add file extension
+        out_name = out_name + extension
+
+        # Full destination and cache paths
+        out_path = os.path.abspath(os.path.join(self.outdir, "singularity-images", out_name))
+        cache_path = None
+        if os.environ.get("NXF_SINGULARITY_CACHEDIR"):
+            cache_path = os.path.join(os.environ["NXF_SINGULARITY_CACHEDIR"], out_name)
+            # Use only the cache - set this as the main output path
+            if self.singularity_cache_only:
+                out_path = cache_path
+                cache_path = None
+        elif self.singularity_cache_only:
+            raise FileNotFoundError("'--singularity-cache' specified but no '$NXF_SINGULARITY_CACHEDIR' set!")
+
+        return (out_path, cache_path)
+
+    def singularity_copy_cache_image(self, container, out_path, cache_path):
+        """Copy Singularity image from NXF_SINGULARITY_CACHEDIR to target folder."""
+        # Copy to destination folder if we have a cached version
+        if cache_path and os.path.exists(cache_path):
+            log.debug("Copying {} from cache: '{}'".format(container, os.path.basename(out_path)))
+            shutil.copyfile(cache_path, out_path)
+
+    def singularity_download_image(self, container, out_path, cache_path, progress):
+        """Download a singularity image from the web.
+
+        Use native Python to download the file.

         Args:
             container (str): A pipeline's container name. Usually it is of similar format
-                to `nfcore/name:dev`.
+                to ``https://depot.galaxyproject.org/singularity/name:version``
+            out_path (str): The final target output path
+            cache_path (str, None): The NXF_SINGULARITY_CACHEDIR path if set, None if not
+            progress (Progress): Rich progress bar instance to add tasks to.
+        """
+        log.debug(f"Downloading Singularity image: '{container}'")
+
+        # Set output path to save file to
+        output_path = cache_path or out_path
+        output_path_tmp = f"{output_path}.partial"
+        log.debug(f"Downloading to: '{output_path_tmp}'")
+
+        # Set up progress bar
+        nice_name = container.split("/")[-1][:50]
+        task = progress.add_task(nice_name, start=False, total=False, progress_type="download")
+        try:
+            # Delete temporary file if it already exists
+            if os.path.exists(output_path_tmp):
+                os.remove(output_path_tmp)
+
+            # Open file handle and download
+            with open(output_path_tmp, "wb") as fh:
+                # Disable caching as this breaks streamed downloads
+                with requests_cache.disabled():
+                    r = requests.get(container, allow_redirects=True, stream=True, timeout=60 * 5)
+                    filesize = r.headers.get("Content-length")
+                    if filesize:
+                        progress.update(task, total=int(filesize))
+                        progress.start_task(task)
+
+                    # Stream download
+                    for data in r.iter_content(chunk_size=4096):
+                        # Check that the user didn't hit ctrl-c
+                        if self.kill_with_fire:
+                            raise KeyboardInterrupt
+                        progress.update(task, advance=len(data))
+                        fh.write(data)
+
+            # Rename partial filename to final filename
+            os.rename(output_path_tmp, output_path)
+            output_path_tmp = None
+
+            # Copy cached download if we are using the cache
+            if cache_path:
+                log.debug("Copying {} from cache: '{}'".format(container, os.path.basename(out_path)))
+                progress.update(task, description="Copying from cache to target directory")
+                shutil.copyfile(cache_path, out_path)
+
+            progress.remove_task(task)
+
+        except:
+            # Kill the progress bars
+            for t in progress.task_ids:
+                progress.remove_task(t)
+            # Try to delete the incomplete download
+            log.debug(f"Deleting incomplete singularity image download:\n'{output_path_tmp}'")
+            if output_path_tmp and os.path.exists(output_path_tmp):
+                os.remove(output_path_tmp)
+            if output_path and os.path.exists(output_path):
+                os.remove(output_path)
+            # Re-raise the caught exception
+            raise
+
+    def singularity_pull_image(self, container, out_path, cache_path, progress):
+        """Pull a singularity image using ``singularity pull``
+
+        Attempt to use a local installation of singularity to pull the image.
+
+        Args:
+            container (str): A pipeline's container name. Usually it is of similar format
+                to ``nfcore/name:version``.

         Raises:
             Various exceptions possible from `subprocess` execution of Singularity.
         """
-        out_name = "{}.simg".format(container.replace("nfcore", "nf-core").replace("/", "-").replace(":", "-"))
-        out_path = os.path.abspath(os.path.join(self.outdir, "singularity-images", out_name))
+        output_path = cache_path or out_path
+
+        # Pull using singularity
         address = "docker://{}".format(container.replace("docker://", ""))
-        singularity_command = ["singularity", "pull", "--name", out_path, address]
-        log.info("Building singularity image from Docker Hub: {}".format(address))
+        singularity_command = ["singularity", "pull", "--name", output_path, address]
+        log.debug("Building singularity image: {}".format(address))
         log.debug("Singularity command: {}".format(" ".join(singularity_command)))

+        # Progress bar to show that something is happening
+        task = progress.add_task(container, start=False, total=False, progress_type="singularity_pull", current_log="")
+
         # Try to use singularity to pull image
         try:
-            subprocess.call(singularity_command)
+            # Run the singularity pull command
+            proc = subprocess.Popen(
+                singularity_command,
+                stdout=subprocess.PIPE,
+                stderr=subprocess.STDOUT,
+                universal_newlines=True,
+                bufsize=1,
+            )
+            for line in proc.stdout:
+                log.debug(line.strip())
+                progress.update(task, current_log=line.strip())
+
+            # Copy cached download if we are using the cache
+            if cache_path:
+                log.debug("Copying {} from cache: '{}'".format(container, os.path.basename(out_path)))
+                progress.update(task, current_log="Copying from cache to target directory")
+                shutil.copyfile(cache_path, out_path)
+
+            progress.remove_task(task)
+
         except OSError as e:
             if e.errno == errno.ENOENT:
                 # Singularity is not installed
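The download concurrency above pairs `ThreadPoolExecutor` and `as_completed()` with a shared `kill_with_fire` flag, so that Ctrl-C cancels queued downloads and makes in-flight ones bail out. A minimal standalone sketch of the same pattern, with a hypothetical `fake_download()` stand-in:

```python
import concurrent.futures
import time

kill_with_fire = False  # shared flag, read by every worker thread

def fake_download(name):
    # Stand-in for streaming a file chunk by chunk
    for _ in range(5):
        if kill_with_fire:
            raise KeyboardInterrupt
        time.sleep(0.1)
    return name

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fake_download, f"image-{i}") for i in range(10)]
    try:
        for future in concurrent.futures.as_completed(futures):
            future.result()
    except KeyboardInterrupt:
        for future in futures:
            future.cancel()  # drop anything still queued
        kill_with_fire = True  # tell running workers to stop
        raise
```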
""" - out_name = "{}.simg".format(container.replace("nfcore", "nf-core").replace("/", "-").replace(":", "-")) - out_path = os.path.abspath(os.path.join(self.outdir, "singularity-images", out_name)) + output_path = cache_path or out_path + + # Pull using singularity address = "docker://{}".format(container.replace("docker://", "")) - singularity_command = ["singularity", "pull", "--name", out_path, address] - log.info("Building singularity image from Docker Hub: {}".format(address)) + singularity_command = ["singularity", "pull", "--name", output_path, address] + log.debug("Building singularity image: {}".format(address)) log.debug("Singularity command: {}".format(" ".join(singularity_command))) + # Progress bar to show that something is happening + task = progress.add_task(container, start=False, total=False, progress_type="singularity_pull", current_log="") + # Try to use singularity to pull image try: - subprocess.call(singularity_command) + # Run the singularity pull command + proc = subprocess.Popen( + singularity_command, + stdout=subprocess.PIPE, + stderr=subprocess.STDOUT, + universal_newlines=True, + bufsize=1, + ) + for line in proc.stdout: + log.debug(line.strip()) + progress.update(task, current_log=line.strip()) + + # Copy cached download if we are using the cache + if cache_path: + log.debug("Copying {} from cache: '{}'".format(container, os.path.basename(out_path))) + progress.update(task, current_log="Copying from cache to target directory") + shutil.copyfile(cache_path, out_path) + + progress.remove_task(task) + except OSError as e: if e.errno == errno.ENOENT: # Singularity is not installed diff --git a/nf_core/utils.py b/nf_core/utils.py index 5c9a753db1..e287919807 100644 --- a/nf_core/utils.py +++ b/nf_core/utils.py @@ -243,9 +243,6 @@ def setup_requests_cachedir(): Caching directory will be set up in the user's home directory under a .nfcore_cache subdir. """ - # Only import it if we need it - import requests_cache - pyversion = ".".join(str(v) for v in sys.version_info[0:3]) cachedir = os.path.join(os.getenv("HOME"), os.path.join(".nfcore", "cache_" + pyversion)) if not os.path.exists(cachedir): diff --git a/tests/test_download.py b/tests/test_download.py index cdf707ad93..eb14b3cf77 100644 --- a/tests/test_download.py +++ b/tests/test_download.py @@ -172,15 +172,16 @@ def test_mismatching_md5sums(self): os.remove(tmpfile) # - # Tests for 'pull_singularity_image' + # Tests for 'singularity_pull_image' # # If Singularity is not installed, will log an error and exit # If Singularity is installed, should raise an OSError due to non-existant image @pytest.mark.xfail(raises=OSError) - def test_pull_singularity_image(self): + @mock.patch("rich.progress.Progress.add_task") + def test_singularity_pull_image(self, mock_rich_progress): tmp_dir = tempfile.mkdtemp() download_obj = DownloadWorkflow(pipeline="dummy", outdir=tmp_dir) - download_obj.pull_singularity_image("a-container") + download_obj.singularity_pull_image("a-container", tmp_dir, None, mock_rich_progress) # Clean up shutil.rmtree(tmp_dir) @@ -188,7 +189,7 @@ def test_pull_singularity_image(self): # # Tests for the main entry method 'download_workflow' # - @mock.patch("nf_core.download.DownloadWorkflow.pull_singularity_image") + @mock.patch("nf_core.download.DownloadWorkflow.singularity_pull_image") def test_download_workflow_with_success(self, mock_download_image): tmp_dir = tempfile.mkdtemp()