diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index fc73294b5a..a90c06f9a9 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -142,4 +142,3 @@ To get started: Devcontainer specs: - [DevContainer config](.devcontainer/devcontainer.json) -- [Dockerfile](.devcontainer/Dockerfile) diff --git a/.github/workflows/pytest-frozen-ubuntu-20.04.yml b/.github/workflows/pytest-frozen-ubuntu-20.04.yml index b015376633..5faf8ce605 100644 --- a/.github/workflows/pytest-frozen-ubuntu-20.04.yml +++ b/.github/workflows/pytest-frozen-ubuntu-20.04.yml @@ -15,7 +15,7 @@ concurrency: cancel-in-progress: true jobs: - pytest: + pytest-frozen: runs-on: ubuntu-20.04 steps: - uses: actions/checkout@v3 diff --git a/CHANGELOG.md b/CHANGELOG.md index b79fd410dc..89c293bc0a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -11,6 +11,15 @@ - Move registry definitions out of profile scope ([#2286])(https://github.com/nf-core/tools/pull/2286) - Remove `aws_tower` profile ([#2287])(https://github.com/nf-core/tools/pull/2287) - Fixed the Slack report to include the pipeline name ([#2291](https://github.com/nf-core/tools/pull/2291)) +- Fix link in the MultiQC report to point to exact version of output docs ([#2298](https://github.com/nf-core/tools/pull/2298)) +- Remove schema validation from `lib` folder and use Nextflow nf-validation plugin instead ([#1771](https://github.com/nf-core/tools/pull/1771/)) +- Generate input channel from input file using Nextflow nf-validation plugin ([#1771](https://github.com/nf-core/tools/pull/1771/)) + + +### Download + +- Introduce a `--tower` flag for `nf-core download` to obtain pipelines in an offline format suited for [seqeralabs® Nextflow Tower](https://cloud.tower.nf/) ([#2247](https://github.com/nf-core/tools/pull/2247)). +- Refactored the CLI for `--singularity-cache` in `nf-core download` from a flag to an argument. The prior options were renamed to `amend` (container images are only saved in the `$NXF_SINGULARITY_CACHEDIR`) and `copy` (a copy of the image is saved with the download). `remote` was newly introduced and allows providing a table of contents of a remote cache via an additional argument `--singularity-cache-index` ([#2247](https://github.com/nf-core/tools/pull/2247)). ### Linting diff --git a/README.md b/README.md index 0de42e86e8..dacb50ebc4 100644 --- a/README.md +++ b/README.md @@ -20,7 +20,7 @@ A python package with helper tools for the nf-core community. - [`nf-core` tools update](#update-tools) - [`nf-core list` - List available pipelines](#listing-pipelines) - [`nf-core launch` - Run a pipeline with interactive parameter prompts](#launch-a-pipeline) -- [`nf-core download` - Download pipeline for offline use](#downloading-pipelines-for-offline-use) +- [`nf-core download` - Download a pipeline for offline use](#downloading-pipelines-for-offline-use) - [`nf-core licences` - List software licences in a pipeline](#pipeline-software-licences) - [`nf-core create` - Create a new pipeline with the nf-core template](#creating-a-new-pipeline) - [`nf-core lint` - Check pipeline code against nf-core guidelines](#linting-a-workflow) @@ -348,13 +348,13 @@ nextflow run /path/to/download/nf-core-rnaseq-dev/workflow/ --input mydata.csv - ### Downloaded nf-core configs The pipeline files are automatically updated (`params.custom_config_base` is set to `../configs`), so that the local copy of institutional configs are available when running the pipeline.
-So using `-profile ` should work if available within [nf-core/configs](https://github.com/nf-core/configs). +So using `-profile ` should work if available within [nf-core/configs](https://github.com/nf-core/configs). This option is not available when downloading a pipeline for use with [Nextflow Tower](#adapting-downloads-to-nextflow-tower) because the application manages all configurations separately. ### Downloading singularity containers If you're using Singularity, the `nf-core download` command can also fetch the required Singularity container images for you. To do this, select `singularity` in the prompt or specify `--container singularity` in the command. -Your archive / target output directory will then include three folders: `workflow`, `configs` and also `singularity-containers`. +Your archive / target output directory will then also include a separate folder `singularity-containers`. The downloaded workflow files are again edited to add the following line to the end of the pipeline's `nextflow.config` file: @@ -372,10 +372,9 @@ We highly recommend setting the `$NXF_SINGULARITY_CACHEDIR` environment variable If found, the tool will fetch the Singularity images to this directory first before copying to the target output archive / directory. Any images previously fetched will be found there and copied directly - this includes images that may be shared with other pipelines or previous pipeline version downloads or download attempts. -If you are running the download on the same system where you will be running the pipeline (eg. a shared filesystem where Nextflow won't have an internet connection at a later date), you can choose to _only_ use the cache via a prompt or cli options `--singularity-cache-only` / `--singularity-cache-copy`. +If you are running the download on the same system where you will be running the pipeline (e.g. a shared filesystem where Nextflow won't have an internet connection at a later date), you can choose to _only_ use the cache via a prompt or the CLI option `--singularity-cache amend`. This instructs `nf-core download` to fetch all Singularity images to the `$NXF_SINGULARITY_CACHEDIR` directory but does _not_ copy them to the workflow archive / directory. The workflow config file is _not_ edited. This means that when you later run the workflow, Nextflow will just use the cache folder directly. -This instructs `nf-core download` to fetch all Singularity images to the `$NXF_SINGULARITY_CACHEDIR` directory but does _not_ copy them to the workflow archive / directory. -The workflow config file is _not_ edited. This means that when you later run the workflow, Nextflow will just use the cache folder directly. +If you are downloading a workflow for a different system, you can provide information about its image cache to `nf-core download`. To avoid unnecessary container image downloads, choose `--singularity-cache remote` and provide a list of already available images as a plain text file to `--singularity-cache-index my_list_of_remotely_available_images.txt`. To generate this list on the remote system, run `find $NXF_SINGULARITY_CACHEDIR -name "*.img" > my_list_of_remotely_available_images.txt`. The tool will then download and copy into your output directory only those images that are missing on the remote system. #### How the Singularity image downloads work @@ -391,16 +390,22 @@ Where both are found, the download URL is preferred. Once a full list of containers is found, they are processed in the following order: -1.
If the target image already exists, nothing is done (eg. with `$NXF_SINGULARITY_CACHEDIR` and `--singularity-cache-only` specified) -2. If found in `$NXF_SINGULARITY_CACHEDIR` and `--singularity-cache-only` is _not_ specified, they are copied to the output directory +1. If the target image already exists, nothing is done (e.g. with `$NXF_SINGULARITY_CACHEDIR` and `--singularity-cache amend` specified) +2. If found in `$NXF_SINGULARITY_CACHEDIR` and `--singularity-cache copy` is specified, they are copied to the output directory 3. If they start with `http` they are downloaded directly within Python (default 4 at a time, you can customise this with `--parallel-downloads`) 4. If they look like a Docker image name, they are fetched using a `singularity pull` command - - This requires Singularity to be installed on the system and is substantially slower + - This requires Singularity/Apptainer to be installed on the system and is substantially slower -Note that compressing many GBs of binary files can be slow, so specifying `--compress none` is recommended when downloading Singularity images. +Note that compressing many GBs of binary files can be slow, so specifying `--compress none` is recommended when downloading Singularity images that are copied to the output directory. If the download speeds are much slower than your internet connection is capable of, you can set `--parallel-downloads` to a large number to download loads of images at once. +### Adapting downloads to Nextflow Tower + +[seqeralabs® Nextflow Tower](https://cloud.tower.nf/) provides a graphical user interface to oversee pipeline runs, gather statistics and configure compute resources. While pipelines added to _Tower_ are preferably hosted on a Git service, providing them as disconnected, self-contained repositories is also possible for premises with restricted network access. Choosing the `--tower` flag will download the pipeline in an appropriate form. + +Subsequently, the `*.git` folder can be moved to its final destination and linked with a pipeline in _Tower_ using the `file:/` prefix. + ## Pipeline software licences Sometimes it's useful to see the software licences of the tools used in a pipeline. diff --git a/nf_core/__main__.py b/nf_core/__main__.py index 735eb99e04..6d6ded471a 100644 --- a/nf_core/__main__.py +++ b/nf_core/__main__.py @@ -209,21 +209,46 @@ def launch(pipeline, id, revision, command_only, params_in, params_out, save_all # nf-core download @nf_core_cli.command() @click.argument("pipeline", required=False, metavar="") -@click.option("-r", "--revision", type=str, help="Pipeline release") +@click.option( + "-r", + "--revision", + multiple=True, + help="Pipeline release to download. Multiple invocations are possible, e.g.
`-r 1.1 -r 1.2`", +) @click.option("-o", "--outdir", type=str, help="Output directory") @click.option( "-x", "--compress", type=click.Choice(["tar.gz", "tar.bz2", "zip", "none"]), help="Archive compression type" ) @click.option("-f", "--force", is_flag=True, default=False, help="Overwrite existing files") +@click.option("-t", "--tower", is_flag=True, default=False, help="Download for seqeralabs® Nextflow Tower") @click.option( "-c", "--container", type=click.Choice(["none", "singularity"]), help="Download software container images" ) @click.option( - "--singularity-cache-only/--singularity-cache-copy", - help="Don't / do copy images to the output directory and set 'singularity.cacheDir' in workflow", + "-s", + "--singularity-cache", + type=click.Choice(["amend", "copy", "remote"]), + help="Utilize the 'singularity.cacheDir' in the download process, if applicable.", +) +@click.option( + "-i", + "--singularity-cache-index", + type=str, + help="List of images already available in a remote 'singularity.cacheDir', imposes --singularity-cache=remote", ) @click.option("-p", "--parallel-downloads", type=int, default=4, help="Number of parallel image downloads") -def download(pipeline, revision, outdir, compress, force, container, singularity_cache_only, parallel_downloads): +def download( + pipeline, + revision, + outdir, + compress, + force, + tower, + container, + singularity_cache, + singularity_cache_index, + parallel_downloads, +): """ Download a pipeline, nf-core/configs and pipeline singularity images. @@ -233,7 +258,16 @@ def download(pipeline, revision, outdir, compress, force, container, singularity from nf_core.download import DownloadWorkflow dl = DownloadWorkflow( - pipeline, revision, outdir, compress, force, container, singularity_cache_only, parallel_downloads + pipeline, + revision, + outdir, + compress, + force, + tower, + container, + singularity_cache, + singularity_cache_index, + parallel_downloads, ) dl.download_workflow() diff --git a/nf_core/bump_version.py b/nf_core/bump_version.py index 129016fa38..b462ee1377 100644 --- a/nf_core/bump_version.py +++ b/nf_core/bump_version.py @@ -3,8 +3,8 @@ """ import logging -import os import re +from pathlib import Path import rich.console @@ -44,6 +44,17 @@ def bump_pipeline_version(pipeline_obj, new_version): ) ], ) + # multiqc_config.yaml + update_file_version( + Path("assets", "multiqc_config.yml"), + pipeline_obj, + [ + ( + rf"{re.escape(current_version)}", + f"{new_version}", + ) + ], + ) def bump_nextflow_version(pipeline_obj, new_version): @@ -77,7 +88,7 @@ def bump_nextflow_version(pipeline_obj, new_version): # .github/workflows/ci.yml - Nextflow version matrix update_file_version( - os.path.join(".github", "workflows", "ci.yml"), + Path(".github", "workflows", "ci.yml"), pipeline_obj, [ ( diff --git a/nf_core/download.py b/nf_core/download.py index cd36c65c4a..70f61f35a4 100644 --- a/nf_core/download.py +++ b/nf_core/download.py @@ -12,17 +12,22 @@ import sys import tarfile import textwrap +from datetime import datetime from zipfile import ZipFile +import git import questionary import requests import requests_cache import rich import rich.progress +from git.exc import GitCommandError, InvalidGitRepositoryError import nf_core import nf_core.list import nf_core.utils +from nf_core.synced_repo import RemoteProgressbar, SyncedRepo +from nf_core.utils import NFCORE_CACHE_DIR, NFCORE_DIR log = logging.getLogger(__name__) stderr = rich.console.Console( @@ -71,8 +76,9 @@ class DownloadWorkflow: Args: pipeline (str): A nf-core pipeline 
name. - revision (str): The workflow revision to download, like `1.0`. Defaults to None. - singularity (bool): Flag, if the Singularity container should be downloaded as well. Defaults to False. + revision (List[str]): The workflow revision(s) to download, like `1.0`. Defaults to None. + container (str): The container system for which to download images ('none' or 'singularity'). Defaults to None. + tower (bool): Flag to customize the download for Nextflow Tower (convert to git bare repo). Defaults to False. outdir (str): Path to the local download directory. Defaults to None. """ def __init__( self, outdir=None, compress_type=None, force=False, + tower=False, container=None, - singularity_cache_only=False, + singularity_cache=None, + singularity_cache_index=None, parallel_downloads=4, ): self.pipeline = pipeline - self.revision = revision + if isinstance(revision, str): + self.revision = [revision] + elif isinstance(revision, tuple): + self.revision = [*revision] + else: + self.revision = [] self.outdir = outdir self.output_filename = None self.compress_type = compress_type self.force = force - self.container = container - self.singularity_cache_only = singularity_cache_only + self.tower = tower + self.include_configs = None + # force download of containers if a cache index is given or the download is meant to be used for Tower. + self.container = "singularity" if singularity_cache_index or bool(tower) else container + # if a singularity_cache_index is given, use the file and overrule the choice. + self.singularity_cache = "remote" if singularity_cache_index else singularity_cache + self.singularity_cache_index = singularity_cache_index self.parallel_downloads = parallel_downloads self.wf_revisions = {} self.wf_branches = {} - self.wf_sha = None - self.wf_download_url = None + self.wf_sha = {} + self.wf_download_url = {} self.nf_config = {} self.containers = [] + self.containers_remote = [] # stores the remote images provided in the file. # Fetch remote workflows self.wfs = nf_core.list.Workflows() @@ -119,25 +138,51 @@ def download_workflow(self): ) self.prompt_revision() self.get_revision_hash() - self.prompt_container_download() - self.prompt_use_singularity_cachedir() - self.prompt_singularity_cachedir_only() - self.prompt_compression_type() + # Inclusion of configs is unnecessary for Tower. + if not self.tower and self.include_configs is None: + self.prompt_config_inclusion() + # If a remote cache is specified, it is safe to assume images should be downloaded. + if not self.singularity_cache == "remote": + self.prompt_container_download() + else: + self.container = "singularity" + self.prompt_singularity_cachedir_creation() + self.prompt_singularity_cachedir_utilization() + self.prompt_singularity_cachedir_remote() + # Nothing meaningful to compress here.
+ if not self.tower: + self.prompt_compression_type() except AssertionError as e: log.critical(e) sys.exit(1) - summary_log = [f"Pipeline revision: '{self.revision}'", f"Pull containers: '{self.container}'"] + summary_log = [ + f"Pipeline revision: '{', '.join(self.revision) if len(self.revision) < 5 else self.revision[0]+',...,['+str(len(self.revision)-2)+' more revisions],...,'+self.revision[-1]}'", + f"Pull containers: '{self.container}'", + ] if self.container == "singularity" and os.environ.get("NXF_SINGULARITY_CACHEDIR") is not None: - summary_log.append(f"Using [blue]$NXF_SINGULARITY_CACHEDIR[/]': {os.environ['NXF_SINGULARITY_CACHEDIR']}") + summary_log.append(f"Using [blue]$NXF_SINGULARITY_CACHEDIR[/]: '{os.environ['NXF_SINGULARITY_CACHEDIR']}'") + if self.containers_remote: + summary_log.append( + f"Successfully read {len(self.containers_remote)} containers from the remote '$NXF_SINGULARITY_CACHEDIR' contents." + ) # Set an output filename now that we have the outdir - if self.compress_type is not None: + if self.tower: + self.output_filename = f"{self.outdir}.git" + summary_log.append(f"Output file: '{self.output_filename}'") + elif self.compress_type is not None: self.output_filename = f"{self.outdir}.{self.compress_type}" summary_log.append(f"Output file: '{self.output_filename}'") else: summary_log.append(f"Output directory: '{self.outdir}'") + if not self.tower: + # Only show entry if option was prompted. + summary_log.append(f"Include default institutional configuration: '{self.include_configs}'") + else: + summary_log.append(f"Enabled for seqeralabs® Nextflow Tower: '{self.tower}'") + # Check that the outdir doesn't already exist if os.path.exists(self.outdir): if not self.force: @@ -157,34 +202,86 @@ # Summary log log.info("Saving '{}'\n {}".format(self.pipeline, "\n ".join(summary_log))) - # Download the pipeline files + # Perform the actual download + if self.tower: + self.download_workflow_tower() + else: + self.download_workflow_static() + + def download_workflow_static(self): + """Downloads a nf-core workflow from GitHub to the local file system in a self-contained manner.""" + + # Download the centralised configs first + if self.include_configs: + log.info("Downloading centralised configs from GitHub") + self.download_configs() + + # Download the pipeline files for each selected revision log.info("Downloading workflow files from GitHub") - self.download_wf_files() - # Download the centralised configs - log.info("Downloading centralised configs from GitHub") - self.download_configs() - try: - self.wf_use_local_configs() - except FileNotFoundError as e: - log.error("Error editing pipeline config file to use local configs!") - log.critical(e) - sys.exit(1) + for item in zip(self.revision, self.wf_sha.values(), self.wf_download_url.values()): + revision_dirname = self.download_wf_files(revision=item[0], wf_sha=item[1], download_url=item[2]) - # Download the singularity images - if self.container == "singularity": - self.find_container_images() - try: - self.get_singularity_images() - except OSError as e: - log.critical(f"[red]{e}[/]") - sys.exit(1) + if self.include_configs: + try: + self.wf_use_local_configs(revision_dirname) + except FileNotFoundError as e: + log.error("Error editing pipeline config file to use local configs!") + log.critical(e) + sys.exit(1) + + # Collect all required singularity images + if self.container == "singularity": + self.find_container_images(os.path.join(self.outdir, revision_dirname)) + + try: +
self.get_singularity_images(current_revision=item[0]) + except OSError as e: + log.critical(f"[red]{e}[/]") + sys.exit(1) # Compress into an archive if self.compress_type is not None: - log.info("Compressing download..") + log.info("Compressing output into archive") self.compress_download() + def download_workflow_tower(self, location=None): + """Create a bare-cloned git repository of the workflow, so it can be launched with `tw launch` as file:/ pipeline""" + + log.info("Collecting workflow from GitHub") + + self.workflow_repo = WorkflowRepo( + remote_url=f"https://github.com/{self.pipeline}.git", + revision=self.revision if self.revision else None, + commit=self.wf_sha.values() if bool(self.wf_sha) else None, + location=location if location else None, # manual location is required for the tests to work + in_cache=False, + ) + + # Remove tags for those revisions that had not been selected + self.workflow_repo.tidy_tags_and_branches() + + # create a bare clone of the modified repository needed for Tower + self.workflow_repo.bare_clone(os.path.join(self.outdir, self.output_filename)) + + # extract the required containers + if self.container == "singularity": + for revision, commit in self.wf_sha.items(): + # Checkout the repo in the current revision + self.workflow_repo.checkout(commit) + # Collect all required singularity images + self.find_container_images(self.workflow_repo.access()) + + try: + self.get_singularity_images(current_revision=revision) + except OSError as e: + log.critical(f"[red]{e}[/]") + sys.exit(1) + + # Justify why compression is skipped for Tower downloads (Prompt is not shown, but CLI argument could have been set) + if self.compress_type is not None: + log.info("Compression choice is ignored for Tower downloads since nothing can be reasonably compressed.") + def prompt_pipeline_name(self): """Prompt for the pipeline name if not set with a flag""" @@ -193,46 +290,94 @@ def prompt_pipeline_name(self): self.pipeline = nf_core.utils.prompt_remote_pipeline_name(self.wfs) def prompt_revision(self): - """Prompt for pipeline revision / branch""" - # Prompt user for revision tag if '--revision' was not set - if self.revision is None: - self.revision = nf_core.utils.prompt_pipeline_release_branch(self.wf_revisions, self.wf_branches) + """ + Prompt for pipeline revision / branch + Prompt user for revision tag if '--revision' was not set + If --tower is specified, allow to select multiple revisions + Also the static download allows for multiple revisions, but + we do not prompt this option interactively. + """ + if not bool(self.revision): + (choice, tag_set) = nf_core.utils.prompt_pipeline_release_branch( + self.wf_revisions, self.wf_branches, multiple=self.tower + ) + """ + The checkbox() prompt unfortunately does not support passing a Validator, + so a user who keeps pressing Enter will flounder past the selection without choice. + + bool(choice), bool(tag_set): + ############################# + True, True: A choice was made and revisions were available. + False, True: No selection was made, but revisions were available -> defaults to all available. + False, False: No selection was made because no revisions were available -> raise AssertionError. + True, False: Congratulations, you found a bug! That combo shouldn't happen. + """ + + if bool(choice): + # have to make sure that self.revision is a list of strings, regardless if choice is str or list of strings. 
+ self.revision.append(choice) if isinstance(choice, str) else self.revision.extend(choice) + else: + if bool(tag_set): + self.revision = tag_set + log.info("No particular revision was selected, all available revisions will be downloaded.") + else: + raise AssertionError(f"No revisions of {self.pipeline} available for download.") def get_revision_hash(self): """Find specified revision / branch hash""" - # Branch - if self.revision in self.wf_branches.keys(): - self.wf_sha = self.wf_branches[self.revision] - - # Revision - else: - for r in self.wf_revisions: - if r["tag_name"] == self.revision: - self.wf_sha = r["tag_sha"] - break + for revision in self.revision: # revision is a list of strings, but may be of length 1 + # Branch + if revision in self.wf_branches.keys(): + self.wf_sha = {**self.wf_sha, revision: self.wf_branches[revision]} - # Can't find the revisions or branch - throw an error + # Revision else: - log.info( - "Available {} revisions: '{}'".format( - self.pipeline, "', '".join([r["tag_name"] for r in self.wf_revisions]) + for r in self.wf_revisions: + if r["tag_name"] == revision: + self.wf_sha = {**self.wf_sha, revision: r["tag_sha"]} + break + + # Can't find the revisions or branch - throw an error + else: + log.info( + "Available {} revisions: '{}'".format( + self.pipeline, "', '".join([r["tag_name"] for r in self.wf_revisions]) + ) ) - ) - log.info("Available {} branches: '{}'".format(self.pipeline, "', '".join(self.wf_branches.keys()))) - raise AssertionError(f"Not able to find revision / branch '{self.revision}' for {self.pipeline}") + log.info("Available {} branches: '{}'".format(self.pipeline, "', '".join(self.wf_branches.keys()))) + raise AssertionError(f"Not able to find revision / branch '{revision}' for {self.pipeline}") # Set the outdir if not self.outdir: - self.outdir = f"{self.pipeline.replace('/', '-').lower()}-{self.revision}" - - # Set the download URL and return - self.wf_download_url = f"https://github.com/{self.pipeline}/archive/{self.wf_sha}.zip" + if len(self.wf_sha) > 1: + self.outdir = f"{self.pipeline.replace('/', '-').lower()}_{datetime.now().strftime('%Y-%m-%d_%H-%M')}" + else: + self.outdir = f"{self.pipeline.replace('/', '-').lower()}_{self.revision[0]}" + + if not self.tower: + for revision, wf_sha in self.wf_sha.items(): + # Set the download URL and return - only applicable for classic downloads + self.wf_download_url = { + **self.wf_download_url, + revision: f"https://github.com/{self.pipeline}/archive/{wf_sha}.zip", + } + + def prompt_config_inclusion(self): + """Prompt for inclusion of institutional configurations""" + if stderr.is_interactive: # Use rich auto-detection of interactive shells + self.include_configs = questionary.confirm( + "Include nf-core's default institutional configuration files in the download?", + style=nf_core.utils.nfcore_question_style, + ).ask() + else: + self.include_configs = False + # do not include by default.
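# --- Illustrative sketch (editorial note, not part of this diff): how the revision
# normalisation and the new flags shown above fit together when the class is used
# directly. `nf-core/rnaseq`, the revisions and the outdir are placeholder values.
#
#   dl = DownloadWorkflow(
#       pipeline="nf-core/rnaseq",
#       revision=("3.10", "3.11"),  # tuple from repeated `-r`, normalised to a list
#       tower=True,                 # also forces self.container = "singularity"
#       outdir="nf-core-rnaseq",
#   )
#   dl.download_workflow()          # dispatches to download_workflow_tower()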
def prompt_container_download(self): """Prompt whether to download container images or not""" - if self.container is None: + if self.container is None and stderr.is_interactive and not self.tower: stderr.print("\nIn addition to the pipeline code, this tool can download software containers.") self.container = questionary.select( "Download software container images:", @@ -240,7 +385,7 @@ style=nf_core.utils.nfcore_question_style, ).unsafe_ask() - def prompt_use_singularity_cachedir(self): + def prompt_singularity_cachedir_creation(self): """Prompt about using $NXF_SINGULARITY_CACHEDIR if not already set""" if ( self.container == "singularity" @@ -254,6 +399,8 @@ if rich.prompt.Confirm.ask( "[blue bold]?[/] [bold]Define [blue not bold]$NXF_SINGULARITY_CACHEDIR[/] for a shared Singularity image download folder?[/]" ): + if not self.singularity_cache_index: + self.singularity_cache = "amend" # retain "remote" choice. # Prompt user for a cache directory path cachedir_path = None while cachedir_path is None: @@ -270,53 +417,128 @@ if cachedir_path: os.environ["NXF_SINGULARITY_CACHEDIR"] = cachedir_path - # Ask if user wants this set in their .bashrc - bashrc_path = os.path.expanduser("~/.bashrc") - if not os.path.isfile(bashrc_path): - bashrc_path = os.path.expanduser("~/.bash_profile") - if not os.path.isfile(bashrc_path): - bashrc_path = False - if bashrc_path: + """ + Optionally, create a permanent entry for the NXF_SINGULARITY_CACHEDIR in the terminal profile. + Currently supports bash and zsh. + ToDo: "sh", "dash", "ash", "csh", "tcsh", "ksh", "fish", "cmd", "powershell", "pwsh"? + """ + + if os.getenv("SHELL", "") == "/bin/bash": + shellprofile_path = os.path.expanduser("~/.bash_profile") + if not os.path.isfile(shellprofile_path): + shellprofile_path = os.path.expanduser("~/.bashrc") + if not os.path.isfile(shellprofile_path): + shellprofile_path = False + elif os.getenv("SHELL", "") == "/bin/zsh": + shellprofile_path = os.path.expanduser("~/.zprofile") + if not os.path.isfile(shellprofile_path): + shellprofile_path = os.path.expanduser("~/.zshenv") + if not os.path.isfile(shellprofile_path): + shellprofile_path = False + else: + shellprofile_path = os.path.expanduser("~/.profile") + if not os.path.isfile(shellprofile_path): + shellprofile_path = False + + if shellprofile_path: stderr.print( - f"\nSo that [blue]$NXF_SINGULARITY_CACHEDIR[/] is always defined, you can add it to your [blue not bold]~/{os.path.basename(bashrc_path)}[/] file ." - "This will then be autmoatically set every time you open a new terminal. We can add the following line to this file for you: \n" + f"\nSo that [blue]$NXF_SINGULARITY_CACHEDIR[/] is always defined, you can add it to your [blue not bold]~/{os.path.basename(shellprofile_path)}[/] file. " + "This will then be automatically set every time you open a new terminal.
We can add the following line to this file for you: \n" + f'[blue]export NXF_SINGULARITY_CACHEDIR="{cachedir_path}"[/]' ) append_to_file = rich.prompt.Confirm.ask( - f"[blue bold]?[/] [bold]Add to [blue not bold]~/{os.path.basename(bashrc_path)}[/] ?[/]" + f"[blue bold]?[/] [bold]Add to [blue not bold]~/{os.path.basename(shellprofile_path)}[/] ?[/]" ) if append_to_file: - with open(os.path.expanduser(bashrc_path), "a") as f: + with open(os.path.expanduser(shellprofile_path), "a") as f: f.write( "\n\n#######################################\n" f"## Added by `nf-core download` v{nf_core.__version__} ##\n" + f'export NXF_SINGULARITY_CACHEDIR="{cachedir_path}"' + "\n#######################################\n" ) - log.info(f"Successfully wrote to [blue]{bashrc_path}[/]") + log.info(f"Successfully wrote to [blue]{shellprofile_path}[/]") log.warning( "You will need reload your terminal after the download completes for this to take effect." ) - def prompt_singularity_cachedir_only(self): + def prompt_singularity_cachedir_utilization(self): """Ask if we should *only* use $NXF_SINGULARITY_CACHEDIR without copying into target""" if ( - self.singularity_cache_only is None + self.singularity_cache is None # no choice regarding singularity cache has been made. and self.container == "singularity" and os.environ.get("NXF_SINGULARITY_CACHEDIR") is not None + and stderr.is_interactive ): stderr.print( - "\nIf you are working on the same system where you will run Nextflow, you can leave the downloaded images in the " - "[blue not bold]$NXF_SINGULARITY_CACHEDIR[/] folder, Nextflow will automatically find them. " + "\nIf you are working on the same system where you will run Nextflow, you can amend the downloaded images to the ones in the " + "[blue not bold]$NXF_SINGULARITY_CACHEDIR[/] folder, Nextflow will automatically find them. " "However if you will transfer the downloaded files to a different system then they should be copied to the target folder."
) - self.singularity_cache_only = rich.prompt.Confirm.ask( - "[blue bold]?[/] [bold]Copy singularity images from [blue not bold]$NXF_SINGULARITY_CACHEDIR[/] to the target folder?[/]" - ) + self.singularity_cache = questionary.select( + "Copy singularity images from $NXF_SINGULARITY_CACHEDIR to the target folder or amend new images to the cache?", + choices=["amend", "copy"], + style=nf_core.utils.nfcore_question_style, + ).unsafe_ask() - # Sanity check, for when passed as a cli flag - if self.singularity_cache_only and self.container != "singularity": - raise AssertionError("Command has '--singularity-cache-only' set, but '--container' is not 'singularity'") + def prompt_singularity_cachedir_remote(self): + """Prompt about the index of a remote $NXF_SINGULARITY_CACHEDIR""" + if ( + self.container == "singularity" + and self.singularity_cache == "remote" + and self.singularity_cache_index is None + and stderr.is_interactive # Use rich auto-detection of interactive shells + ): + # Prompt user for a file listing the contents of the remote cache directory + cachedir_index = None + while cachedir_index is None: + prompt_cachedir_index = questionary.path( + "Specify a list of the container images already present on the remote system:", + file_filter="*.txt", + style=nf_core.utils.nfcore_question_style, + ).unsafe_ask() + cachedir_index = os.path.abspath(os.path.expanduser(prompt_cachedir_index)) + if prompt_cachedir_index == "": + log.error("Will disregard contents of a remote [blue]$NXF_SINGULARITY_CACHEDIR[/]") + self.singularity_cache_index = None + self.singularity_cache = "copy" + elif not os.access(cachedir_index, os.R_OK): + log.error(f"'{cachedir_index}' is not a readable file.") + cachedir_index = None + if cachedir_index: + self.singularity_cache_index = cachedir_index + # in any case read the remote containers, even if no prompt was shown. + self.read_remote_containers() + + def read_remote_containers(self): + """Reads the file specified as index for the remote Singularity cache dir""" + if ( + self.container == "singularity" + and self.singularity_cache == "remote" + and self.singularity_cache_index is not None + ): + n_total_images = 0 + try: + with open(self.singularity_cache_index) as indexfile: + for line in indexfile.readlines(): + match = re.search(r"([^\/\\]+\.img)", line, re.S) + if match: + n_total_images += 1 + self.containers_remote.append(match.group(0)) + if n_total_images == 0: + raise LookupError("Could not find valid container names in the index file.") + self.containers_remote = sorted(list(set(self.containers_remote))) + except (FileNotFoundError, LookupError) as e: + log.error(f"[red]Issue with reading the specified remote $NXF_SINGULARITY_CACHEDIR index:[/]\n{e}\n") + if stderr.is_interactive and rich.prompt.Confirm.ask("[blue]Specify a new index file and try again?"): + self.prompt_singularity_cachedir_remote() + else: + log.info("Proceeding without consideration of the remote $NXF_SINGULARITY_CACHEDIR index.") + self.singularity_cache_index = None + if os.environ.get("NXF_SINGULARITY_CACHEDIR"): + self.singularity_cache = "copy" # default to copy if possible, otherwise skip.
+ else: + self.singularity_cache = None def prompt_compression_type(self): """Ask user if we should compress the downloaded files""" @@ -343,24 +565,32 @@ def prompt_compression_type(self): if self.compress_type == "none": self.compress_type = None - def download_wf_files(self): + def download_wf_files(self, revision, wf_sha, download_url): """Downloads workflow files from GitHub to the :attr:`self.outdir`.""" - log.debug(f"Downloading {self.wf_download_url}") + log.debug(f"Downloading {download_url}") # Download GitHub zip file into memory and extract - url = requests.get(self.wf_download_url) + url = requests.get(download_url) with ZipFile(io.BytesIO(url.content)) as zipfile: zipfile.extractall(self.outdir) + # create a filesystem-safe version of the revision name for the directory + revision_dirname = re.sub("[^0-9a-zA-Z]+", "_", revision) + # account for name collisions, if there is a branch / release named "configs" or "singularity-images" + if revision_dirname in ["configs", "singularity-images"]: + revision_dirname = re.sub("[^0-9a-zA-Z]+", "_", self.pipeline + revision_dirname) + # Rename the internal directory name to be more friendly - gh_name = f"{self.pipeline}-{self.wf_sha}".split("/")[-1] - os.rename(os.path.join(self.outdir, gh_name), os.path.join(self.outdir, "workflow")) + gh_name = f"{self.pipeline}-{wf_sha if bool(wf_sha) else ''}".split("/")[-1] + os.rename(os.path.join(self.outdir, gh_name), os.path.join(self.outdir, revision_dirname)) # Make downloaded files executable - for dirpath, _, filelist in os.walk(os.path.join(self.outdir, "workflow")): + for dirpath, _, filelist in os.walk(os.path.join(self.outdir, revision_dirname)): for fname in filelist: os.chmod(os.path.join(dirpath, fname), 0o775) + return revision_dirname + def download_configs(self): """Downloads the centralised config profiles from nf-core/configs to :attr:`self.outdir`.""" configs_zip_url = "https://github.com/nf-core/configs/archive/master.zip" @@ -380,9 +610,9 @@ def download_configs(self): for fname in filelist: os.chmod(os.path.join(dirpath, fname), 0o775) - def wf_use_local_configs(self): + def wf_use_local_configs(self, revision_dirname): """Edit the downloaded nextflow.config file to use the local config files""" - nfconfig_fn = os.path.join(self.outdir, "workflow", "nextflow.config") + nfconfig_fn = os.path.join(self.outdir, revision_dirname, "nextflow.config") find_str = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" repl_str = "${projectDir}/../configs/" log.debug(f"Editing 'params.custom_config_base' in '{nfconfig_fn}'") @@ -396,7 +626,7 @@ def wf_use_local_configs(self): nfconfig = nfconfig.replace(find_str, repl_str) # Append the singularity.cacheDir to the end if we need it - if self.container == "singularity" and not self.singularity_cache_only: + if self.container == "singularity" and self.singularity_cache == "copy": nfconfig += ( f"\n\n// Added by `nf-core download` v{nf_core.__version__} //\n" + 'singularity.cacheDir = "${projectDir}/../singularity-images/"' @@ -408,7 +638,7 @@ def wf_use_local_configs(self): with open(nfconfig_fn, "w") as nfconfig_fh: nfconfig_fh.write(nfconfig) - def find_container_images(self): + def find_container_images(self, workflow_directory): """Find container image names for workflow. 
Starts by using `nextflow config` to pull out any process.container @@ -434,15 +664,23 @@ def find_container_images(self): 'https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0' : 'biocontainers/fastqc:0.11.9--0' }" + Later DSL2, variable is being used: + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + "https://depot.galaxyproject.org/singularity/${container_id}" : + "quay.io/biocontainers/${container_id}" }" + + container_id = 'mulled-v2-1fa26d1ce03c295fe2fdcf85831a92fbcbd7e8c2:afaaa4c6f5b308b4b6aa2dd8e99e1466b2a6b0cd-0' + DSL1 / Special case DSL2: container "nfcore/cellranger:6.0.2" """ log.debug("Fetching container names for workflow") - containers_raw = [] + # since this is run for multiple revisions now, account for previously detected containers. + containers_raw = [] if not self.containers else self.containers # Use linting code to parse the pipeline nextflow config - self.nf_config = nf_core.utils.fetch_wf_config(os.path.join(self.outdir, "workflow")) + self.nf_config = nf_core.utils.fetch_wf_config(workflow_directory) # Find any config variables that look like a container for k, v in self.nf_config.items(): @@ -450,7 +688,7 @@ def find_container_images(self): containers_raw.append(v.strip('"').strip("'")) # Recursive search through any DSL2 module files for container spec lines. - for subdir, _, files in os.walk(os.path.join(self.outdir, "workflow", "modules")): + for subdir, _, files in os.walk(os.path.join(workflow_directory, "modules")): for file in files: if file.endswith(".nf"): file_path = os.path.join(subdir, file) @@ -470,18 +708,54 @@ def find_container_images(self): break # Prioritise http, exit loop as soon as we find it # No https download, is the entire container string a docker URI? - else: - # Thanks Stack Overflow for the regex: https://stackoverflow.com/a/39672069/713980 - docker_regex = r"^(?:(?=[^:\/]{1,253})(?!-)[a-zA-Z0-9-]{1,63}(? 1 else ''}") - - def get_singularity_images(self): + def get_singularity_images(self, current_revision=""): """Loop through container names and download Singularity images""" if len(self.containers) == 0: log.info("No container names found in workflow") else: + log.info( + f"Processing workflow revision {current_revision}, found {len(self.containers)} container image{'s' if len(self.containers) > 1 else ''} in total." 
+ ) + with DownloadProgress() as progress: - task = progress.add_task("all_containers", total=len(self.containers), progress_type="summary") + task = progress.add_task( + "Collecting container images", total=len(self.containers), progress_type="summary" + ) # Organise containers based on what we need to do with them containers_exist = [] @@ -520,8 +798,8 @@ def get_singularity_images(self): log.debug(f"Cache directory not found, creating: {cache_path_dir}") os.makedirs(cache_path_dir) - # We already have the target file in place, return - if os.path.exists(out_path): + # We already have the target file in place or in remote cache, return + if os.path.exists(out_path) or os.path.basename(out_path) in self.containers_remote: containers_exist.append(container) continue @@ -539,70 +817,81 @@ def get_singularity_images(self): containers_pull.append([container, out_path, cache_path]) # Exit if we need to pull images and Singularity is not installed - if len(containers_pull) > 0 and shutil.which("singularity") is None: - raise OSError("Singularity is needed to pull images, but it is not installed") - - # Go through each method of fetching containers in order - for container in containers_exist: - progress.update(task, description="Image file exists") - progress.update(task, advance=1) - - for container in containers_cache: - progress.update(task, description="Copying singularity images from cache") - self.singularity_copy_cache_image(*container) - progress.update(task, advance=1) - - with concurrent.futures.ThreadPoolExecutor(max_workers=self.parallel_downloads) as pool: - progress.update(task, description="Downloading singularity images") - - # Kick off concurrent downloads - future_downloads = [ - pool.submit(self.singularity_download_image, *container, progress) - for container in containers_download - ] - - # Make ctrl-c work with multi-threading - self.kill_with_fire = False - - try: - # Iterate over each threaded download, waiting for them to finish - for future in concurrent.futures.as_completed(future_downloads): - future.result() - try: - progress.update(task, advance=1) - except Exception as e: - log.error(f"Error updating progress bar: {e}") - - except KeyboardInterrupt: - # Cancel the future threads that haven't started yet - for future in future_downloads: - future.cancel() - # Set the variable that the threaded function looks for - # Will trigger an exception from each thread - self.kill_with_fire = True - # Re-raise exception on the main thread - raise - - for container in containers_pull: - progress.update(task, description="Pulling singularity images") - try: - self.singularity_pull_image(*container, progress) - except RuntimeWarning as r: - # Raise exception if this is not possible - log.error("Not able to pull image. Service might be down or internet connection is dead.") - raise r - progress.update(task, advance=1) + if len(containers_pull) > 0: + if not (shutil.which("singularity") or shutil.which("apptainer")): + raise OSError( + "Singularity/Apptainer is needed to pull images, but it is not installed or not in $PATH" + ) + + if containers_exist: + if self.singularity_cache_index is not None: + log.info( + f"{len(containers_exist)} containers are already cached remotely and won't be retrieved." 
+ ) + # Go through each method of fetching containers in order + for container in containers_exist: + progress.update(task, description="Image file exists at destination") + progress.update(task, advance=1) + + if containers_cache: + for container in containers_cache: + progress.update(task, description="Copying singularity images from cache") + self.singularity_copy_cache_image(*container) + progress.update(task, advance=1) + + if containers_download or containers_pull: + # if clause gives slightly better UX, because Download is no longer displayed if nothing is left to be downloaded. + with concurrent.futures.ThreadPoolExecutor(max_workers=self.parallel_downloads) as pool: + progress.update(task, description="Downloading singularity images") + + # Kick off concurrent downloads + future_downloads = [ + pool.submit(self.singularity_download_image, *container, progress) + for container in containers_download + ] + + # Make ctrl-c work with multi-threading + self.kill_with_fire = False + + try: + # Iterate over each threaded download, waiting for them to finish + for future in concurrent.futures.as_completed(future_downloads): + future.result() + try: + progress.update(task, advance=1) + except Exception as e: + log.error(f"Error updating progress bar: {e}") + + except KeyboardInterrupt: + # Cancel the future threads that haven't started yet + for future in future_downloads: + future.cancel() + # Set the variable that the threaded function looks for + # Will trigger an exception from each thread + self.kill_with_fire = True + # Re-raise exception on the main thread + raise + + for container in containers_pull: + progress.update(task, description="Pulling singularity images") + try: + self.singularity_pull_image(*container, progress) + except RuntimeWarning as r: + # Raise exception if this is not possible + log.error("Not able to pull image. Service might be down or internet connection is dead.") + raise r + progress.update(task, advance=1) def singularity_image_filenames(self, container): """Check Singularity cache for image, copy to destination folder if found. Args: - container (str): A pipeline's container name. Can be direct download URL - or a Docker Hub repository ID. + container (str): A pipeline's container name. Can be direct download URL + or a Docker Hub repository ID. Returns: - results (bool, str): Returns True if we have the image in the target location. - Returns a download path if not. + results (bool, str): Returns True if we have the image in the target location. + Returns a download path if not. 
""" # Generate file paths @@ -630,11 +919,11 @@ def singularity_image_filenames(self, container): if os.environ.get("NXF_SINGULARITY_CACHEDIR"): cache_path = os.path.join(os.environ["NXF_SINGULARITY_CACHEDIR"], out_name) # Use only the cache - set this as the main output path - if self.singularity_cache_only: + if self.singularity_cache == "amend": out_path = cache_path cache_path = None - elif self.singularity_cache_only: - raise FileNotFoundError("'--singularity-cache' specified but no '$NXF_SINGULARITY_CACHEDIR' set!") + elif self.singularity_cache in ["amend", "copy"]: + raise FileNotFoundError("Singularity cache is required but no '$NXF_SINGULARITY_CACHEDIR' set!") return (out_path, cache_path) @@ -731,7 +1020,12 @@ def singularity_pull_image(self, container, out_path, cache_path, progress): # Pull using singularity address = f"docker://{container.replace('docker://', '')}" - singularity_command = ["singularity", "pull", "--name", output_path, address] + if shutil.which("singularity"): + singularity_command = ["singularity", "pull", "--name", output_path, address] + elif shutil.which("apptainer"): + singularity_command = ["apptainer", "pull", "--name", output_path, address] + else: + raise OSError("Singularity/Apptainer is needed to pull images, but it is not installed or not in $PATH") log.debug(f"Building singularity image: {address}") log.debug(f"Singularity command: {' '.join(singularity_command)}") @@ -754,7 +1048,7 @@ def singularity_pull_image(self, container, out_path, cache_path, progress): if lines: # something went wrong with the container retrieval if any("FATAL: " in line for line in lines): - log.info("Singularity container retrieval fialed with the following error:") + log.info("Singularity container retrieval failed with the following error:") log.info("".join(lines)) raise FileNotFoundError(f'The container "{container}" is unavailable.\n{"".join(lines)}') @@ -794,5 +1088,215 @@ def compress_download(self): log.debug(f"Deleting uncompressed files: '{self.outdir}'") shutil.rmtree(self.outdir) - # Caclualte md5sum for output file + # Calculate md5sum for output file log.info(f"MD5 checksum for '{self.output_filename}': [blue]{nf_core.utils.file_md5(self.output_filename)}[/]") + + +class WorkflowRepo(SyncedRepo): + """ + An object to store details about a locally cached workflow repository. + + Important Attributes: + fullname: The full name of the repository, ``nf-core/{self.pipelinename}``. + local_repo_dir (str): The local directory, where the workflow is cloned into. Defaults to ``$HOME/.cache/nf-core/nf-core/{self.pipeline}``. + + """ + + def __init__( + self, + remote_url, + revision, + commit, + location=None, + hide_progress=False, + in_cache=True, + ): + """ + Initializes the object and clones the workflows git repository if it is not already present + + Args: + remote_url (str): The URL of the remote repository. Defaults to None. + self.revision (list of str): The revisions to include. A list of strings. + commits (dict of str): The checksums to linked with the revisions. + no_pull (bool, optional): Whether to skip the pull step. Defaults to False. + hide_progress (bool, optional): Whether to hide the progress bar. Defaults to False. + in_cache (bool, optional): Whether to clone the repository from the cache. Defaults to False. 
+ """ + self.remote_url = remote_url + if isinstance(revision, str): + self.revision = [revision] + elif isinstance(revision, list): + self.revision = [*revision] + else: + self.revision = [] + if isinstance(commit, str): + self.commit = [commit] + elif isinstance(commit, list): + self.commit = [*commit] + else: + self.commit = [] + self.fullname = nf_core.modules.modules_utils.repo_full_name_from_remote(self.remote_url) + self.retries = 0 # retries for setting up the locally cached repository + self.hide_progress = hide_progress + + self.setup_local_repo(remote=remote_url, location=location, in_cache=in_cache) + + # expose some instance attributes + self.tags = self.repo.tags + + def __repr__(self): + """Called by print, creates representation of object""" + return f"" + + def access(self): + if os.path.exists(self.local_repo_dir): + return self.local_repo_dir + else: + return None + + def checkout(self, commit): + return super().checkout(commit) + + def get_remote_branches(self, remote_url): + return super().get_remote_branches(remote_url) + + def retry_setup_local_repo(self, skip_confirm=False): + self.retries += 1 + if skip_confirm or rich.prompt.Confirm.ask( + f"[violet]Delete local cache '{self.local_repo_dir}' and try again?" + ): + if ( + self.retries > 1 + ): # One unconfirmed retry is acceptable, but prevent infinite loops without user interaction. + log.error( + f"Errors with locally cached repository of '{self.fullname}'. Please delete '{self.local_repo_dir}' manually and try again." + ) + sys.exit(1) + if not skip_confirm: # Feedback to user for manual confirmation. + log.info(f"Removing '{self.local_repo_dir}'") + shutil.rmtree(self.local_repo_dir) + self.setup_local_repo(self.remote_url, in_cache=False) + else: + raise LookupError("Exiting due to error with locally cached Git repository.") + + def setup_local_repo(self, remote, location=None, in_cache=True): + """ + Sets up the local git repository. If the repository has been cloned previously, it + returns a git.Repo object of that clone. Otherwise it tries to clone the repository from + the provided remote URL and returns a git.Repo of the new clone. + + Args: + remote (str): git url of remote + location (Path): location where the clone should be created/cached. + in_cache (bool, optional): Whether to clone the repository from the cache. Defaults to False. 
+ Sets self.repo + """ + if location: + self.local_repo_dir = os.path.join(location, self.fullname) + else: + self.local_repo_dir = os.path.join(NFCORE_DIR if not in_cache else NFCORE_CACHE_DIR, self.fullname) + + try: + if not os.path.exists(self.local_repo_dir): + try: + pbar = rich.progress.Progress( + "[bold blue]{task.description}", + rich.progress.BarColumn(bar_width=None), + "[bold yellow]{task.fields[state]}", + transient=True, + disable=os.environ.get("HIDE_PROGRESS", None) is not None or self.hide_progress, + ) + with pbar: + self.repo = git.Repo.clone_from( + remote, + self.local_repo_dir, + progress=RemoteProgressbar(pbar, self.fullname, self.remote_url, "Cloning"), + ) + super().update_local_repo_status(self.fullname, True) + except GitCommandError: + raise LookupError(f"Failed to clone from the remote: `{remote}`") + else: + self.repo = git.Repo(self.local_repo_dir) + + if super().no_pull_global: + super().update_local_repo_status(self.fullname, True) + # If the repo is already cloned, fetch the latest changes from the remote + if not super().local_repo_synced(self.fullname): + pbar = rich.progress.Progress( + "[bold blue]{task.description}", + rich.progress.BarColumn(bar_width=None), + "[bold yellow]{task.fields[state]}", + transient=True, + disable=os.environ.get("HIDE_PROGRESS", None) is not None or self.hide_progress, + ) + with pbar: + self.repo.remotes.origin.fetch( + progress=RemoteProgressbar(pbar, self.fullname, self.remote_url, "Pulling") + ) + super().update_local_repo_status(self.fullname, True) + + except (GitCommandError, InvalidGitRepositoryError) as e: + log.error(f"[red]Could not set up local cache of modules repository:[/]\n{e}\n") + self.retry_setup_local_repo() + + def tidy_tags_and_branches(self): + """ + Function to delete all tags and branches that are not of interest to the downloader. + This allows a clutter-free experience in Tower. The untagged commits are evidently still available. + + However, due to local caching, the downloader might also want access to revisions that had been deleted before. + In that case, don't bother with re-adding the tags and rather download anew from Github. + """ + if self.revision and self.repo and self.repo.tags: + # create a set to keep track of the revisions to process & check + desired_revisions = set(self.revision) + + # determine what needs pruning + tags_to_remove = {tag for tag in self.repo.tags if tag.name not in desired_revisions} + heads_to_remove = {head for head in self.repo.heads if head.name not in desired_revisions} + + try: + # delete unwanted tags from repository + for tag in tags_to_remove: + self.repo.delete_tag(tag) + self.tags = self.repo.tags + + # switch to a revision that should be kept, because deleting heads fails, if they are checked out (e.g. "master") + self.checkout(self.revision[0]) + + # delete unwanted heads/branches from repository + for head in heads_to_remove: + self.repo.delete_head(head) + + # ensure all desired branches are available + for revision in desired_revisions: + self.checkout(revision) + self.heads = self.repo.heads + + # get all tags and available remote_branches + completed_revisions = {revision.name for revision in self.repo.heads + self.repo.tags} + + # verify that all requested revisions are available. + # a local cache might lack revisions that were deleted during a less comprehensive previous download. 
+ if bool(desired_revisions - completed_revisions): + log.info( + f"Locally cached version of the pipeline lacks selected revisions {', '.join(desired_revisions - completed_revisions)}. Downloading anew from GitHub..." + ) + self.retry_setup_local_repo(skip_confirm=True) + self.tidy_tags_and_branches() + except (GitCommandError, InvalidGitRepositoryError) as e: + log.error(f"[red]Adapting your pipeline download unfortunately failed:[/]\n{e}\n") + self.retry_setup_local_repo(skip_confirm=True) + sys.exit(1) + + def bare_clone(self, destination): + if self.repo: + try: + destfolder = os.path.abspath(destination) + if not os.path.exists(destfolder): + os.makedirs(destfolder) + if os.path.exists(destination): + shutil.rmtree(os.path.abspath(destination)) + self.repo.clone(os.path.abspath(destination), bare=True) + except (OSError, GitCommandError, InvalidGitRepositoryError) as e: + log.error(f"[red]Failure to create the pipeline download[/]\n{e}\n") diff --git a/nf_core/lint/files_exist.py b/nf_core/lint/files_exist.py index eb8c04916a..02baae7db8 100644 --- a/nf_core/lint/files_exist.py +++ b/nf_core/lint/files_exist.py @@ -52,7 +52,6 @@ def files_exist(self): docs/README.md docs/usage.md lib/nfcore_external_java_deps.jar - lib/NfcoreSchema.groovy lib/NfcoreTemplate.groovy lib/Utils.groovy lib/WorkflowMain.groovy @@ -161,7 +160,6 @@ def files_exist(self): [os.path.join("docs", "README.md")], [os.path.join("docs", "usage.md")], [os.path.join("lib", "nfcore_external_java_deps.jar")], - [os.path.join("lib", "NfcoreSchema.groovy")], [os.path.join("lib", "NfcoreTemplate.groovy")], [os.path.join("lib", "Utils.groovy")], [os.path.join("lib", "WorkflowMain.groovy")], diff --git a/nf_core/lint/files_unchanged.py b/nf_core/lint/files_unchanged.py index c0be64d0d7..2b64d62638 100644 --- a/nf_core/lint/files_unchanged.py +++ b/nf_core/lint/files_unchanged.py @@ -40,7 +40,6 @@ def files_unchanged(self): docs/images/nf-core-PIPELINE_logo_dark.png docs/README.md' lib/nfcore_external_java_deps.jar - lib/NfcoreSchema.groovy lib/NfcoreTemplate.groovy ['LICENSE', 'LICENSE.md', 'LICENCE', 'LICENCE.md'], # NB: British / American spelling @@ -105,7 +104,6 @@ def files_unchanged(self): [os.path.join("docs", "images", f"nf-core-{short_name}_logo_dark.png")], [os.path.join("docs", "README.md")], [os.path.join("lib", "nfcore_external_java_deps.jar")], - [os.path.join("lib", "NfcoreSchema.groovy")], [os.path.join("lib", "NfcoreTemplate.groovy")], ] files_partial = [ diff --git a/nf_core/lint/multiqc_config.py b/nf_core/lint/multiqc_config.py index 3378efce5f..9eff60091f 100644 --- a/nf_core/lint/multiqc_config.py +++ b/nf_core/lint/multiqc_config.py @@ -71,12 +71,13 @@ def multiqc_config(self): if "report_comment" not in ignore_configs: # Check that the minimum plugins exist and are coming first in the summary try: + version = self.nf_config.get("manifest.version", "").strip(" '\"") if "report_comment" not in mqc_yml: raise AssertionError() if mqc_yml["report_comment"].strip() != ( - f'This report has been generated by the nf-core/{self.pipeline_name} analysis pipeline. For information about how to ' - f'interpret these results, please see the documentation.' 
             ):
                 raise AssertionError()
diff --git a/nf_core/lint/nextflow_config.py b/nf_core/lint/nextflow_config.py
index af018331f0..f317410cdd 100644
--- a/nf_core/lint/nextflow_config.py
+++ b/nf_core/lint/nextflow_config.py
@@ -62,11 +62,11 @@ def nextflow_config(self):
         * Should always be set to default value:
           ``https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}``
-    * ``params.show_hidden_params``
+    * ``params.validationShowHiddenParams``
         * Determines whether boilerplate params are showed by schema. Set to ``false`` by default
-    * ``params.schema_ignore_params``
+    * ``params.validationSchemaIgnoreParams``
         * A comma separated string of inputs the schema validation should ignore.
@@ -130,8 +130,6 @@ def nextflow_config(self):
         ["process.time"],
         ["params.outdir"],
         ["params.input"],
-        ["params.show_hidden_params"],
-        ["params.schema_ignore_params"],
     ]
     # Throw a warning if these are missing
     config_warn = [
diff --git a/nf_core/modules/lint/main_nf.py b/nf_core/modules/lint/main_nf.py
index 8150e7e839..31b8adca3a 100644
--- a/nf_core/modules/lint/main_nf.py
+++ b/nf_core/modules/lint/main_nf.py
@@ -283,7 +283,7 @@ def check_process_section(self, lines, fix_version, progress_bar):
             self.failed.append(("docker_tag", "Unable to parse docker tag", self.main_nf))
             docker_tag = None
         if l.startswith("quay.io/"):
-            l_stripped = re.sub("\W+$", "", l)
+            l_stripped = re.sub(r"\W+$", "", l)
             self.failed.append(
                 (
                     "container_links",
diff --git a/nf_core/modules/modules_repo.py b/nf_core/modules/modules_repo.py
index 5f77148867..152ed7b0c0 100644
--- a/nf_core/modules/modules_repo.py
+++ b/nf_core/modules/modules_repo.py
@@ -11,7 +11,8 @@
 import nf_core.modules.modules_json
 import nf_core.modules.modules_utils
-from nf_core.utils import NFCORE_DIR, load_tools_config
+from nf_core.synced_repo import RemoteProgressbar, SyncedRepo
+from nf_core.utils import NFCORE_CACHE_DIR, NFCORE_DIR, load_tools_config
 
 log = logging.getLogger(__name__)
@@ -21,44 +22,7 @@
 NF_CORE_MODULES_DEFAULT_BRANCH = "master"
 
-class RemoteProgressbar(git.RemoteProgress):
-    """
-    An object to create a progressbar for when doing an operation with the remote.
-    Note that an initialized rich Progress (progress bar) object must be past
-    during initialization.
-    """
-
-    def __init__(self, progress_bar, repo_name, remote_url, operation):
-        """
-        Initializes the object and adds a task to the progressbar passed as 'progress_bar'
-
-        Args:
-            progress_bar (rich.progress.Progress): A rich progress bar object
-            repo_name (str): Name of the repository the operation is performed on
-            remote_url (str): Git URL of the repository the operation is performed on
-            operation (str): The operation performed on the repository, i.e. 'Pulling', 'Cloning' etc.
-        """
-        super().__init__()
-        self.progress_bar = progress_bar
-        self.tid = self.progress_bar.add_task(
-            f"{operation} from [bold green]'{repo_name}'[/bold green] ([link={remote_url}]{remote_url}[/link])",
-            start=False,
-            state="Waiting for response",
-        )
-
-    def update(self, op_code, cur_count, max_count=None, message=""):
-        """
-        Overrides git.RemoteProgress.update.
-        Called every time there is a change in the remote operation
-        """
-        if not self.progress_bar.tasks[self.tid].started:
-            self.progress_bar.start_task(self.tid)
-        self.progress_bar.update(
-            self.tid, total=max_count, completed=cur_count, state=f"{cur_count / max_count * 100:.1f}%"
-        )
-
-
-class ModulesRepo:
+class ModulesRepo(SyncedRepo):
     """
     An object to store details about the repository being used for modules.
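The `tidy_tags_and_branches()` logic added to `download.py` above is easier to follow in isolation. Below is a minimal sketch of the pruning idea, assuming a plain GitPython `Repo` and a hypothetical `keep` set of revision names; it is not the PR's exact implementation:

```python
import git  # GitPython, already a dependency of nf_core


def prune_to_revisions(repo_path, keep):
    """Delete every tag and branch not named in `keep`, mirroring the
    pruning idea behind WorkflowRepo.tidy_tags_and_branches()."""
    repo = git.Repo(repo_path)
    keep = set(keep)
    # Tags first; deleting a tag does not discard the underlying commit.
    for tag in [t for t in repo.tags if t.name not in keep]:
        repo.delete_tag(tag)
    # A checked-out branch cannot be deleted, so switch to a kept revision first.
    repo.git.checkout(sorted(keep)[0])
    for head in [h for h in repo.heads if h.name not in keep]:
        repo.delete_head(head, force=True)
    # Return what survived, so the caller can verify nothing requested is missing.
    return {ref.name for ref in list(repo.heads) + list(repo.tags)}
```

The real implementation additionally re-checks that every requested revision survived the pruning, and falls back to a fresh clone from GitHub when the local cache lacks one.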
@@ -73,45 +37,6 @@ class ModulesRepo: local_repo_statuses = {} no_pull_global = False - @staticmethod - def local_repo_synced(repo_name): - """ - Checks whether a local repo has been cloned/pull in the current session - """ - return ModulesRepo.local_repo_statuses.get(repo_name, False) - - @staticmethod - def update_local_repo_status(repo_name, up_to_date): - """ - Updates the clone/pull status of a local repo - """ - ModulesRepo.local_repo_statuses[repo_name] = up_to_date - - @staticmethod - def get_remote_branches(remote_url): - """ - Get all branches from a remote repository - - Args: - remote_url (str): The git url to the remote repository - - Returns: - (set[str]): All branches found in the remote - """ - try: - unparsed_branches = git.Git().ls_remote(remote_url) - except git.GitCommandError: - raise LookupError(f"Was unable to fetch branches from '{remote_url}'") - else: - branches = {} - for branch_info in unparsed_branches.split("\n"): - sha, name = branch_info.split("\t") - if name != "HEAD": - # The remote branches are shown as 'ref/head/branch' - branch_name = Path(name).stem - branches[sha] = branch_name - return set(branches.values()) - def __init__(self, remote_url=None, branch=None, no_pull=False, hide_progress=False): """ Initializes the object and clones the git repository if it is not already present @@ -146,27 +71,7 @@ def __init__(self, remote_url=None, branch=None, no_pull=False, hide_progress=Fa self.avail_module_names = None - def verify_sha(self, prompt, sha): - """ - Verify that 'sha' and 'prompt' arguments are not provided together. - Verify that the provided SHA exists in the repo. - - Arguments: - prompt (bool): prompt asking for SHA - sha (str): provided sha - """ - if prompt and sha is not None: - log.error("Cannot use '--sha' and '--prompt' at the same time!") - return False - - if sha: - if not self.sha_exists_on_branch(sha): - log.error(f"Commit SHA '{sha}' doesn't exist in '{self.remote_url}'") - return False - - return True - - def setup_local_repo(self, remote, branch, hide_progress=True): + def setup_local_repo(self, remote, branch, hide_progress=True, in_cache=False): """ Sets up the local git repository. If the repository has been cloned previously, it returns a git.Repo object of that clone. Otherwise it tries to clone the repository from @@ -177,7 +82,7 @@ def setup_local_repo(self, remote, branch, hide_progress=True): branch (str): name of branch to use Sets self.repo """ - self.local_repo_dir = os.path.join(NFCORE_DIR, self.fullname) + self.local_repo_dir = os.path.join(NFCORE_DIR if not in_cache else NFCORE_CACHE_DIR, self.fullname) try: if not os.path.exists(self.local_repo_dir): try: @@ -236,263 +141,3 @@ def setup_local_repo(self, remote, branch, hide_progress=True): self.setup_local_repo(remote, branch, hide_progress) else: raise LookupError("Exiting due to error with local modules git repo") - - def setup_branch(self, branch): - """ - Verify that we have a branch and otherwise use the default one. - The branch is then checked out to verify that it exists in the repo. 
- - Args: - branch (str): Name of branch - """ - if branch is None: - # Don't bother fetching default branch if we're using nf-core - if self.remote_url == NF_CORE_MODULES_REMOTE: - self.branch = "master" - else: - self.branch = self.get_default_branch() - else: - self.branch = branch - - # Verify that the branch exists by checking it out - self.branch_exists() - - def get_default_branch(self): - """ - Gets the default branch for the repo (the branch origin/HEAD is pointing to) - """ - origin_head = next(ref for ref in self.repo.refs if ref.name == "origin/HEAD") - _, branch = origin_head.ref.name.split("/") - return branch - - def branch_exists(self): - """ - Verifies that the branch exists in the repository by trying to check it out - """ - try: - self.checkout_branch() - except GitCommandError: - raise LookupError(f"Branch '{self.branch}' not found in '{self.remote_url}'") - - def verify_branch(self): - """ - Verifies the active branch conforms do the correct directory structure - """ - dir_names = os.listdir(self.local_repo_dir) - if "modules" not in dir_names: - err_str = f"Repository '{self.remote_url}' ({self.branch}) does not contain the 'modules/' directory" - if "software" in dir_names: - err_str += ( - ".\nAs of nf-core/tools version 2.0, the 'software/' directory should be renamed to 'modules/'" - ) - raise LookupError(err_str) - - def checkout_branch(self): - """ - Checks out the specified branch of the repository - """ - self.repo.git.checkout(self.branch) - - def checkout(self, commit): - """ - Checks out the repository at the requested commit - - Args: - commit (str): Git SHA of the commit - """ - self.repo.git.checkout(commit) - - def component_exists(self, component_name, component_type, checkout=True, commit=None): - """ - Check if a module/subworkflow exists in the branch of the repo - - Args: - component_name (str): The name of the module/subworkflow - - Returns: - (bool): Whether the module/subworkflow exists in this branch of the repository - """ - return component_name in self.get_avail_components(component_type, checkout=checkout, commit=commit) - - def get_component_dir(self, component_name, component_type): - """ - Returns the file path of a module/subworkflow directory in the repo. - Does not verify that the path exists. 
- Args: - component_name (str): The name of the module/subworkflow - - Returns: - component_path (str): The path of the module/subworkflow in the local copy of the repository - """ - if component_type == "modules": - return os.path.join(self.modules_dir, component_name) - elif component_type == "subworkflows": - return os.path.join(self.subworkflows_dir, component_name) - - def install_component(self, component_name, install_dir, commit, component_type): - """ - Install the module/subworkflow files into a pipeline at the given commit - - Args: - component_name (str): The name of the module/subworkflow - install_dir (str): The path where the module/subworkflow should be installed - commit (str): The git SHA for the version of the module/subworkflow to be installed - - Returns: - (bool): Whether the operation was successful or not - """ - # Check out the repository at the requested ref - try: - self.checkout(commit) - except git.GitCommandError: - return False - - # Check if the module/subworkflow exists in the branch - if not self.component_exists(component_name, component_type, checkout=False): - log.error( - f"The requested {component_type[:-1]} does not exists in the branch '{self.branch}' of {self.remote_url}'" - ) - return False - - # Copy the files from the repo to the install folder - shutil.copytree(self.get_component_dir(component_name, component_type), Path(install_dir, component_name)) - - # Switch back to the tip of the branch - self.checkout_branch() - return True - - def module_files_identical(self, module_name, base_path, commit): - """ - Checks whether the module files in a pipeline are identical to the ones in the remote - Args: - module_name (str): The name of the module - base_path (str): The path to the module in the pipeline - - Returns: - (bool): Whether the pipeline files are identical to the repo files - """ - if commit is None: - self.checkout_branch() - else: - self.checkout(commit) - module_files = ["main.nf", "meta.yml"] - files_identical = {file: True for file in module_files} - module_dir = self.get_component_dir(module_name, "modules") - for file in module_files: - try: - files_identical[file] = filecmp.cmp(os.path.join(module_dir, file), os.path.join(base_path, file)) - except FileNotFoundError: - log.debug(f"Could not open file: {os.path.join(module_dir, file)}") - continue - self.checkout_branch() - return files_identical - - def get_component_git_log(self, component_name, component_type, depth=None): - """ - Fetches the commit history the of requested module/subworkflow since a given date. The default value is - not arbitrary - it is the last time the structure of the nf-core/modules repository was had an - update breaking backwards compatibility. 
- Args: - component_name (str): Name of module/subworkflow - modules_repo (ModulesRepo): A ModulesRepo object configured for the repository in question - - Returns: - ( dict ): Iterator of commit SHAs and associated (truncated) message - """ - self.checkout_branch() - component_path = os.path.join(component_type, self.repo_path, component_name) - commits_new = self.repo.iter_commits(max_count=depth, paths=component_path) - commits_new = [ - {"git_sha": commit.hexsha, "trunc_message": commit.message.partition("\n")[0]} for commit in commits_new - ] - commits_old = [] - if component_type == "modules": - # Grab commits also from previous modules structure - component_path = os.path.join("modules", component_name) - commits_old = self.repo.iter_commits(max_count=depth, paths=component_path) - commits_old = [ - {"git_sha": commit.hexsha, "trunc_message": commit.message.partition("\n")[0]} for commit in commits_old - ] - commits = iter(commits_new + commits_old) - return commits - - def get_latest_component_version(self, component_name, component_type): - """ - Returns the latest commit in the repository - """ - return list(self.get_component_git_log(component_name, component_type, depth=1))[0]["git_sha"] - - def sha_exists_on_branch(self, sha): - """ - Verifies that a given commit sha exists on the branch - """ - self.checkout_branch() - return sha in (commit.hexsha for commit in self.repo.iter_commits()) - - def get_commit_info(self, sha): - """ - Fetches metadata about the commit (dates, message, etc.) - Args: - commit_sha (str): The SHA of the requested commit - Returns: - message (str): The commit message for the requested commit - date (str): The commit date for the requested commit - Raises: - LookupError: If the search for the commit fails - """ - self.checkout_branch() - for commit in self.repo.iter_commits(): - if commit.hexsha == sha: - message = commit.message.partition("\n")[0] - date_obj = commit.committed_datetime - date = str(date_obj.date()) - return message, date - raise LookupError(f"Commit '{sha}' not found in the '{self.remote_url}'") - - def get_avail_components(self, component_type, checkout=True, commit=None): - """ - Gets the names of the modules/subworkflows in the repository. 
They are detected by - checking which directories have a 'main.nf' file - - Returns: - ([ str ]): The module/subworkflow names - """ - if checkout: - self.checkout_branch() - if commit is not None: - self.checkout(commit) - # Get directory - if component_type == "modules": - directory = self.modules_dir - elif component_type == "subworkflows": - directory = self.subworkflows_dir - # Module/Subworkflow directories are characterized by having a 'main.nf' file - avail_component_names = [ - os.path.relpath(dirpath, start=directory) - for dirpath, _, file_names in os.walk(directory) - if "main.nf" in file_names - ] - return avail_component_names - - def get_meta_yml(self, component_type, module_name): - """ - Returns the contents of the 'meta.yml' file of a module - - Args: - module_name (str): The name of the module - - Returns: - (str): The contents of the file in text format - """ - self.checkout_branch() - if component_type == "modules": - path = Path(self.modules_dir, module_name, "meta.yml") - elif component_type == "subworkflows": - path = Path(self.subworkflows_dir, module_name, "meta.yml") - else: - raise ValueError(f"Invalid component type: {component_type}") - if not path.exists(): - return None - with open(path) as fh: - contents = fh.read() - return contents diff --git a/nf_core/pipeline-template/.github/CONTRIBUTING.md b/nf_core/pipeline-template/.github/CONTRIBUTING.md index 9afdd2987b..ecdda0f86b 100644 --- a/nf_core/pipeline-template/.github/CONTRIBUTING.md +++ b/nf_core/pipeline-template/.github/CONTRIBUTING.md @@ -124,4 +124,3 @@ To get started: Devcontainer specs: - [DevContainer config](.devcontainer/devcontainer.json) -- [Dockerfile](.devcontainer/Dockerfile) diff --git a/nf_core/pipeline-template/CITATIONS.md b/nf_core/pipeline-template/CITATIONS.md index 740b045103..ceaba0cb5f 100644 --- a/nf_core/pipeline-template/CITATIONS.md +++ b/nf_core/pipeline-template/CITATIONS.md @@ -12,7 +12,10 @@ - [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) + > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. Available online https://www.bioinformatics.babraham.ac.uk/projects/fastqc/. + - [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/) + > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924. ## Software packaging/containerisation tools @@ -31,5 +34,8 @@ - [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241) + > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241. + - [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/) + > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675. 
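Before moving on to the pipeline-template changes, one detail of the relocated `ModulesRepo` helpers is worth spelling out: component discovery in `get_avail_components()` is nothing more than a directory walk for `main.nf` files. A self-contained sketch (the example paths in the comment are illustrative):

```python
import os


def find_components(directory):
    """Sketch of the lookup behind get_avail_components(): every directory
    containing a 'main.nf' file counts as one module/subworkflow."""
    return sorted(
        os.path.relpath(dirpath, start=directory)
        for dirpath, _subdirs, file_names in os.walk(directory)
        if "main.nf" in file_names
    )


# e.g. for a local checkout of nf-core/modules:
#   find_components("modules/nf-core") -> ['bwa/index', 'bwa/mem', 'fastqc', ...]
```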
diff --git a/nf_core/pipeline-template/assets/multiqc_config.yml b/nf_core/pipeline-template/assets/multiqc_config.yml
index 440b0b9a3a..570ed3d8e5 100644
--- a/nf_core/pipeline-template/assets/multiqc_config.yml
+++ b/nf_core/pipeline-template/assets/multiqc_config.yml
@@ -1,7 +1,7 @@
 report_comment: >
-  This report has been generated by the <a href="https://github.com/{{ name }}" target="_blank">{{ name }}</a>
+  This report has been generated by the <a href="https://github.com/{{ name }}/releases/tag/{{ version }}" target="_blank">{{ name }}</a>
   analysis pipeline.{% if branded %} For information about how to interpret these results, please see the
-  <a href="https://nf-co.re/{{ short_name }}/output" target="_blank">documentation</a>.{% endif %}
+  <a href="https://nf-co.re/{{ short_name }}/{{ version }}/docs/output" target="_blank">documentation</a>.{% endif %}
 report_section_order:
   "{{ name_noslash }}-methods-description":
     order: -1000
diff --git a/nf_core/pipeline-template/assets/schema_input.json b/nf_core/pipeline-template/assets/schema_input.json
index 509048bd8a..10329ebb63 100644
--- a/nf_core/pipeline-template/assets/schema_input.json
+++ b/nf_core/pipeline-template/assets/schema_input.json
@@ -10,7 +10,8 @@
         "sample": {
             "type": "string",
             "pattern": "^\\S+$",
-            "errorMessage": "Sample name must be provided and cannot contain spaces"
+            "errorMessage": "Sample name must be provided and cannot contain spaces",
+            "meta": ["id"]
         },
         "fastq_1": {
             "type": "string",
diff --git a/nf_core/pipeline-template/bin/check_samplesheet.py b/nf_core/pipeline-template/bin/check_samplesheet.py
deleted file mode 100755
index 4a758fe003..0000000000
--- a/nf_core/pipeline-template/bin/check_samplesheet.py
+++ /dev/null
@@ -1,259 +0,0 @@
-#!/usr/bin/env python
-
-
-"""Provide a command line tool to validate and transform tabular samplesheets."""
-
-
-import argparse
-import csv
-import logging
-import sys
-from collections import Counter
-from pathlib import Path
-
-logger = logging.getLogger()
-
-
-class RowChecker:
-    """
-    Define a service that can validate and transform each given row.
-
-    Attributes:
-        modified (list): A list of dicts, where each dict corresponds to a previously
-            validated and transformed row. The order of rows is maintained.
-
-    """
-
-    VALID_FORMATS = (
-        ".fq.gz",
-        ".fastq.gz",
-    )
-
-    def __init__(
-        self,
-        sample_col="sample",
-        first_col="fastq_1",
-        second_col="fastq_2",
-        single_col="single_end",
-        **kwargs,
-    ):
-        """
-        Initialize the row checker with the expected column names.
-
-        Args:
-            sample_col (str): The name of the column that contains the sample name
-                (default "sample").
-            first_col (str): The name of the column that contains the first (or only)
-                FASTQ file path (default "fastq_1").
-            second_col (str): The name of the column that contains the second (if any)
-                FASTQ file path (default "fastq_2").
-            single_col (str): The name of the new column that will be inserted and
-                records whether the sample contains single- or paired-end sequencing
-                reads (default "single_end").
-
-        """
-        super().__init__(**kwargs)
-        self._sample_col = sample_col
-        self._first_col = first_col
-        self._second_col = second_col
-        self._single_col = single_col
-        self._seen = set()
-        self.modified = []
-
-    def validate_and_transform(self, row):
-        """
-        Perform all validations on the given row and insert the read pairing status.
-
-        Args:
-            row (dict): A mapping from column headers (keys) to elements of that row
-                (values).
- - """ - self._validate_sample(row) - self._validate_first(row) - self._validate_second(row) - self._validate_pair(row) - self._seen.add((row[self._sample_col], row[self._first_col])) - self.modified.append(row) - - def _validate_sample(self, row): - """Assert that the sample name exists and convert spaces to underscores.""" - if len(row[self._sample_col]) <= 0: - raise AssertionError("Sample input is required.") - # Sanitize samples slightly. - row[self._sample_col] = row[self._sample_col].replace(" ", "_") - - def _validate_first(self, row): - """Assert that the first FASTQ entry is non-empty and has the right format.""" - if len(row[self._first_col]) <= 0: - raise AssertionError("At least the first FASTQ file is required.") - self._validate_fastq_format(row[self._first_col]) - - def _validate_second(self, row): - """Assert that the second FASTQ entry has the right format if it exists.""" - if len(row[self._second_col]) > 0: - self._validate_fastq_format(row[self._second_col]) - - def _validate_pair(self, row): - """Assert that read pairs have the same file extension. Report pair status.""" - if row[self._first_col] and row[self._second_col]: - row[self._single_col] = False - first_col_suffix = Path(row[self._first_col]).suffixes[-2:] - second_col_suffix = Path(row[self._second_col]).suffixes[-2:] - if first_col_suffix != second_col_suffix: - raise AssertionError("FASTQ pairs must have the same file extensions.") - else: - row[self._single_col] = True - - def _validate_fastq_format(self, filename): - """Assert that a given filename has one of the expected FASTQ extensions.""" - if not any(filename.endswith(extension) for extension in self.VALID_FORMATS): - raise AssertionError( - f"The FASTQ file has an unrecognized extension: {filename}\n" - f"It should be one of: {', '.join(self.VALID_FORMATS)}" - ) - - def validate_unique_samples(self): - """ - Assert that the combination of sample name and FASTQ filename is unique. - - In addition to the validation, also rename all samples to have a suffix of _T{n}, where n is the - number of times the same sample exist, but with different FASTQ files, e.g., multiple runs per experiment. - - """ - if len(self._seen) != len(self.modified): - raise AssertionError("The pair of sample name and FASTQ must be unique.") - seen = Counter() - for row in self.modified: - sample = row[self._sample_col] - seen[sample] += 1 - row[self._sample_col] = f"{sample}_T{seen[sample]}" - - -def read_head(handle, num_lines=10): - """Read the specified number of lines from the current position in the file.""" - lines = [] - for idx, line in enumerate(handle): - if idx == num_lines: - break - lines.append(line) - return "".join(lines) - - -def sniff_format(handle): - """ - Detect the tabular format. - - Args: - handle (text file): A handle to a `text file`_ object. The read position is - expected to be at the beginning (index 0). - - Returns: - csv.Dialect: The detected tabular format. - - .. _text file: - https://docs.python.org/3/glossary.html#term-text-file - - """ - peek = read_head(handle) - handle.seek(0) - sniffer = csv.Sniffer() - dialect = sniffer.sniff(peek) - return dialect - - -def check_samplesheet(file_in, file_out): - """ - Check that the tabular samplesheet has the structure expected by nf-core pipelines. - - Validate the general shape of the table, expected columns, and each row. Also add - an additional column which records whether one or two FASTQ reads were found. - - Args: - file_in (pathlib.Path): The given tabular samplesheet. 
The format can be either - CSV, TSV, or any other format automatically recognized by ``csv.Sniffer``. - file_out (pathlib.Path): Where the validated and transformed samplesheet should - be created; always in CSV format. - - Example: - This function checks that the samplesheet follows the following structure, - see also the `viral recon samplesheet`_:: - - sample,fastq_1,fastq_2 - SAMPLE_PE,SAMPLE_PE_RUN1_1.fastq.gz,SAMPLE_PE_RUN1_2.fastq.gz - SAMPLE_PE,SAMPLE_PE_RUN2_1.fastq.gz,SAMPLE_PE_RUN2_2.fastq.gz - SAMPLE_SE,SAMPLE_SE_RUN1_1.fastq.gz, - - .. _viral recon samplesheet: - https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv - - """ - required_columns = {"sample", "fastq_1", "fastq_2"} - # See https://docs.python.org/3.9/library/csv.html#id3 to read up on `newline=""`. - with file_in.open(newline="") as in_handle: - reader = csv.DictReader(in_handle, dialect=sniff_format(in_handle)) - # Validate the existence of the expected header columns. - if not required_columns.issubset(reader.fieldnames): - req_cols = ", ".join(required_columns) - logger.critical(f"The sample sheet **must** contain these column headers: {req_cols}.") - sys.exit(1) - # Validate each row. - checker = RowChecker() - for i, row in enumerate(reader): - try: - checker.validate_and_transform(row) - except AssertionError as error: - logger.critical(f"{str(error)} On line {i + 2}.") - sys.exit(1) - checker.validate_unique_samples() - header = list(reader.fieldnames) - header.insert(1, "single_end") - # See https://docs.python.org/3.9/library/csv.html#id3 to read up on `newline=""`. - with file_out.open(mode="w", newline="") as out_handle: - writer = csv.DictWriter(out_handle, header, delimiter=",") - writer.writeheader() - for row in checker.modified: - writer.writerow(row) - - -def parse_args(argv=None): - """Define and immediately parse command line arguments.""" - parser = argparse.ArgumentParser( - description="Validate and transform a tabular samplesheet.", - epilog="Example: python check_samplesheet.py samplesheet.csv samplesheet.valid.csv", - ) - parser.add_argument( - "file_in", - metavar="FILE_IN", - type=Path, - help="Tabular input samplesheet in CSV or TSV format.", - ) - parser.add_argument( - "file_out", - metavar="FILE_OUT", - type=Path, - help="Transformed output samplesheet in CSV format.", - ) - parser.add_argument( - "-l", - "--log-level", - help="The desired log level (default WARNING).", - choices=("CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"), - default="WARNING", - ) - return parser.parse_args(argv) - - -def main(argv=None): - """Coordinate argument parsing and program execution.""" - args = parse_args(argv) - logging.basicConfig(level=args.log_level, format="[%(levelname)s] %(message)s") - if not args.file_in.is_file(): - logger.error(f"The given input file {args.file_in} was not found!") - sys.exit(2) - args.file_out.parent.mkdir(parents=True, exist_ok=True) - check_samplesheet(args.file_in, args.file_out) - - -if __name__ == "__main__": - sys.exit(main()) diff --git a/nf_core/pipeline-template/conf/modules.config b/nf_core/pipeline-template/conf/modules.config index da58a5d881..2cb3b13049 100644 --- a/nf_core/pipeline-template/conf/modules.config +++ b/nf_core/pipeline-template/conf/modules.config @@ -18,14 +18,6 @@ process { saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename } ] - withName: SAMPLESHEET_CHECK { - publishDir = [ - path: { "${params.outdir}/pipeline_info" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } - ] - } - withName: FASTQC { ext.args = '--quiet' } diff --git a/nf_core/pipeline-template/lib/NfcoreSchema.groovy b/nf_core/pipeline-template/lib/NfcoreSchema.groovy deleted file mode 100755 index 9b34804d6d..0000000000 --- a/nf_core/pipeline-template/lib/NfcoreSchema.groovy +++ /dev/null @@ -1,530 +0,0 @@ -// -// This file holds several functions used to perform JSON parameter validation, help and summary rendering for the nf-core pipeline template. -// - -import nextflow.Nextflow -import org.everit.json.schema.Schema -import org.everit.json.schema.loader.SchemaLoader -import org.everit.json.schema.ValidationException -import org.json.JSONObject -import org.json.JSONTokener -import org.json.JSONArray -import groovy.json.JsonSlurper -import groovy.json.JsonBuilder - -class NfcoreSchema { - - // - // Resolve Schema path relative to main workflow directory - // - public static String getSchemaPath(workflow, schema_filename='nextflow_schema.json') { - return "${workflow.projectDir}/${schema_filename}" - } - - // - // Function to loop over all parameters defined in schema and check - // whether the given parameters adhere to the specifications - // - /* groovylint-disable-next-line UnusedPrivateMethodParameter */ - public static void validateParameters(workflow, params, log, schema_filename='nextflow_schema.json') { - def has_error = false - //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~// - // Check for nextflow core params and unexpected params - def json = new File(getSchemaPath(workflow, schema_filename=schema_filename)).text - def Map schemaParams = (Map) new JsonSlurper().parseText(json).get('definitions') - def nf_params = [ - // Options for base `nextflow` command - 'bg', - 'c', - 'C', - 'config', - 'd', - 'D', - 'dockerize', - 'h', - 'log', - 'q', - 'quiet', - 'syslog', - 'v', - - // Options for `nextflow run` command - 'ansi', - 'ansi-log', - 'bg', - 'bucket-dir', - 'c', - 'cache', - 'config', - 'dsl2', - 'dump-channels', - 'dump-hashes', - 'E', - 'entry', - 'latest', - 'lib', - 'main-script', - 'N', - 'name', - 'offline', - 'params-file', - 'pi', - 'plugins', - 'poll-interval', - 'pool-size', - 'profile', - 'ps', - 'qs', - 'queue-size', - 'r', - 'resume', - 'revision', - 'stdin', - 'stub', - 'stub-run', - 'test', - 'w', - 'with-apptainer', - 'with-charliecloud', - 'with-conda', - 'with-dag', - 'with-docker', - 'with-mpi', - 'with-notification', - 'with-podman', - 'with-report', - 'with-singularity', - 'with-timeline', - 'with-tower', - 'with-trace', - 'with-weblog', - 'without-docker', - 'without-podman', - 'work-dir' - ] - def unexpectedParams = [] - - // Collect expected parameters from the schema - def expectedParams = [] - def enums = [:] - for (group in schemaParams) { - for (p in group.value['properties']) { - expectedParams.push(p.key) - if (group.value['properties'][p.key].containsKey('enum')) { - enums[p.key] = group.value['properties'][p.key]['enum'] - } - } - } - - for (specifiedParam in params.keySet()) { - // nextflow params - if (nf_params.contains(specifiedParam)) { - log.error "ERROR: You used a core Nextflow option with two hyphens: '--${specifiedParam}'. 
Please resubmit with '-${specifiedParam}'" - has_error = true - } - // unexpected params - def params_ignore = params.schema_ignore_params.split(',') + 'schema_ignore_params' - def expectedParamsLowerCase = expectedParams.collect{ it.replace("-", "").toLowerCase() } - def specifiedParamLowerCase = specifiedParam.replace("-", "").toLowerCase() - def isCamelCaseBug = (specifiedParam.contains("-") && !expectedParams.contains(specifiedParam) && expectedParamsLowerCase.contains(specifiedParamLowerCase)) - if (!expectedParams.contains(specifiedParam) && !params_ignore.contains(specifiedParam) && !isCamelCaseBug) { - // Temporarily remove camelCase/camel-case params #1035 - def unexpectedParamsLowerCase = unexpectedParams.collect{ it.replace("-", "").toLowerCase()} - if (!unexpectedParamsLowerCase.contains(specifiedParamLowerCase)){ - unexpectedParams.push(specifiedParam) - } - } - } - - //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~// - // Validate parameters against the schema - InputStream input_stream = new File(getSchemaPath(workflow, schema_filename=schema_filename)).newInputStream() - JSONObject raw_schema = new JSONObject(new JSONTokener(input_stream)) - - // Remove anything that's in params.schema_ignore_params - raw_schema = removeIgnoredParams(raw_schema, params) - - Schema schema = SchemaLoader.load(raw_schema) - - // Clean the parameters - def cleanedParams = cleanParameters(params) - - // Convert to JSONObject - def jsonParams = new JsonBuilder(cleanedParams) - JSONObject params_json = new JSONObject(jsonParams.toString()) - - // Validate - try { - schema.validate(params_json) - } catch (ValidationException e) { - println '' - log.error 'ERROR: Validation of pipeline parameters failed!' - JSONObject exceptionJSON = e.toJSON() - printExceptions(exceptionJSON, params_json, log, enums) - println '' - has_error = true - } - - // Check for unexpected parameters - if (unexpectedParams.size() > 0) { - Map colors = NfcoreTemplate.logColours(params.monochrome_logs) - println '' - def warn_msg = 'Found unexpected parameters:' - for (unexpectedParam in unexpectedParams) { - warn_msg = warn_msg + "\n* --${unexpectedParam}: ${params[unexpectedParam].toString()}" - } - log.warn warn_msg - log.info "- ${colors.dim}Ignore this warning: params.schema_ignore_params = \"${unexpectedParams.join(',')}\" ${colors.reset}" - println '' - } - - if (has_error) { - Nextflow.error('Exiting!') - } - } - - // - // Beautify parameters for --help - // - public static String paramsHelp(workflow, params, command, schema_filename='nextflow_schema.json') { - Map colors = NfcoreTemplate.logColours(params.monochrome_logs) - Integer num_hidden = 0 - String output = '' - output += 'Typical pipeline command:\n\n' - output += " ${colors.cyan}${command}${colors.reset}\n\n" - Map params_map = paramsLoad(getSchemaPath(workflow, schema_filename=schema_filename)) - Integer max_chars = paramsMaxChars(params_map) + 1 - Integer desc_indent = max_chars + 14 - Integer dec_linewidth = 160 - desc_indent - for (group in params_map.keySet()) { - Integer num_params = 0 - String group_output = colors.underlined + colors.bold + group + colors.reset + '\n' - def group_params = params_map.get(group) // This gets the parameters of that particular group - for (param in group_params.keySet()) { - if (group_params.get(param).hidden && !params.show_hidden_params) { - num_hidden += 1 - continue; - } - def type = '[' + group_params.get(param).type + ']' - def description = group_params.get(param).description - def 
defaultValue = group_params.get(param).default != null ? " [default: " + group_params.get(param).default.toString() + "]" : '' - def description_default = description + colors.dim + defaultValue + colors.reset - // Wrap long description texts - // Loosely based on https://dzone.com/articles/groovy-plain-text-word-wrap - if (description_default.length() > dec_linewidth){ - List olines = [] - String oline = "" // " " * indent - description_default.split(" ").each() { wrd -> - if ((oline.size() + wrd.size()) <= dec_linewidth) { - oline += wrd + " " - } else { - olines += oline - oline = wrd + " " - } - } - olines += oline - description_default = olines.join("\n" + " " * desc_indent) - } - group_output += " --" + param.padRight(max_chars) + colors.dim + type.padRight(10) + colors.reset + description_default + '\n' - num_params += 1 - } - group_output += '\n' - if (num_params > 0){ - output += group_output - } - } - if (num_hidden > 0){ - output += colors.dim + "!! Hiding $num_hidden params, use --show_hidden_params to show them !!\n" + colors.reset - } - output += NfcoreTemplate.dashedLine(params.monochrome_logs) - return output - } - - // - // Groovy Map summarising parameters/workflow options used by the pipeline - // - public static LinkedHashMap paramsSummaryMap(workflow, params, schema_filename='nextflow_schema.json') { - // Get a selection of core Nextflow workflow options - def Map workflow_summary = [:] - if (workflow.revision) { - workflow_summary['revision'] = workflow.revision - } - workflow_summary['runName'] = workflow.runName - if (workflow.containerEngine) { - workflow_summary['containerEngine'] = workflow.containerEngine - } - if (workflow.container) { - workflow_summary['container'] = workflow.container - } - workflow_summary['launchDir'] = workflow.launchDir - workflow_summary['workDir'] = workflow.workDir - workflow_summary['projectDir'] = workflow.projectDir - workflow_summary['userName'] = workflow.userName - workflow_summary['profile'] = workflow.profile - workflow_summary['configFiles'] = workflow.configFiles.join(', ') - - // Get pipeline parameters defined in JSON Schema - def Map params_summary = [:] - def params_map = paramsLoad(getSchemaPath(workflow, schema_filename=schema_filename)) - for (group in params_map.keySet()) { - def sub_params = new LinkedHashMap() - def group_params = params_map.get(group) // This gets the parameters of that particular group - for (param in group_params.keySet()) { - if (params.containsKey(param)) { - def params_value = params.get(param) - def schema_value = group_params.get(param).default - def param_type = group_params.get(param).type - if (schema_value != null) { - if (param_type == 'string') { - if (schema_value.contains('$projectDir') || schema_value.contains('${projectDir}')) { - def sub_string = schema_value.replace('\$projectDir', '') - sub_string = sub_string.replace('\${projectDir}', '') - if (params_value.contains(sub_string)) { - schema_value = params_value - } - } - if (schema_value.contains('$params.outdir') || schema_value.contains('${params.outdir}')) { - def sub_string = schema_value.replace('\$params.outdir', '') - sub_string = sub_string.replace('\${params.outdir}', '') - if ("${params.outdir}${sub_string}" == params_value) { - schema_value = params_value - } - } - } - } - - // We have a default in the schema, and this isn't it - if (schema_value != null && params_value != schema_value) { - sub_params.put(param, params_value) - } - // No default in the schema, and this isn't empty - else if (schema_value == null && 
params_value != "" && params_value != null && params_value != false) { - sub_params.put(param, params_value) - } - } - } - params_summary.put(group, sub_params) - } - return [ 'Core Nextflow options' : workflow_summary ] << params_summary - } - - // - // Beautify parameters for summary and return as string - // - public static String paramsSummaryLog(workflow, params) { - Map colors = NfcoreTemplate.logColours(params.monochrome_logs) - String output = '' - def params_map = paramsSummaryMap(workflow, params) - def max_chars = paramsMaxChars(params_map) - for (group in params_map.keySet()) { - def group_params = params_map.get(group) // This gets the parameters of that particular group - if (group_params) { - output += colors.bold + group + colors.reset + '\n' - for (param in group_params.keySet()) { - output += " " + colors.blue + param.padRight(max_chars) + ": " + colors.green + group_params.get(param) + colors.reset + '\n' - } - output += '\n' - } - } - output += "!! Only displaying parameters that differ from the pipeline defaults !!\n" - output += NfcoreTemplate.dashedLine(params.monochrome_logs) - return output - } - - // - // Loop over nested exceptions and print the causingException - // - private static void printExceptions(ex_json, params_json, log, enums, limit=5) { - def causingExceptions = ex_json['causingExceptions'] - if (causingExceptions.length() == 0) { - def m = ex_json['message'] =~ /required key \[([^\]]+)\] not found/ - // Missing required param - if (m.matches()) { - log.error "* Missing required parameter: --${m[0][1]}" - } - // Other base-level error - else if (ex_json['pointerToViolation'] == '#') { - log.error "* ${ex_json['message']}" - } - // Error with specific param - else { - def param = ex_json['pointerToViolation'] - ~/^#\// - def param_val = params_json[param].toString() - if (enums.containsKey(param)) { - def error_msg = "* --${param}: '${param_val}' is not a valid choice (Available choices" - if (enums[param].size() > limit) { - log.error "${error_msg} (${limit} of ${enums[param].size()}): ${enums[param][0..limit-1].join(', ')}, ... 
)" - } else { - log.error "${error_msg}: ${enums[param].join(', ')})" - } - } else { - log.error "* --${param}: ${ex_json['message']} (${param_val})" - } - } - } - for (ex in causingExceptions) { - printExceptions(ex, params_json, log, enums) - } - } - - // - // Remove an element from a JSONArray - // - private static JSONArray removeElement(json_array, element) { - def list = [] - int len = json_array.length() - for (int i=0;i - if(raw_schema.keySet().contains('definitions')){ - raw_schema.definitions.each { definition -> - for (key in definition.keySet()){ - if (definition[key].get("properties").keySet().contains(ignore_param)){ - // Remove the param to ignore - definition[key].get("properties").remove(ignore_param) - // If the param was required, change this - if (definition[key].has("required")) { - def cleaned_required = removeElement(definition[key].required, ignore_param) - definition[key].put("required", cleaned_required) - } - } - } - } - } - if(raw_schema.keySet().contains('properties') && raw_schema.get('properties').keySet().contains(ignore_param)) { - raw_schema.get("properties").remove(ignore_param) - } - if(raw_schema.keySet().contains('required') && raw_schema.required.contains(ignore_param)) { - def cleaned_required = removeElement(raw_schema.required, ignore_param) - raw_schema.put("required", cleaned_required) - } - } - return raw_schema - } - - // - // Clean and check parameters relative to Nextflow native classes - // - private static Map cleanParameters(params) { - def new_params = params.getClass().newInstance(params) - for (p in params) { - // remove anything evaluating to false - if (!p['value']) { - new_params.remove(p.key) - } - // Cast MemoryUnit to String - if (p['value'].getClass() == nextflow.util.MemoryUnit) { - new_params.replace(p.key, p['value'].toString()) - } - // Cast Duration to String - if (p['value'].getClass() == nextflow.util.Duration) { - new_params.replace(p.key, p['value'].toString().replaceFirst(/d(?!\S)/, "day")) - } - // Cast LinkedHashMap to String - if (p['value'].getClass() == LinkedHashMap) { - new_params.replace(p.key, p['value'].toString()) - } - } - return new_params - } - - // - // This function tries to read a JSON params file - // - private static LinkedHashMap paramsLoad(String json_schema) { - def params_map = new LinkedHashMap() - try { - params_map = paramsRead(json_schema) - } catch (Exception e) { - println "Could not read parameters settings from JSON. $e" - params_map = new LinkedHashMap() - } - return params_map - } - - // - // Method to actually read in JSON file using Groovy. - // Group (as Key), values are all parameters - // - Parameter1 as Key, Description as Value - // - Parameter2 as Key, Description as Value - // .... 
- // Group - // - - private static LinkedHashMap paramsRead(String json_schema) throws Exception { - def json = new File(json_schema).text - def Map schema_definitions = (Map) new JsonSlurper().parseText(json).get('definitions') - def Map schema_properties = (Map) new JsonSlurper().parseText(json).get('properties') - /* Tree looks like this in nf-core schema - * definitions <- this is what the first get('definitions') gets us - group 1 - title - description - properties - parameter 1 - type - description - parameter 2 - type - description - group 2 - title - description - properties - parameter 1 - type - description - * properties <- parameters can also be ungrouped, outside of definitions - parameter 1 - type - description - */ - - // Grouped params - def params_map = new LinkedHashMap() - schema_definitions.each { key, val -> - def Map group = schema_definitions."$key".properties // Gets the property object of the group - def title = schema_definitions."$key".title - def sub_params = new LinkedHashMap() - group.each { innerkey, value -> - sub_params.put(innerkey, value) - } - params_map.put(title, sub_params) - } - - // Ungrouped params - def ungrouped_params = new LinkedHashMap() - schema_properties.each { innerkey, value -> - ungrouped_params.put(innerkey, value) - } - params_map.put("Other parameters", ungrouped_params) - - return params_map - } - - // - // Get maximum number of characters across all parameter names - // - private static Integer paramsMaxChars(params_map) { - Integer max_chars = 0 - for (group in params_map.keySet()) { - def group_params = params_map.get(group) // This gets the parameters of that particular group - for (param in group_params.keySet()) { - if (param.size() > max_chars) { - max_chars = param.size() - } - } - } - return max_chars - } -} diff --git a/nf_core/pipeline-template/lib/WorkflowMain.groovy b/nf_core/pipeline-template/lib/WorkflowMain.groovy index 4cb7409fb9..c87e77abc6 100755 --- a/nf_core/pipeline-template/lib/WorkflowMain.groovy +++ b/nf_core/pipeline-template/lib/WorkflowMain.groovy @@ -20,45 +20,10 @@ class WorkflowMain { " https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md" } - // - // Generate help string - // - public static String help(workflow, params) { - {% if igenomes -%} - def command = "nextflow run ${workflow.manifest.name} --input samplesheet.csv --genome GRCh37 -profile docker" - {% else -%} - def command = "nextflow run ${workflow.manifest.name} --input samplesheet.csv --fasta reference.fa -profile docker" - {% endif -%} - def help_string = '' - help_string += NfcoreTemplate.logo(workflow, params.monochrome_logs) - help_string += NfcoreSchema.paramsHelp(workflow, params, command) - help_string += '\n' + citation(workflow) + '\n' - help_string += NfcoreTemplate.dashedLine(params.monochrome_logs) - return help_string - } - - // - // Generate parameter summary log string - // - public static String paramsSummaryLog(workflow, params) { - def summary_log = '' - summary_log += NfcoreTemplate.logo(workflow, params.monochrome_logs) - summary_log += NfcoreSchema.paramsSummaryLog(workflow, params) - summary_log += '\n' + citation(workflow) + '\n' - summary_log += NfcoreTemplate.dashedLine(params.monochrome_logs) - return summary_log - } - // // Validate parameters and print summary to screen // public static void initialise(workflow, params, log) { - // Print help to screen if required - if (params.help) { - log.info help(workflow, params) - System.exit(0) - } - // Print workflow version and exit on --version if 
(params.version) { String workflow_version = NfcoreTemplate.version(workflow) @@ -66,14 +31,6 @@ class WorkflowMain { System.exit(0) } - // Print parameter summary log to screen - log.info paramsSummaryLog(workflow, params) - - // Validate workflow parameters via the JSON schema - if (params.validate_params) { - NfcoreSchema.validateParameters(workflow, params, log) - } - // Check that a -profile or Nextflow config has been provided to run the pipeline NfcoreTemplate.checkConfigProvided(workflow, log) diff --git a/nf_core/pipeline-template/modules/local/samplesheet_check.nf b/nf_core/pipeline-template/modules/local/samplesheet_check.nf deleted file mode 100644 index 77be6dfff4..0000000000 --- a/nf_core/pipeline-template/modules/local/samplesheet_check.nf +++ /dev/null @@ -1,31 +0,0 @@ -process SAMPLESHEET_CHECK { - tag "$samplesheet" - label 'process_single' - - conda "conda-forge::python=3.8.3" - container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/python:3.8.3' : - 'biocontainers/python:3.8.3' }" - - input: - path samplesheet - - output: - path '*.csv' , emit: csv - path "versions.yml", emit: versions - - when: - task.ext.when == null || task.ext.when - - script: // This script is bundled with the pipeline, in {{ name }}/bin/ - """ - check_samplesheet.py \\ - $samplesheet \\ - samplesheet.valid.csv - - cat <<-END_VERSIONS > versions.yml - "${task.process}": - python: \$(python --version | sed 's/Python //g') - END_VERSIONS - """ -} diff --git a/nf_core/pipeline-template/nextflow.config b/nf_core/pipeline-template/nextflow.config index 2650572c93..73ed72425f 100644 --- a/nf_core/pipeline-template/nextflow.config +++ b/nf_core/pipeline-template/nextflow.config @@ -38,9 +38,7 @@ params { hook_url = null help = false version = false - validate_params = true - show_hidden_params = false - schema_ignore_params = 'genomes' + {% if nf_core_configs %} // Config options @@ -58,6 +56,13 @@ params { max_cpus = 16 max_time = '240.h' + // Schema validation default options + validationFailUnrecognisedParams = false + validationLenientMode = false + validationSchemaIgnoreParams = 'genomes' + validationShowHiddenParams = false + validate_params = true + } // Load base.config by default for all pipelines @@ -179,6 +184,11 @@ profiles { docker.registry = 'quay.io' podman.registry = 'quay.io' +// Nextflow plugins +plugins { + id 'nf-validation@0.1.0' // Validation of pipeline parameters and creation of an input channel from a sample sheet +} + {% if igenomes %} // Load igenomes.config if required if (!params.igenomes_ignore) { diff --git a/nf_core/pipeline-template/nextflow_schema.json b/nf_core/pipeline-template/nextflow_schema.json index 2743562d6c..d3ad943a5d 100644 --- a/nf_core/pipeline-template/nextflow_schema.json +++ b/nf_core/pipeline-template/nextflow_schema.json @@ -257,12 +257,26 @@ "fa_icon": "fas fa-check-square", "hidden": true }, - "show_hidden_params": { + "validationShowHiddenParams": { "type": "boolean", "fa_icon": "far fa-eye-slash", "description": "Show all params when using `--help`", "hidden": true, "help_text": "By default, parameters set as _hidden_ in the schema are not shown on the command line when a user runs with `--help`. Specifying this option will tell the pipeline to show all parameters." 
+        },
+        "validationFailUnrecognisedParams": {
+            "type": "boolean",
+            "fa_icon": "far fa-check-circle",
+            "description": "Validation of parameters fails when an unrecognised parameter is found.",
+            "hidden": true,
+            "help_text": "By default, when an unrecognised parameter is found, it returns a warning."
+        },
+        "validationLenientMode": {
+            "type": "boolean",
+            "fa_icon": "far fa-check-circle",
+            "description": "Validation of parameters in lenient mode.",
+            "hidden": true,
+            "help_text": "Allows string values that are parseable as numbers or booleans. For further information see [JSONSchema docs](https://github.com/everit-org/json-schema#lenient-mode)."
+        }
         }
     }
 }
diff --git a/nf_core/pipeline-template/pyproject.toml b/nf_core/pipeline-template/pyproject.toml
index 0d62beb6f9..bc01239b3e 100644
--- a/nf_core/pipeline-template/pyproject.toml
+++ b/nf_core/pipeline-template/pyproject.toml
@@ -1,4 +1,4 @@
-# Config file for Python. Mostly used to configure linting of bin/check_samplesheet.py with Black.
+# Config file for Python. Mostly used to configure linting of bin/*.py with Black.
 # Should be kept the same as nf-core/tools to avoid fighting with template synchronisation.
 [tool.black]
 line-length = 120
diff --git a/nf_core/pipeline-template/subworkflows/local/input_check.nf b/nf_core/pipeline-template/subworkflows/local/input_check.nf
deleted file mode 100644
index 0aecf87fb7..0000000000
--- a/nf_core/pipeline-template/subworkflows/local/input_check.nf
+++ /dev/null
@@ -1,44 +0,0 @@
-//
-// Check input samplesheet and get read channels
-//
-
-include { SAMPLESHEET_CHECK } from '../../modules/local/samplesheet_check'
-
-workflow INPUT_CHECK {
-    take:
-    samplesheet // file: /path/to/samplesheet.csv
-
-    main:
-    SAMPLESHEET_CHECK ( samplesheet )
-        .csv
-        .splitCsv ( header:true, sep:',' )
-        .map { create_fastq_channel(it) }
-        .set { reads }
-
-    emit:
-    reads                                     // channel: [ val(meta), [ reads ] ]
-    versions = SAMPLESHEET_CHECK.out.versions // channel: [ versions.yml ]
-}
-
-// Function to get list of [ meta, [ fastq_1, fastq_2 ] ]
-def create_fastq_channel(LinkedHashMap row) {
-    // create meta map
-    def meta = [:]
-    meta.id         = row.sample
-    meta.single_end = row.single_end.toBoolean()
-
-    // add path(s) of the fastq file(s) to the meta map
-    def fastq_meta = []
-    if (!file(row.fastq_1).exists()) {
-        exit 1, "ERROR: Please check input samplesheet -> Read 1 FastQ file does not exist!\n${row.fastq_1}"
-    }
-    if (meta.single_end) {
-        fastq_meta = [ meta, [ file(row.fastq_1) ] ]
-    } else {
-        if (!file(row.fastq_2).exists()) {
-            exit 1, "ERROR: Please check input samplesheet -> Read 2 FastQ file does not exist!\n${row.fastq_2}"
-        }
-        fastq_meta = [ meta, [ file(row.fastq_1), file(row.fastq_2) ] ]
-    }
-    return fastq_meta
-}
diff --git a/nf_core/pipeline-template/workflows/pipeline.nf b/nf_core/pipeline-template/workflows/pipeline.nf
index 9bcc0086b5..d55c375c2b 100644
--- a/nf_core/pipeline-template/workflows/pipeline.nf
+++ b/nf_core/pipeline-template/workflows/pipeline.nf
@@ -4,9 +4,28 @@
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 */
 
-def summary_params = NfcoreSchema.paramsSummaryMap(workflow, params)
+include { validateParameters; paramsHelp; paramsSummaryLog; paramsSummaryMap; fromSamplesheet } from 'plugin/nf-validation'
+
+def logo = NfcoreTemplate.logo(workflow, params.monochrome_logs)
+def citation = '\n' + WorkflowMain.citation(workflow) + '\n'
+def summary_params = paramsSummaryMap(workflow)
+
+// Print help message if needed
+if (params.help) {
+    def
String command = "nextflow run ${workflow.manifest.name} --input samplesheet.csv --genome GRCh37 -profile docker" + log.info logo + paramsHelp(command) + citation + NfcoreTemplate.dashedLine(params.monochrome_logs) + System.exit(0) +} // Validate input parameters +if (params.validate_params) { + validateParameters() +} + + +// Print parameter summary log to screen +log.info logo + paramsSummaryLog(workflow) + citation + Workflow{{ short_name[0]|upper }}{{ short_name[1:] }}.initialise(params, log) // TODO nf-core: Add all file path parameters for the pipeline to the list below @@ -14,9 +33,6 @@ Workflow{{ short_name[0]|upper }}{{ short_name[1:] }}.initialise(params, log) def checkPathParamList = [ params.input, params.multiqc_config, params.fasta ] for (param in checkPathParamList) { if (param) { file(param, checkIfExists: true) } } -// Check mandatory parameters -if (params.input) { ch_input = file(params.input) } else { exit 1, 'Input samplesheet not specified!' } - /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ CONFIG FILES @@ -37,7 +53,6 @@ ch_multiqc_custom_methods_description = params.multiqc_methods_description ? fil // // SUBWORKFLOW: Consisting of a mix of local and nf-core/modules // -include { INPUT_CHECK } from '../subworkflows/local/input_check' /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -66,18 +81,16 @@ workflow {{ short_name|upper }} { ch_versions = Channel.empty() // - // SUBWORKFLOW: Read in samplesheet, validate and stage input files + // Create input channel from input file provided through params.input // - INPUT_CHECK ( - ch_input - ) - ch_versions = ch_versions.mix(INPUT_CHECK.out.versions) + // TODO: For more information on how to format your samplesheet and assets/schema_input.json, please refer to nf-validation plugin documentation https://nextflow-io.github.io/nf-validation/ + ch_input = Channel.fromSamplesheet("input") // // MODULE: Run FastQC // FASTQC ( - INPUT_CHECK.out.reads + ch_input ) ch_versions = ch_versions.mix(FASTQC.out.versions.first()) diff --git a/nf_core/schema.py b/nf_core/schema.py index ba88e762ea..75dbebce04 100644 --- a/nf_core/schema.py +++ b/nf_core/schema.py @@ -245,8 +245,8 @@ def validate_default_params(self): self.get_wf_params() # Collect parameters to ignore - if "schema_ignore_params" in self.pipeline_params: - params_ignore = self.pipeline_params.get("schema_ignore_params", "").strip("\"'").split(",") + if "validationSchemaIgnoreParams" in self.pipeline_params: + params_ignore = self.pipeline_params.get("validationSchemaIgnoreParams", "").strip("\"'").split(",") else: params_ignore = [] @@ -759,8 +759,8 @@ def add_schema_found_configs(self): Add anything that's found in the Nextflow params that's missing in the pipeline schema """ params_added = [] - params_ignore = self.pipeline_params.get("schema_ignore_params", "").strip("\"'").split(",") - params_ignore.append("schema_ignore_params") + params_ignore = self.pipeline_params.get("validationSchemaIgnoreParams", "").strip("\"'").split(",") + params_ignore.append("validationSchemaIgnoreParams") for p_key, p_val in self.pipeline_params.items(): # Check if key is in schema parameters if p_key not in self.schema_params and p_key not in params_ignore: diff --git a/nf_core/synced_repo.py b/nf_core/synced_repo.py new file mode 100644 index 0000000000..f78142c031 --- /dev/null +++ b/nf_core/synced_repo.py @@ -0,0 +1,418 @@ +import filecmp +import logging +import os +import shutil +from pathlib 
import Path + +import git +import rich +import rich.progress +from git.exc import GitCommandError + +from nf_core.utils import load_tools_config + +log = logging.getLogger(__name__) + +# Constants for the nf-core/modules repo used throughout the module files +NF_CORE_MODULES_NAME = "nf-core" +NF_CORE_MODULES_REMOTE = "https://github.com/nf-core/modules.git" +NF_CORE_MODULES_DEFAULT_BRANCH = "master" + + +class RemoteProgressbar(git.RemoteProgress): + """ + An object to create a progressbar for when doing an operation with the remote. + Note that an initialized rich Progress (progress bar) object must be passed + during initialization. + """ + + def __init__(self, progress_bar, repo_name, remote_url, operation): + """ + Initializes the object and adds a task to the progressbar passed as 'progress_bar' + + Args: + progress_bar (rich.progress.Progress): A rich progress bar object + repo_name (str): Name of the repository the operation is performed on + remote_url (str): Git URL of the repository the operation is performed on + operation (str): The operation performed on the repository, i.e. 'Pulling', 'Cloning' etc. + """ + super().__init__() + self.progress_bar = progress_bar + self.tid = self.progress_bar.add_task( + f"{operation} from [bold green]'{repo_name}'[/bold green] ([link={remote_url}]{remote_url}[/link])", + start=False, + state="Waiting for response", + ) + + def update(self, op_code, cur_count, max_count=None, message=""): + """ + Overrides git.RemoteProgress.update. + Called every time there is a change in the remote operation + """ + if not self.progress_bar.tasks[self.tid].started: + self.progress_bar.start_task(self.tid) + self.progress_bar.update( + self.tid, total=max_count, completed=cur_count, state=f"{cur_count / max_count * 100:.1f}%" + ) + + +class SyncedRepo: + """ + An object to store details about a locally cached code repository. 
+ """ + + local_repo_statuses = {} + no_pull_global = False + + @staticmethod + def local_repo_synced(repo_name): + """ + Checks whether a local repo has been cloned/pull in the current session + """ + return SyncedRepo.local_repo_statuses.get(repo_name, False) + + @staticmethod + def update_local_repo_status(repo_name, up_to_date): + """ + Updates the clone/pull status of a local repo + """ + SyncedRepo.local_repo_statuses[repo_name] = up_to_date + + @staticmethod + def get_remote_branches(remote_url): + """ + Get all branches from a remote repository + + Args: + remote_url (str): The git url to the remote repository + + Returns: + (set[str]): All branches found in the remote + """ + try: + unparsed_branches = git.Git().ls_remote(remote_url) + except git.GitCommandError: + raise LookupError(f"Was unable to fetch branches from '{remote_url}'") + else: + branches = {} + for branch_info in unparsed_branches.split("\n"): + sha, name = branch_info.split("\t") + if name != "HEAD": + # The remote branches are shown as 'ref/head/branch' + branch_name = Path(name).stem + branches[sha] = branch_name + return set(branches.values()) + + def __init__(self, remote_url=None, branch=None, no_pull=False, hide_progress=False): + """ + Initializes the object and clones the git repository if it is not already present + """ + + # This allows us to set this one time and then keep track of the user's choice + SyncedRepo.no_pull_global |= no_pull + + # Check if the remote seems to be well formed + if remote_url is None: + remote_url = NF_CORE_MODULES_REMOTE + + self.remote_url = remote_url + + self.fullname = nf_core.modules.modules_utils.repo_full_name_from_remote(self.remote_url) + + self.setup_local_repo(remote_url, branch, hide_progress) + + config_fn, repo_config = load_tools_config(self.local_repo_dir) + try: + self.repo_path = repo_config["org_path"] + except KeyError: + raise UserWarning(f"'org_path' key not present in {config_fn.name}") + + # Verify that the repo seems to be correctly configured + if self.repo_path != NF_CORE_MODULES_NAME or self.branch: + self.verify_branch() + + # Convenience variable + self.modules_dir = os.path.join(self.local_repo_dir, "modules", self.repo_path) + self.subworkflows_dir = os.path.join(self.local_repo_dir, "subworkflows", self.repo_path) + + self.avail_module_names = None + + def verify_sha(self, prompt, sha): + """ + Verify that 'sha' and 'prompt' arguments are not provided together. + Verify that the provided SHA exists in the repo. + + Arguments: + prompt (bool): prompt asking for SHA + sha (str): provided sha + """ + if prompt and sha is not None: + log.error("Cannot use '--sha' and '--prompt' at the same time!") + return False + + if sha: + if not self.sha_exists_on_branch(sha): + log.error(f"Commit SHA '{sha}' doesn't exist in '{self.remote_url}'") + return False + + return True + + def setup_branch(self, branch): + """ + Verify that we have a branch and otherwise use the default one. + The branch is then checked out to verify that it exists in the repo. 
+ + Args: + branch (str): Name of branch + """ + if branch is None: + # Don't bother fetching default branch if we're using nf-core + if self.remote_url == NF_CORE_MODULES_REMOTE: + self.branch = "master" + else: + self.branch = self.get_default_branch() + else: + self.branch = branch + + # Verify that the branch exists by checking it out + self.branch_exists() + + def get_default_branch(self): + """ + Gets the default branch for the repo (the branch origin/HEAD is pointing to) + """ + origin_head = next(ref for ref in self.repo.refs if ref.name == "origin/HEAD") + _, branch = origin_head.ref.name.split("/") + return branch + + def branch_exists(self): + """ + Verifies that the branch exists in the repository by trying to check it out + """ + try: + self.checkout_branch() + except GitCommandError: + raise LookupError(f"Branch '{self.branch}' not found in '{self.remote_url}'") + + def verify_branch(self): + """ + Verifies the active branch conforms to the correct directory structure + """ + dir_names = os.listdir(self.local_repo_dir) + if "modules" not in dir_names: + err_str = f"Repository '{self.remote_url}' ({self.branch}) does not contain the 'modules/' directory" + if "software" in dir_names: + err_str += ( + ".\nAs of nf-core/tools version 2.0, the 'software/' directory should be renamed to 'modules/'" + ) + raise LookupError(err_str) + + def checkout_branch(self): + """ + Checks out the specified branch of the repository + """ + self.repo.git.checkout(self.branch) + + def checkout(self, commit): + """ + Checks out the repository at the requested commit + + Args: + commit (str): Git SHA of the commit + """ + self.repo.git.checkout(commit) + + def component_exists(self, component_name, component_type, checkout=True, commit=None): + """ + Check if a module/subworkflow exists in the branch of the repo + + Args: + component_name (str): The name of the module/subworkflow + + Returns: + (bool): Whether the module/subworkflow exists in this branch of the repository + """ + return component_name in self.get_avail_components(component_type, checkout=checkout, commit=commit) + + def get_component_dir(self, component_name, component_type): + """ + Returns the file path of a module/subworkflow directory in the repo. + Does not verify that the path exists. 
+ Args: + component_name (str): The name of the module/subworkflow + + Returns: + component_path (str): The path of the module/subworkflow in the local copy of the repository + """ + if component_type == "modules": + return os.path.join(self.modules_dir, component_name) + elif component_type == "subworkflows": + return os.path.join(self.subworkflows_dir, component_name) + + def install_component(self, component_name, install_dir, commit, component_type): + """ + Install the module/subworkflow files into a pipeline at the given commit + + Args: + component_name (str): The name of the module/subworkflow + install_dir (str): The path where the module/subworkflow should be installed + commit (str): The git SHA for the version of the module/subworkflow to be installed + component_type (str): Either 'modules' or 'subworkflows' + + Returns: + (bool): Whether the operation was successful or not + """ + # Check out the repository at the requested ref + try: + self.checkout(commit) + except git.GitCommandError: + return False + + # Check if the module/subworkflow exists in the branch + if not self.component_exists(component_name, component_type, checkout=False): + log.error( + f"The requested {component_type[:-1]} does not exist in the branch '{self.branch}' of '{self.remote_url}'" + ) + return False + + # Copy the files from the repo to the install folder + shutil.copytree(self.get_component_dir(component_name, component_type), Path(install_dir, component_name)) + + # Switch back to the tip of the branch + self.checkout_branch() + return True + + def module_files_identical(self, module_name, base_path, commit): + """ + Checks whether the module files in a pipeline are identical to the ones in the remote + Args: + module_name (str): The name of the module + base_path (str): The path to the module in the pipeline + commit (str): The git SHA to compare against (None for the branch tip) + + Returns: + (bool): Whether the pipeline files are identical to the repo files + """ + if commit is None: + self.checkout_branch() + else: + self.checkout(commit) + module_files = ["main.nf", "meta.yml"] + files_identical = {file: True for file in module_files} + module_dir = self.get_component_dir(module_name, "modules") + for file in module_files: + try: + files_identical[file] = filecmp.cmp(os.path.join(module_dir, file), os.path.join(base_path, file)) + except FileNotFoundError: + log.debug(f"Could not open file: {os.path.join(module_dir, file)}") + continue + self.checkout_branch() + return files_identical + + def get_component_git_log(self, component_name, component_type, depth=None): + """ + Fetches the commit history of the requested module/subworkflow since a given date. The default value is + not arbitrary - it is the last time the structure of the nf-core/modules repository had an + update breaking backwards compatibility.
+ Args: + component_name (str): Name of module/subworkflow + component_type (str): Either 'modules' or 'subworkflows' + depth (int): Maximum number of commits to return (None for no limit) + + Returns: + (iter): Iterator of dicts with the commit SHA and associated (truncated) message + """ + self.checkout_branch() + component_path = os.path.join(component_type, self.repo_path, component_name) + commits_new = self.repo.iter_commits(max_count=depth, paths=component_path) + commits_new = [ + {"git_sha": commit.hexsha, "trunc_message": commit.message.partition("\n")[0]} for commit in commits_new + ] + commits_old = [] + if component_type == "modules": + # Grab commits also from previous modules structure + component_path = os.path.join("modules", component_name) + commits_old = self.repo.iter_commits(max_count=depth, paths=component_path) + commits_old = [ + {"git_sha": commit.hexsha, "trunc_message": commit.message.partition("\n")[0]} for commit in commits_old + ] + commits = iter(commits_new + commits_old) + return commits + + def get_latest_component_version(self, component_name, component_type): + """ + Returns the SHA of the latest commit that touched the module/subworkflow + """ + return list(self.get_component_git_log(component_name, component_type, depth=1))[0]["git_sha"] + + def sha_exists_on_branch(self, sha): + """ + Verifies that a given commit sha exists on the branch + """ + self.checkout_branch() + return sha in (commit.hexsha for commit in self.repo.iter_commits()) + + def get_commit_info(self, sha): + """ + Fetches metadata about the commit (dates, message, etc.) + Args: + sha (str): The SHA of the requested commit + Returns: + message (str): The commit message for the requested commit + date (str): The commit date for the requested commit + Raises: + LookupError: If the search for the commit fails + """ + self.checkout_branch() + for commit in self.repo.iter_commits(): + if commit.hexsha == sha: + message = commit.message.partition("\n")[0] + date_obj = commit.committed_datetime + date = str(date_obj.date()) + return message, date + raise LookupError(f"Commit '{sha}' not found in '{self.remote_url}'") + + def get_avail_components(self, component_type, checkout=True, commit=None): + """ + Gets the names of the modules/subworkflows in the repository.
They are detected by + checking which directories have a 'main.nf' file + + Returns: + ([ str ]): The module/subworkflow names + """ + if checkout: + self.checkout_branch() + if commit is not None: + self.checkout(commit) + # Get directory + if component_type == "modules": + directory = self.modules_dir + elif component_type == "subworkflows": + directory = self.subworkflows_dir + # Module/Subworkflow directories are characterized by having a 'main.nf' file + avail_component_names = [ + os.path.relpath(dirpath, start=directory) + for dirpath, _, file_names in os.walk(directory) + if "main.nf" in file_names + ] + return avail_component_names + + def get_meta_yml(self, component_type, module_name): + """ + Returns the contents of the 'meta.yml' file of a module/subworkflow + + Args: + module_name (str): The name of the module/subworkflow + + Returns: + (str): The contents of the file in text format + """ + self.checkout_branch() + if component_type == "modules": + path = Path(self.modules_dir, module_name, "meta.yml") + elif component_type == "subworkflows": + path = Path(self.subworkflows_dir, module_name, "meta.yml") + else: + raise ValueError(f"Invalid component type: {component_type}") + if not path.exists(): + return None + with open(path) as fh: + contents = fh.read() + return contents diff --git a/nf_core/utils.py b/nf_core/utils.py index 3cd09397e3..3ddce9b870 100644 --- a/nf_core/utils.py +++ b/nf_core/utils.py @@ -836,34 +836,48 @@ def prompt_remote_pipeline_name(wfs): raise AssertionError(f"Not able to find pipeline '{pipeline}'") -def prompt_pipeline_release_branch(wf_releases, wf_branches): +def prompt_pipeline_release_branch(wf_releases, wf_branches, multiple=False): """Prompt for pipeline release / branch Args: wf_releases (array): Array of repo releases as returned by the GitHub API wf_branches (array): Array of repo branches, as returned by the GitHub API + multiple (bool): Allow selection of multiple releases & branches (for Tower) Returns: choice (str): Selected release / branch name """ - # Prompt user for release tag + # Prompt user for release tag; tag_set will contain all available tags and branches.
choices = [] + tag_set = [] # Releases if len(wf_releases) > 0: for tag in map(lambda release: release.get("tag_name"), wf_releases): tag_display = [("fg:ansiblue", f"{tag} "), ("class:choice-default", "[release]")] choices.append(questionary.Choice(title=tag_display, value=tag)) + tag_set.append(tag) # Branches for branch in wf_branches.keys(): branch_display = [("fg:ansiyellow", f"{branch} "), ("class:choice-default", "[branch]")] choices.append(questionary.Choice(title=branch_display, value=branch)) + tag_set.append(branch) if len(choices) == 0: return False - return questionary.select("Select release / branch:", choices=choices, style=nfcore_question_style).unsafe_ask() + if multiple: + return ( + questionary.checkbox("Select release / branch:", choices=choices, style=nfcore_question_style).unsafe_ask(), + tag_set, + ) + + else: + return ( + questionary.select("Select release / branch:", choices=choices, style=nfcore_question_style).unsafe_ask(), + tag_set, + ) def get_repo_releases_branches(pipeline, wfs): diff --git a/tests/data/testdata_remote_containers.txt b/tests/data/testdata_remote_containers.txt new file mode 100644 index 0000000000..93cf46f2f6 --- /dev/null +++ b/tests/data/testdata_remote_containers.txt @@ -0,0 +1,37 @@ +./depot.galaxyproject.org-singularity-bbmap-38.93--he522d1c_0.img +./depot.galaxyproject.org-singularity-bedtools-2.30.0--hc088bd4_0.img +./depot.galaxyproject.org-singularity-bioconductor-dupradar-1.18.0--r40_1.img +./depot.galaxyproject.org-singularity-bioconductor-summarizedexperiment-1.20.0--r40_0.img +./depot.galaxyproject.org-singularity-bioconductor-tximeta-1.8.0--r40_0.img +./depot.galaxyproject.org-singularity-fastqc-0.11.9--0.img +./depot.galaxyproject.org-singularity-gffread-0.12.1--h8b12597_0.img +./depot.galaxyproject.org-singularity-hisat2-2.2.1--h1b792b2_3.img +./depot.galaxyproject.org-singularity-mulled-v2-1fa26d1ce03c295fe2fdcf85831a92fbcbd7e8c2-59cdd445419f14abac76b31dd0d71217994cbcc9-0.img +./depot.galaxyproject.org-singularity-mulled-v2-1fa26d1ce03c295fe2fdcf85831a92fbcbd7e8c2-afaaa4c6f5b308b4b6aa2dd8e99e1466b2a6b0cd-0.img +./depot.galaxyproject.org-singularity-mulled-v2-8849acf39a43cdd6c839a369a74c0adc823e2f91-ab110436faf952a33575c64dd74615a84011450b-0.img +./depot.galaxyproject.org-singularity-mulled-v2-a97e90b3b802d1da3d6958e0867610c718cb5eb1-0e773bb207600fcb4d38202226eb20a33c7909b6-0.img +./depot.galaxyproject.org-singularity-mulled-v2-a97e90b3b802d1da3d6958e0867610c718cb5eb1-38aed4501da19db366dc7c8d52d31d94e760cfaf-0.img +./depot.galaxyproject.org-singularity-mulled-v2-cf0123ef83b3c38c13e3b0696a3f285d3f20f15b-64aad4a4e144878400649e71f42105311be7ed87-0.img +./depot.galaxyproject.org-singularity-multiqc-1.11--pyhdfd78af_0.img +./depot.galaxyproject.org-singularity-multiqc-1.13--pyhdfd78af_0.img +./depot.galaxyproject.org-singularity-perl-5.26.2.img +./depot.galaxyproject.org-singularity-picard-2.26.10--hdfd78af_0.img +./depot.galaxyproject.org-singularity-picard-2.27.4--hdfd78af_0.img +./depot.galaxyproject.org-singularity-preseq-3.1.2--h445547b_2.img +./depot.galaxyproject.org-singularity-python-3.9--1.img +./depot.galaxyproject.org-singularity-qualimap-2.2.2d--1.img +./depot.galaxyproject.org-singularity-rseqc-3.0.1--py37h516909a_1.img +./depot.galaxyproject.org-singularity-salmon-1.5.2--h84f40af_0.img +./depot.galaxyproject.org-singularity-samtools-1.15.1--h1170115_0.img +./depot.galaxyproject.org-singularity-sortmerna-4.3.4--h9ee0642_0.img +./depot.galaxyproject.org-singularity-stringtie-2.2.1--hecb563c_2.img 
+./depot.galaxyproject.org-singularity-subread-2.0.1--hed695b0_0.img +./depot.galaxyproject.org-singularity-trim-galore-0.6.7--hdfd78af_0.img +./depot.galaxyproject.org-singularity-ubuntu-20.04.img +./depot.galaxyproject.org-singularity-ucsc-bedclip-377--h0b8a92a_2.img +./depot.galaxyproject.org-singularity-ucsc-bedgraphtobigwig-377--h446ed27_1.img +./depot.galaxyproject.org-singularity-umi_tools-1.1.2--py38h4a8c8d9_0.img +These entries should not be used: +On October 5, 2011, the 224-meter containership MV Rena struck a reef close to New Zealand’s coast and broke apart. That spells disaster, no? +MV Rena + diff --git a/tests/test_cli.py b/tests/test_cli.py index 0a6b37144d..873b7d4b0c 100644 --- a/tests/test_cli.py +++ b/tests/test_cli.py @@ -165,8 +165,10 @@ def test_cli_download(self, mock_dl): "outdir": "/path/outdir", "compress": "tar.gz", "force": None, + "tower": None, "container": "singularity", - "singularity-cache-only": None, + "singularity-cache": "copy", + "singularity-cache-index": "/path/index.txt", "parallel-downloads": 2, } @@ -177,12 +179,14 @@ def test_cli_download(self, mock_dl): mock_dl.assert_called_once_with( cmd[-1], - params["revision"], + (params["revision"],), params["outdir"], params["compress"], "force" in params, + "tower" in params, params["container"], - "singularity-cache-only" in params, + params["singularity-cache"], + params["singularity-cache-index"], params["parallel-downloads"], ) diff --git a/tests/test_download.py b/tests/test_download.py index e2ae882394..aa2e959f3d 100644 --- a/tests/test_download.py +++ b/tests/test_download.py @@ -3,16 +3,20 @@ import hashlib import os +import re import shutil import tempfile import unittest +from pathlib import Path from unittest import mock import pytest import nf_core.create import nf_core.utils -from nf_core.download import DownloadWorkflow +from nf_core.download import DownloadWorkflow, WorkflowRepo +from nf_core.synced_repo import SyncedRepo +from nf_core.utils import NFCORE_CACHE_DIR, NFCORE_DIR from .utils import with_temporary_file, with_temporary_folder @@ -32,10 +36,10 @@ def test_get_release_hash_release(self): download_obj.wf_branches, ) = nf_core.utils.get_repo_releases_branches(pipeline, wfs) download_obj.get_revision_hash() - assert download_obj.wf_sha == "b3e5e3b95aaf01d98391a62a10a3990c0a4de395" - assert download_obj.outdir == "nf-core-methylseq-1.6" + assert download_obj.wf_sha[download_obj.revision[0]] == "b3e5e3b95aaf01d98391a62a10a3990c0a4de395" + assert download_obj.outdir == "nf-core-methylseq_1.6" assert ( - download_obj.wf_download_url + download_obj.wf_download_url[download_obj.revision[0]] == "https://github.com/nf-core/methylseq/archive/b3e5e3b95aaf01d98391a62a10a3990c0a4de395.zip" ) @@ -51,10 +55,10 @@ def test_get_release_hash_branch(self): download_obj.wf_branches, ) = nf_core.utils.get_repo_releases_branches(pipeline, wfs) download_obj.get_revision_hash() - assert download_obj.wf_sha == "819cbac792b76cf66c840b567ed0ee9a2f620db7" - assert download_obj.outdir == "nf-core-exoseq-dev" + assert download_obj.wf_sha[download_obj.revision[0]] == "819cbac792b76cf66c840b567ed0ee9a2f620db7" + assert download_obj.outdir == "nf-core-exoseq_dev" assert ( - download_obj.wf_download_url + download_obj.wf_download_url[download_obj.revision[0]] == "https://github.com/nf-core/exoseq/archive/819cbac792b76cf66c840b567ed0ee9a2f620db7.zip" ) @@ -78,12 +82,16 @@ def test_get_release_hash_non_existent_release(self): def test_download_wf_files(self, outdir): download_obj = 
DownloadWorkflow(pipeline="nf-core/methylseq", revision="1.6") download_obj.outdir = outdir - download_obj.wf_sha = "b3e5e3b95aaf01d98391a62a10a3990c0a4de395" - download_obj.wf_download_url = ( - "https://github.com/nf-core/methylseq/archive/b3e5e3b95aaf01d98391a62a10a3990c0a4de395.zip" + download_obj.wf_sha = {"1.6": "b3e5e3b95aaf01d98391a62a10a3990c0a4de395"} + download_obj.wf_download_url = { + "1.6": "https://github.com/nf-core/methylseq/archive/b3e5e3b95aaf01d98391a62a10a3990c0a4de395.zip" + } + rev = download_obj.download_wf_files( + download_obj.revision[0], + download_obj.wf_sha[download_obj.revision[0]], + download_obj.wf_download_url[download_obj.revision[0]], ) - download_obj.download_wf_files() - assert os.path.exists(os.path.join(outdir, "workflow", "main.nf")) + assert os.path.exists(os.path.join(outdir, rev, "main.nf")) # # Tests for 'download_configs' @@ -118,7 +126,7 @@ def test_wf_use_local_configs(self, tmp_path): download_obj.download_configs() # Test the function - download_obj.wf_use_local_configs() + download_obj.wf_use_local_configs("workflow") wf_config = nf_core.utils.fetch_wf_config(os.path.join(test_outdir, "workflow"), cache_config=False) assert wf_config["params.custom_config_base"] == f"'{test_outdir}/workflow/../configs/'" @@ -133,14 +141,14 @@ def test_find_container_images(self, tmp_path, mock_fetch_wf_config): "process.mapping.container": "cutting-edge-container", "process.nocontainer": "not-so-cutting-edge", } - download_obj.find_container_images() + download_obj.find_container_images("workflow") assert len(download_obj.containers) == 1 assert download_obj.containers[0] == "cutting-edge-container" # # Tests for 'singularity_pull_image' # - # If Singularity is installed, but the container can't be accessed because it does not exist or there are aceess + # If Singularity is installed, but the container can't be accessed because it does not exist or there are access + restrictions, a FileNotFoundError is raised due to the unavailability of the image. @pytest.mark.skipif( shutil.which("singularity") is None, @@ -153,18 +161,44 @@ def test_singularity_pull_image_singularity_installed(self, tmp_dir, mock_rich_p with pytest.raises(FileNotFoundError): download_obj.singularity_pull_image("a-container", tmp_dir, None, mock_rich_progress) - # If Singularity is not installed, it raises a FileNotFoundError because the singularity command can't be found. + # If Singularity is not installed, it raises an OSError because the singularity command can't be found. @pytest.mark.skipif( shutil.which("singularity") is not None, - reason="Can't test how the code behaves when sungularity is not installed if it is.", + reason="Can't test how the code behaves when singularity is not installed if it is.", ) @with_temporary_folder @mock.patch("rich.progress.Progress.add_task") def test_singularity_pull_image_singularity_not_installed(self, tmp_dir, mock_rich_progress): download_obj = DownloadWorkflow(pipeline="dummy", outdir=tmp_dir) - with pytest.raises(FileNotFoundError): + with pytest.raises(OSError): download_obj.singularity_pull_image("a-container", tmp_dir, None, mock_rich_progress) + # + # Test for '--singularity-cache remote --singularity-cache-index'. Provide a list of containers already available in a remote location.
+ # + @with_temporary_folder + def test_remote_container_functionality(self, tmp_dir): + os.environ["NXF_SINGULARITY_CACHEDIR"] = "foo" + + download_obj = DownloadWorkflow( + pipeline="nf-core/rnaseq", + outdir=os.path.join(tmp_dir, "new"), + revision="3.9", + compress_type="none", + singularity_cache_index=Path(__file__).resolve().parent / "data/testdata_remote_containers.txt", + ) + + download_obj.include_configs = False # suppress the interactive prompt, because the stderr.is_interactive check is not sufficient here. + + # test that the settings are changed to the mandatory defaults when an external cache index is used + assert download_obj.singularity_cache == "remote" and download_obj.container == "singularity" + assert isinstance(download_obj.containers_remote, list) and len(download_obj.containers_remote) == 0 + # read in the file + download_obj.read_remote_containers() + assert len(download_obj.containers_remote) == 33 + assert "depot.galaxyproject.org-singularity-salmon-1.5.2--h84f40af_0.img" in download_obj.containers_remote + assert "MV Rena" not in download_obj.containers_remote # decoy in test file + # # Tests for the main entry method 'download_workflow' # @@ -180,6 +214,67 @@ def test_download_workflow_with_success(self, tmp_dir, mock_download_image, mock container="singularity", revision="1.6", compress_type="none", + singularity_cache="copy", ) + download_obj.include_configs = True # suppress the interactive prompt, because the stderr.is_interactive check is not sufficient here. download_obj.download_workflow() + + # + # Test Download for Tower + # + @with_temporary_folder + def test_download_workflow_for_tower(self, tmp_dir): + download_obj = DownloadWorkflow( + pipeline="nf-core/rnaseq", + revision=("3.7", "3.9"), + compress_type="none", + tower=True, + ) + + download_obj.include_configs = False # suppress the interactive prompt, because the stderr.is_interactive check is not sufficient here. + + assert isinstance(download_obj.revision, list) and len(download_obj.revision) == 2 + assert isinstance(download_obj.wf_sha, dict) and len(download_obj.wf_sha) == 0 + assert isinstance(download_obj.wf_download_url, dict) and len(download_obj.wf_download_url) == 0 + + wfs = nf_core.list.Workflows() + wfs.get_remote_workflows() + ( + download_obj.pipeline, + download_obj.wf_revisions, + download_obj.wf_branches, + ) = nf_core.utils.get_repo_releases_branches(download_obj.pipeline, wfs) + + download_obj.get_revision_hash() + + # download_obj.wf_download_url is not set for tower downloads, but the sha values are + assert isinstance(download_obj.wf_sha, dict) and len(download_obj.wf_sha) == 2 + assert isinstance(download_obj.wf_download_url, dict) and len(download_obj.wf_download_url) == 0 + + # The outdir for multiple revisions is the pipeline name and date: e.g. nf-core-rnaseq_2023-04-27_18-54 + assert bool(re.search(r"nf-core-rnaseq_\d{4}-\d{2}-\d{1,2}_\d{1,2}-\d{1,2}", download_obj.outdir, re.S)) + + download_obj.output_filename = f"{download_obj.outdir}.git" + download_obj.download_workflow_tower(location=tmp_dir) + + assert download_obj.workflow_repo + assert isinstance(download_obj.workflow_repo, WorkflowRepo) + assert issubclass(type(download_obj.workflow_repo), SyncedRepo) + # corroborate that revisions other than the requested ones are inaccessible to the user.
+ assert len(download_obj.workflow_repo.tags) == len(download_obj.revision) + + # download_obj.download_workflow_tower(location=tmp_dir) will run container image detection for all requested revisions + assert isinstance(download_obj.containers, list) and len(download_obj.containers) == 33 + # manually test container image detection for the 3.7 revision only + download_obj.containers = [] # empty container list for the test + download_obj.workflow_repo.checkout(download_obj.wf_sha["3.7"]) + download_obj.find_container_images(download_obj.workflow_repo.access()) + assert len(download_obj.containers) == 30 # 30 containers for 3.7 + assert ( + "https://depot.galaxyproject.org/singularity/bbmap:38.93--he522d1c_0" in download_obj.containers + ) # direct definition + assert ( + "https://depot.galaxyproject.org/singularity/mulled-v2-1fa26d1ce03c295fe2fdcf85831a92fbcbd7e8c2:59cdd445419f14abac76b31dd0d71217994cbcc9-0" + in download_obj.containers + ) # indirect definition via the $container variable.
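
Editor's note: the two closing assertions distinguish a container image URL written directly into a module from one assembled indirectly through a local `$container` variable. As a rough illustration of what container image detection has to cope with, here is a deliberately naive Python sketch; the module snippets and the `find_image_urls` helper are hypothetical stand-ins, not the actual `find_container_images()` implementation (which this diff does not show).

import re

# Hypothetical module snippets (schematic only; real nf-core module syntax
# around container declarations is more elaborate than this).
MODULE_DIRECT = "container 'https://depot.galaxyproject.org/singularity/bbmap:38.93--he522d1c_0'"
MODULE_INDIRECT = (
    "container = 'https://depot.galaxyproject.org/singularity/mulled-v2-1fa26d1ce03c295fe2fdcf85831a92fbcbd7e8c2:59cdd445419f14abac76b31dd0d71217994cbcc9-0'\n"
    'container "$container"\n'
)

# A naive detector: collect every depot.galaxyproject.org image URL, whether
# it appears inline or on the right-hand side of a variable assignment.
URL_PATTERN = re.compile(r"https://depot\.galaxyproject\.org/singularity/[^'\"\s]+")

def find_image_urls(module_text):
    """Return all Singularity image URLs found in a module's source text."""
    return URL_PATTERN.findall(module_text)

assert find_image_urls(MODULE_DIRECT) == ["https://depot.galaxyproject.org/singularity/bbmap:38.93--he522d1c_0"]
assert len(find_image_urls(MODULE_INDIRECT)) == 1  # the $container reference itself does not match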
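Similarly, test_remote_container_functionality feeds tests/data/testdata_remote_containers.txt to `read_remote_containers()` and expects exactly 33 image names back, with the prose decoy lines ("MV Rena" and friends) discarded. Below is a minimal sketch of such an index filter, written as a hypothetical standalone helper under the assumption that valid entries are simply paths ending in `.img`; the real method may handle further edge cases not visible in this diff.

from pathlib import Path

def read_cache_index(index_file):
    """Return the Singularity image names listed in a cache index file.

    Lines that do not look like *.img paths (e.g. stray prose such as the
    decoy entries in the test data) are skipped.
    """
    images = []
    for line in Path(index_file).read_text().splitlines():
        entry = line.strip()
        if entry.endswith(".img"):
            # Normalise "./depot...-fastqc-0.11.9--0.img" to its base name
            images.append(Path(entry).name)
    return images

# Against the new test data file this should yield 33 entries, none of them
# containing "MV Rena":
# read_cache_index("tests/data/testdata_remote_containers.txt")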