nf-core · drpatelh · Jun 18, 2021 · May 13, 2021
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -48,7 +48,7 @@ jobs:
       matrix:
         parameters:
           - "--skip_qc"
-          - "--remove_ribo_rna"
+          - "--remove_ribo_rna --skip_qualimap"
           - "--skip_trimming"
           - "--gtf false"
           - "--star_index false"
@@ -148,23 +148,3 @@ jobs:
       - name: Run pipeline with Salmon and various parameters
         run: |
           nextflow run ${GITHUB_WORKSPACE} -profile test,docker --pseudo_aligner salmon ${{ matrix.parameters }}
-
-  sra_download:
-    name: Test downloading of public data
-    if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/rnaseq') }}
-    runs-on: ubuntu-latest
-    env:
-      NXF_VER: ${{ matrix.nxf_ver }}
-      NXF_ANSI_LOG: false
-    steps:
-      - name: Check out pipeline code
-        uses: actions/checkout@v2
-
-      - name: Install Nextflow
-        run: |
-          wget -qO- get.nextflow.io | bash
-          sudo mv nextflow /usr/local/bin/
-
-      - name: Run pipeline to download public data
-        run: |
-          nextflow run ${GITHUB_WORKSPACE} -profile test_sra,docker
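
Each string under `matrix.parameters` in the first hunk becomes its own CI job, and that string is substituted verbatim into the job's command. As a result, `--remove_ribo_rna --skip_qualimap` tests both flags together in one job rather than adding a new job. The Python sketch below illustrates that expansion under two assumptions. First, it uses only the matrix entries visible in the hunk, and the real list may be longer. Second, it assumes the job's base command matches the `nextflow run ${GITHUB_WORKSPACE} -profile test,docker` prefix of the Salmon step in the second hunk. `build_commands` is a hypothetical helper and is not part of the pipeline or of GitHub Actions.

```python
import shlex

# Matrix entries visible in the first ci.yml hunk; the real matrix may list more.
MATRIX_PARAMETERS = [
    "--skip_qc",
    "--remove_ribo_rna --skip_qualimap",
    "--skip_trimming",
    "--gtf false",
    "--star_index false",
]

# Assumed base command, modelled on the Salmon step in the second hunk.
BASE_COMMAND = "nextflow run ${GITHUB_WORKSPACE} -profile test,docker"


def build_commands(parameters):
    """Return one argv list per matrix entry, i.e. one list per CI job.

    The matrix value is pasted into the run script as plain text, so the
    shell splits an entry that holds two flags into separate arguments.
    """
    return [shlex.split(f"{BASE_COMMAND} {entry}") for entry in parameters]


if __name__ == "__main__":
    for argv in build_commands(MATRIX_PARAMETERS):
        print(argv)
```
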
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,6 +3,29 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [[3.2](https://github.com/nf-core/rnaseq/releases/tag/3.2)] - 2021-06-17
+
+### Enhancements & fixes
+
+* Removed workflow to download data from public databases in favour of using [nf-core/fetchngs](https://nf-co.re/fetchngs)
+* Added a stand-alone Python script [`bin/fastq_dir_to_samplesheet.py`](https://github.com/nf-core/rnaseq/blob/master/bin/fastq_dir_to_samplesheet.py) to auto-create a samplesheet from a directory of FastQ files
+* Added docs about overwriting default container definitions to use latest versions e.g. Pangolin
+* [[#645](https://github.com/nf-core/rnaseq/issues/645)] - Remove trailing slash from `params.igenomes_base`
+* [[#649](https://github.com/nf-core/rnaseq/issues/649)] - DESeq2 fails with only one sample
+* [[#652](https://github.com/nf-core/rnaseq/issues/652)] - Results files have incorrect file names
+* [[nf-core/viralrecon#201](https://github.com/nf-core/viralrecon/issues/201)] - Conditional include are not expected to work
+
+### Parameters
+
+| Old parameter               | New parameter                  |
+|-----------------------------|--------------------------------|
+| `--public_data_ids`         |                                |
+| `--skip_sra_fastq_download` |                                |
+
+> **NB:** Parameter has been __updated__ if both old and new parameter information is present.
+> **NB:** Parameter has been __added__ if just the new parameter information is present.
+> **NB:** Parameter has been __removed__ if just the old parameter information is present.
+
 ## [[3.1](https://github.com/nf-core/rnaseq/releases/tag/3.1)] - 2021-05-13
 
 ### :warning: Major enhancements
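
The three NB notes under the 3.2 Parameters table define how each row should be read. The minimal Python sketch below applies that rule to the two rows added for 3.2. `classify_parameter_change` and `ROWS_3_2` are hypothetical names used only for illustration.

```python
def classify_parameter_change(old: str, new: str) -> str:
    """Classify a CHANGELOG parameter-table row using the NB conventions."""
    old, new = old.strip(), new.strip()
    if old and new:
        return "updated"  # both old and new parameter are listed
    if new:
        return "added"    # only the new parameter is listed
    if old:
        return "removed"  # only the old parameter is listed
    raise ValueError("row lists neither an old nor a new parameter")


# Rows from the 3.2 Parameters table as (old parameter, new parameter).
ROWS_3_2 = [
    ("`--public_data_ids`", ""),
    ("`--skip_sra_fastq_download`", ""),
]

if __name__ == "__main__":
    for old, new in ROWS_3_2:
        print(old, "->", classify_parameter_change(old, new))
```
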

diff --git a/README.md b/README.md
@@ -24,32 +24,33 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool
 
 ## Pipeline summary
 
-1. Download FastQ files via SRA, ENA or GEO ids and auto-create input samplesheet ([`ENA FTP`](https://ena-docs.readthedocs.io/en/latest/retrieval/file-download.html); *if required*)
-2. Merge re-sequenced FastQ files ([`cat`](http://www.linfo.org/cat.html))
-3. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
-4. UMI extraction ([`UMI-tools`](https://github.com/CGATOxford/UMI-tools))
-5. Adapter and quality trimming ([`Trim Galore!`](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/))
-6. Removal of ribosomal RNA ([`SortMeRNA`](https://github.com/biocore/sortmerna))
-7. Choice of multiple alignment and quantification routes:
+The SRA download functionality has been removed from the pipeline (`>=3.2`) and ported to an independent workflow called [nf-core/fetchngs](https://nf-co.re/fetchngs). You can provide `--nf_core_pipeline rnaseq` when running nf-core/fetchngs to download publicly available samples and auto-create a samplesheet that can be passed directly as input to this pipeline.
+
+1. Merge re-sequenced FastQ files ([`cat`](http://www.linfo.org/cat.html))
+2. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
+3. UMI extraction ([`UMI-tools`](https://github.com/CGATOxford/UMI-tools))
+4. Adapter and quality trimming ([`Trim Galore!`](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/))
+5. Removal of ribosomal RNA ([`SortMeRNA`](https://github.com/biocore/sortmerna))
+6. Choice of multiple alignment and quantification routes:
     1. [`STAR`](https://github.com/alexdobin/STAR) -> [`Salmon`](https://combine-lab.github.io/salmon/)
     2. [`STAR`](https://github.com/alexdobin/STAR) -> [`RSEM`](https://github.com/deweylab/RSEM)
     3. [`HiSAT2`](https://ccb.jhu.edu/software/hisat2/index.shtml) -> **NO QUANTIFICATION**
-8. Sort and index alignments ([`SAMtools`](https://sourceforge.net/projects/samtools/files/samtools/))
-9. UMI-based deduplication ([`UMI-tools`](https://github.com/CGATOxford/UMI-tools))
-10. Duplicate read marking ([`picard MarkDuplicates`](https://broadinstitute.github.io/picard/))
-11. Transcript assembly and quantification ([`StringTie`](https://ccb.jhu.edu/software/stringtie/))
-12. Create bigWig coverage files ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/))
-13. Extensive quality control:
+7. Sort and index alignments ([`SAMtools`](https://sourceforge.net/projects/samtools/files/samtools/))
+8. UMI-based deduplication ([`UMI-tools`](https://github.com/CGATOxford/UMI-tools))
+9. Duplicate read marking ([`picard MarkDuplicates`](https://broadinstitute.github.io/picard/))
+10. Transcript assembly and quantification ([`StringTie`](https://ccb.jhu.edu/software/stringtie/))
+11. Create bigWig coverage files ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/))
+12. Extensive quality control:
     1. [`RSeQC`](http://rseqc.sourceforge.net/)
     2. [`Qualimap`](http://qualimap.bioinfo.cipf.es/)
     3. [`dupRadar`](https://bioconductor.org/packages/release/bioc/html/dupRadar.html)
     4. [`Preseq`](http://smithlabresearch.org/software/preseq/)
     5. [`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html)
-14. Pseudo-alignment and quantification ([`Salmon`](https://combine-lab.github.io/salmon/); *optional*)
-15. Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks ([`MultiQC`](http://multiqc.info/), [`R`](https://www.r-project.org/))
+13. Pseudo-alignment and quantification ([`Salmon`](https://combine-lab.github.io/salmon/); *optional*)
+14. Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks ([`MultiQC`](http://multiqc.info/), [`R`](https://www.r-project.org/))
 
-> **NB:** Quantification isn't performed if using `--aligner hisat2` due to the lack of an appropriate option to calculate accurate expression estimates from HISAT2 derived genomic alignments. However, you can use this route if you have a preference for the alignment, QC and other types of downstream analysis compatible with the output of HISAT2.
-> **NB:** The `--aligner star_rsem` option will require STAR indices built from version 2.7.6a or later. However, in order to support legacy usage of genomes hosted on AWS iGenomes the `--aligner star_salmon` option requires indices built with STAR 2.6.1d or earlier. Please refer to this [issue](https://github.com/nf-core/rnaseq/issues/498) for further details.
+> * **NB:** Quantification isn't performed if using `--aligner hisat2` due to the lack of an appropriate option to calculate accurate expression estimates from HISAT2-derived genomic alignments. However, you can use this route if you have a preference for the alignment, QC and other types of downstream analysis compatible with the output of HISAT2.
+> * **NB:** The `--aligner star_rsem` option requires STAR indices built from version 2.7.6a or later. However, in order to support legacy usage of genomes hosted on AWS iGenomes, the `--aligner star_salmon` option requires indices built with STAR 2.6.1d or earlier. Please refer to this [issue](https://github.com/nf-core/rnaseq/issues/498) for further details.
 
 ## Quick Start
 
@@ -59,7 +60,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool
 
 3. Download the pipeline and test it on a minimal dataset with a single command:
 
-    ```bash
+    ```console
     nextflow run nf-core/rnaseq -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute>
     ```
 
@@ -69,30 +70,22 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool
 
 4. Start running your own analysis!
 
-    * Typical command for RNA-seq analysis:
-
-        ```bash
-        nextflow run nf-core/rnaseq \
-            --input samplesheet.csv \
-            --genome GRCh37 \
-            -profile <docker/singularity/podman/conda/institute>
-        ```
+    ```console
+    nextflow run nf-core/rnaseq \
+        --input samplesheet.csv \
+        --genome GRCh37 \
+        -profile <docker/singularity/podman/conda/institute>
+    ```
 
-    * Typical command for downloading public data:
+    * An executable Python script called [`fastq_dir_to_samplesheet.py`](https://github.com/nf-core/rnaseq/blob/master/bin/fastq_dir_to_samplesheet.py) is provided if you would like to auto-create an input samplesheet from a directory containing FastQ files **before** you run the pipeline (requires Python 3 to be installed locally), e.g.
 
-        ```bash
-        nextflow run nf-core/rnaseq \
-            --public_data_ids ids.txt \
-            -profile <docker/singularity/podman/conda/institute>
+        ```console
+        ~/.nextflow/assets/nf-core/rnaseq/bin/fastq_dir_to_samplesheet.py <FASTQ_DIR> samplesheet.csv
         ```
 
-    > **NB:** The commands to obtain public data and to run the main arm of the pipeline are completely independent. This is intentional because it allows you to download all of the raw data in an initial pipeline run (`results/public_data/`) and then to curate the auto-created samplesheet based on the available sample metadata before you run the pipeline again properly.
-
-See [usage](https://nf-co.re/rnaseq/usage) and [parameter](https://nf-co.re/rnaseq/parameters) docs for all of the available options when running the pipeline.
-
 ## Documentation
 
-The nf-core/rnaseq pipeline comes with documentation about the pipeline: [usage](https://nf-co.re/rnaseq/usage) and [output](https://nf-co.re/rnaseq/output).
+The nf-core/rnaseq pipeline comes with documentation about the pipeline [usage](https://nf-co.re/rnaseq/usage), [parameters](https://nf-co.re/rnaseq/parameters) and [output](https://nf-co.re/rnaseq/output).
 
 ## Credits
 

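The README points to `bin/fastq_dir_to_samplesheet.py` for building a samplesheet from a directory of FastQ files. The Python sketch below shows the general idea and is not that script's implementation. It makes three assumptions: paired files are named `<sample>_R1*.fastq.gz` and `<sample>_R2*.fastq.gz`; the column order follows the fields unpacked in `bin/check_samplesheet.py`; and `strandedness` defaults to `unstranded`. `fastq_dir_to_rows` and `write_samplesheet` are hypothetical helpers.

```python
import csv
import glob
import os
import re
import sys

# Column order taken from the fields unpacked in bin/check_samplesheet.py.
HEADER = ["sample", "fastq_1", "fastq_2", "strandedness"]
# Assumed naming convention: <sample>_R1<rest>.fastq.gz with a matching _R2 mate.
R1_PATTERN = re.compile(r"^(?P<sample>.+)_R1(?P<rest>.*)\.fastq\.gz$")


def fastq_dir_to_rows(fastq_dir, strandedness="unstranded"):
    """Build samplesheet rows from a directory of FastQ files (hypothetical)."""
    rows = []
    for r1 in sorted(glob.glob(os.path.join(fastq_dir, "*.fastq.gz"))):
        match = R1_PATTERN.match(os.path.basename(r1))
        if not match:
            continue  # not an R1 file; R2 mates are picked up via their R1
        sample = match.group("sample")
        r2 = os.path.join(fastq_dir, f"{sample}_R2{match.group('rest')}.fastq.gz")
        # Single-end samples have no R2 mate, so fastq_2 is left empty.
        fastq_2 = os.path.abspath(r2) if os.path.exists(r2) else ""
        rows.append([sample, os.path.abspath(r1), fastq_2, strandedness])
    return rows


def write_samplesheet(rows, file_out):
    """Write the header plus one CSV row per sample."""
    with open(file_out, "w", newline="") as fout:
        writer = csv.writer(fout)
        writer.writerow(HEADER)
        writer.writerows(rows)


if __name__ == "__main__" and len(sys.argv) == 3:
    write_samplesheet(fastq_dir_to_rows(sys.argv[1]), sys.argv[2])
```

Under this naming assumption, lanes of a re-sequenced sample such as `S1_L001_R1.fastq.gz` and `S1_L002_R1.fastq.gz` become separate sample names. The `sample` column may therefore need editing before the file is passed to `--input`, so that the pipeline can merge re-sequenced FastQ files.
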
diff --git a/assets/schema_public_data_ids.json b/assets/schema_public_data_ids.json
diff --git a/bin/check_samplesheet.py b/bin/check_samplesheet.py
@@ -67,10 +67,8 @@ def check_samplesheet(file_in, file_out):
 
             ## Check sample name entries
             sample, fastq_1, fastq_2, strandedness = lspl[:len(HEADER)]
-            if sample:
-                if sample.find(" ") != -1:
-                    print_error("Sample entry contains spaces!", "Line", line)
-            else:
+            sample = sample.replace(' ', '_')
+            if not sample:
                 print_error("Sample entry has not been specified!", "Line", line)
 
             ## Check FastQ file extension
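
With this change, `check_samplesheet.py` no longer rejects sample names that contain spaces. It replaces each space with an underscore and only reports an error when the name is empty. The standalone Python sketch below reproduces that check. Where the script calls `print_error`, this version raises `ValueError` instead, which is an assumption made so the snippet runs on its own. `normalise_sample_name` is a hypothetical name.

```python
def normalise_sample_name(sample: str) -> str:
    """Apply the updated sample-name check from bin/check_samplesheet.py."""
    # Spaces are no longer an error: each one becomes an underscore.
    sample = sample.replace(" ", "_")
    # An empty name is still rejected (the script calls print_error here).
    if not sample:
        raise ValueError("Sample entry has not been specified!")
    return sample


if __name__ == "__main__":
    for name in ["WT REP1", "KO_REP1"]:
        print(repr(name), "->", normalise_sample_name(name))
```

Because the replacement happens before the emptiness test, a name made only of spaces would pass as underscores unless the fields are stripped earlier in the script, which this hunk does not show. Note also that `WT REP1` and `WT_REP1` now map to the same sample name.
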