Dev -> Master for v3.2 release #657

Merged
merged 66 commits into from
Jun 18, 2021
Changes from 49 commits
df3cc40
Bump pipeline version to 3.2dev
drpatelh May 13, 2021
79a64cb
Fix markdownlint
drpatelh May 13, 2021
5f554bf
Merge pull request #638 from drpatelh/attributes
drpatelh May 13, 2021
91d7fb3
Fix pipeline if only one sample is provided
c-mertes May 20, 2021
196c64d
Add missing iGenome - Rnor_5.0
adomingues May 21, 2021
107d741
Merge pull request #642 from adomingues/rn5
drpatelh May 24, 2021
fd19871
Patch use of file append
pditommaso May 26, 2021
f4c8ec9
typo
maxulysse May 26, 2021
47a9757
Merge pull request #644 from nf-core/maxulysse-patch-1
drpatelh May 26, 2021
079a579
Update sra_to_samplesheet.nf
drpatelh Jun 1, 2021
221938b
Merge pull request #643 from nf-core/sra_to_samplesheet
drpatelh Jun 1, 2021
26543a3
Fix #645
drpatelh Jun 1, 2021
da7e6ad
Merge pull request #647 from drpatelh/attributes
drpatelh Jun 1, 2021
d1c865f
Additional fixes for PR #643
drpatelh Jun 1, 2021
9c9a1d4
Update CHANGELOG
drpatelh Jun 1, 2021
0009beb
Update script to accept user-provided set of metadata fields
drpatelh Jun 1, 2021
cba3975
Tweak spacing in script
drpatelh Jun 1, 2021
a134158
Add --ena_metadata_fields parameter
drpatelh Jun 1, 2021
ecf69b8
Update usage docs
drpatelh Jun 1, 2021
bd2ea6b
Add Groovy function to check --ena_metadata_fields
drpatelh Jun 1, 2021
b512488
Tweak Groovy code to be more generic
drpatelh Jun 1, 2021
9675cc4
Small fixes
drpatelh Jun 1, 2021
1f7aa00
Add ability to use new --ena_metadata_fields parameter
drpatelh Jun 1, 2021
a87c01f
Merge pull request #641 from c-mertes/patch-1
drpatelh Jun 1, 2021
b40de29
Merge branch 'dev' of https://github.com/nf-core/rnaseq into attributes
drpatelh Jun 1, 2021
eb32d11
Make linters happy
drpatelh Jun 1, 2021
b0ff698
Merge pull request #648 from drpatelh/attributes
drpatelh Jun 1, 2021
65b6c4f
Delete files no longer required from SRA download workflow
drpatelh Jun 14, 2021
106c023
Strip out SRA download functionality everywhere else
drpatelh Jun 14, 2021
cd6ea45
Final updates
drpatelh Jun 14, 2021
0e65d9e
Fix markdownlint
drpatelh Jun 14, 2021
8a03760
Fix docs rendering
drpatelh Jun 14, 2021
94fb253
Fix block quotes in main README
drpatelh Jun 14, 2021
1e3a465
skip qualimap to fix out of actions space error
drpatelh Jun 15, 2021
22b3866
Merge pull request #653 from drpatelh/attributes
drpatelh Jun 15, 2021
035c134
Save dds object before variance stabilisation fails if n=1
gavin-kelly-1 Jun 16, 2021
aa279f8
Remove caching R output to files
gavin-kelly-1 Jun 16, 2021
8a94ebf
Update deseq2_qc.r
drpatelh Jun 16, 2021
ad12a69
Update CHANGELOG
drpatelh Jun 16, 2021
887736c
Replace sense/antisense with forward/reverse everywhere
drpatelh Jun 16, 2021
cb0e58a
Merge pull request #655 from macroscian/dev
drpatelh Jun 16, 2021
229e8a8
Merge branch 'dev' of https://github.com/nf-core/rnaseq into issues
drpatelh Jun 16, 2021
e1f2223
Replace SENSE with FORWARD in process name
drpatelh Jun 16, 2021
8d1a01a
Merge pull request #656 from drpatelh/issues
drpatelh Jun 16, 2021
fb916c7
Add script to auto-generate samplesheet from directory
drpatelh Jun 17, 2021
7e4e40f
Update CHANGELOG
drpatelh Jun 17, 2021
3fe6f96
Update CHANGELOG
drpatelh Jun 17, 2021
0fc2ee7
Bump version to 3.2
drpatelh Jun 17, 2021
3a91e13
Merge pull request #658 from drpatelh/issues
drpatelh Jun 17, 2021
31cb933
Add `salmon_quant_libtype` parameter
JoseEspinosa Jun 17, 2021
63ddab1
Update usage
JoseEspinosa Jun 17, 2021
97ecc5b
Update nextflow.config
drpatelh Jun 18, 2021
162a135
Update nextflow_schema.json
drpatelh Jun 18, 2021
2de6dff
Apply suggestions from code review
drpatelh Jun 18, 2021
e36234a
Update CHANGELOG.md
drpatelh Jun 18, 2021
1d46809
Merge pull request #659 from JoseEspinosa/salmon_q_lib
drpatelh Jun 18, 2021
57f3a47
Remove remnant scripts from sra download workflow
drpatelh Jun 18, 2021
76dfe0c
Add wget command to download fastq_dir_to_samplesheet.py
drpatelh Jun 18, 2021
3eb2bac
Add warning about spaces being replaced by underscores
drpatelh Jun 18, 2021
17a15fe
Actually merge across multiple lanes
drpatelh Jun 18, 2021
936d73a
Black
drpatelh Jun 18, 2021
8e66601
Allow pattern in make_samplesheet
grst Jun 18, 2021
38ee463
Merge pull request #660 from drpatelh/issues
drpatelh Jun 18, 2021
b1fcd35
Merge remote-tracking branch 'upstream/dev' into make_samplesheet
grst Jun 18, 2021
7164915
Revert to extension. Sort glob.
grst Jun 18, 2021
05e0763
Merge pull request #661 from grst/make_samplesheet
drpatelh Jun 18, 2021
22 changes: 1 addition & 21 deletions .github/workflows/ci.yml
@@ -48,7 +48,7 @@ jobs:
       matrix:
         parameters:
           - "--skip_qc"
-          - "--remove_ribo_rna"
+          - "--remove_ribo_rna --skip_qualimap"
           - "--skip_trimming"
           - "--gtf false"
           - "--star_index false"
@@ -148,23 +148,3 @@ jobs:
       - name: Run pipeline with Salmon and various parameters
         run: |
           nextflow run ${GITHUB_WORKSPACE} -profile test,docker --pseudo_aligner salmon ${{ matrix.parameters }}
-
-  sra_download:
-    name: Test downloading of public data
-    if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/rnaseq') }}
-    runs-on: ubuntu-latest
-    env:
-      NXF_VER: ${{ matrix.nxf_ver }}
-      NXF_ANSI_LOG: false
-    steps:
-      - name: Check out pipeline code
-        uses: actions/checkout@v2
-
-      - name: Install Nextflow
-        run: |
-          wget -qO- get.nextflow.io | bash
-          sudo mv nextflow /usr/local/bin/
-
-      - name: Run pipeline to download public data
-        run: |
-          nextflow run ${GITHUB_WORKSPACE} -profile test_sra,docker
23 changes: 23 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,29 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [[3.2](https://github.com/nf-core/rnaseq/releases/tag/3.2)] - 2021-06-17
+
+### Enhancements & fixes
+
+* Removed workflow to download data from public databases in favour of using [nf-core/fetchngs](https://nf-co.re/fetchngs)
+* Added a stand-alone Python script [`bin/fastq_dir_to_samplesheet.py`](https://github.com/nf-core/rnaseq/blob/master/bin/fastq_dir_to_samplesheet.py) to auto-create samplesheet from a directory of FastQ files
+* Added docs about overwriting default container definitions to use latest versions e.g. Pangolin
+* [[#645](https://github.com/nf-core/rnaseq/issues/645)] - Remove trailing slash from `params.igenomes_base`
+* [[#649](https://github.com/nf-core/rnaseq/issues/649)] - DESeq2 fails with only one sample
+* [[#652](https://github.com/nf-core/rnaseq/issues/652)] - Results files have incorrect file names
+* [[nf-core/viralrecon#201](https://github.com/nf-core/viralrecon/issues/201)] - Conditional include are not expected to work
+
+### Parameters
+
+| Old parameter               | New parameter                  |
+|-----------------------------|--------------------------------|
+| `--public_data_ids`         |                                |
+| `--skip_sra_fastq_download` |                                |
+
+> **NB:** Parameter has been __updated__ if both old and new parameter information is present.
+> **NB:** Parameter has been __added__ if just the new parameter information is present.
+> **NB:** Parameter has been __removed__ if parameter information isn't present.
+
 ## [[3.1](https://github.com/nf-core/rnaseq/releases/tag/3.1)] - 2021-05-13

 ### :warning: Major enhancements
65 changes: 29 additions & 36 deletions README.md
@@ -24,32 +24,33 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool

 ## Pipeline summary

-1. Download FastQ files via SRA, ENA or GEO ids and auto-create input samplesheet ([`ENA FTP`](https://ena-docs.readthedocs.io/en/latest/retrieval/file-download.html); *if required*)
-2. Merge re-sequenced FastQ files ([`cat`](http://www.linfo.org/cat.html))
-3. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
-4. UMI extraction ([`UMI-tools`](https://github.com/CGATOxford/UMI-tools))
-5. Adapter and quality trimming ([`Trim Galore!`](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/))
-6. Removal of ribosomal RNA ([`SortMeRNA`](https://github.com/biocore/sortmerna))
-7. Choice of multiple alignment and quantification routes:
+The SRA download functionality has been removed from the pipeline (`>=3.2`) and ported to an independent workflow called [nf-core/fetchngs](https://nf-co.re/fetchngs). You can provide `--nf_core_pipeline rnaseq` when running nf-core/fetchngs to download and auto-create a samplesheet containing publicly available samples that can be accepted directly as input by this pipeline.
+
+1. Merge re-sequenced FastQ files ([`cat`](http://www.linfo.org/cat.html))
+2. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
+3. UMI extraction ([`UMI-tools`](https://github.com/CGATOxford/UMI-tools))
+4. Adapter and quality trimming ([`Trim Galore!`](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/))
+5. Removal of ribosomal RNA ([`SortMeRNA`](https://github.com/biocore/sortmerna))
+6. Choice of multiple alignment and quantification routes:
     1. [`STAR`](https://github.com/alexdobin/STAR) -> [`Salmon`](https://combine-lab.github.io/salmon/)
     2. [`STAR`](https://github.com/alexdobin/STAR) -> [`RSEM`](https://github.com/deweylab/RSEM)
     3. [`HiSAT2`](https://ccb.jhu.edu/software/hisat2/index.shtml) -> **NO QUANTIFICATION**
-8. Sort and index alignments ([`SAMtools`](https://sourceforge.net/projects/samtools/files/samtools/))
-9. UMI-based deduplication ([`UMI-tools`](https://github.com/CGATOxford/UMI-tools))
-10. Duplicate read marking ([`picard MarkDuplicates`](https://broadinstitute.github.io/picard/))
-11. Transcript assembly and quantification ([`StringTie`](https://ccb.jhu.edu/software/stringtie/))
-12. Create bigWig coverage files ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/))
-13. Extensive quality control:
+7. Sort and index alignments ([`SAMtools`](https://sourceforge.net/projects/samtools/files/samtools/))
+8. UMI-based deduplication ([`UMI-tools`](https://github.com/CGATOxford/UMI-tools))
+9. Duplicate read marking ([`picard MarkDuplicates`](https://broadinstitute.github.io/picard/))
+10. Transcript assembly and quantification ([`StringTie`](https://ccb.jhu.edu/software/stringtie/))
+11. Create bigWig coverage files ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/))
+12. Extensive quality control:
     1. [`RSeQC`](http://rseqc.sourceforge.net/)
     2. [`Qualimap`](http://qualimap.bioinfo.cipf.es/)
     3. [`dupRadar`](https://bioconductor.org/packages/release/bioc/html/dupRadar.html)
     4. [`Preseq`](http://smithlabresearch.org/software/preseq/)
     5. [`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html)
-14. Pseudo-alignment and quantification ([`Salmon`](https://combine-lab.github.io/salmon/); *optional*)
-15. Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks ([`MultiQC`](http://multiqc.info/), [`R`](https://www.r-project.org/))
+13. Pseudo-alignment and quantification ([`Salmon`](https://combine-lab.github.io/salmon/); *optional*)
+14. Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks ([`MultiQC`](http://multiqc.info/), [`R`](https://www.r-project.org/))

-> **NB:** Quantification isn't performed if using `--aligner hisat2` due to the lack of an appropriate option to calculate accurate expression estimates from HISAT2 derived genomic alignments. However, you can use this route if you have a preference for the alignment, QC and other types of downstream analysis compatible with the output of HISAT2.
-> **NB:** The `--aligner star_rsem` option will require STAR indices built from version 2.7.6a or later. However, in order to support legacy usage of genomes hosted on AWS iGenomes the `--aligner star_salmon` option requires indices built with STAR 2.6.1d or earlier. Please refer to this [issue](https://github.com/nf-core/rnaseq/issues/498) for further details.
+> * **NB:** Quantification isn't performed if using `--aligner hisat2` due to the lack of an appropriate option to calculate accurate expression estimates from HISAT2 derived genomic alignments. However, you can use this route if you have a preference for the alignment, QC and other types of downstream analysis compatible with the output of HISAT2.
+> * **NB:** The `--aligner star_rsem` option will require STAR indices built from version 2.7.6a or later. However, in order to support legacy usage of genomes hosted on AWS iGenomes the `--aligner star_salmon` option requires indices built with STAR 2.6.1d or earlier. Please refer to this [issue](https://github.com/nf-core/rnaseq/issues/498) for further details.

 ## Quick Start

@@ -59,7 +60,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool

 3. Download the pipeline and test it on a minimal dataset with a single command:

-    ```bash
+    ```console
     nextflow run nf-core/rnaseq -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute>
     ```

@@ -69,30 +70,22 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool

 4. Start running your own analysis!

-    * Typical command for RNA-seq analysis:
-
-        ```bash
-        nextflow run nf-core/rnaseq \
-            --input samplesheet.csv \
-            --genome GRCh37 \
-            -profile <docker/singularity/podman/conda/institute>
-        ```
+    ```console
+    nextflow run nf-core/rnaseq \
+        --input samplesheet.csv \
+        --genome GRCh37 \
+        -profile <docker/singularity/podman/conda/institute>
+    ```

-    * Typical command for downloading public data:
+    * An executable Python script called [`fastq_dir_to_samplesheet.py`](https://github.com/nf-core/rnaseq/blob/master/bin/fastq_dir_to_samplesheet.py) has been provided if you would like to auto-create an input samplesheet based on a directory containing FastQ files **before** you run the pipeline (requires Python 3 installed locally) e.g.

-        ```bash
-        nextflow run nf-core/rnaseq \
-            --public_data_ids ids.txt \
-            -profile <docker/singularity/podman/conda/institute>
+        ```console
+        ~/.nextflow/assets/nf-core/rnaseq/bin/fastq_dir_to_samplesheet.py <FASTQ_DIR> samplesheet.csv
         ```

-    > **NB:** The commands to obtain public data and to run the main arm of the pipeline are completely independent. This is intentional because it allows you to download all of the raw data in an initial pipeline run (`results/public_data/`) and then to curate the auto-created samplesheet based on the available sample metadata before you run the pipeline again properly.
-
-See [usage](https://nf-co.re/rnaseq/usage) and [parameter](https://nf-co.re/rnaseq/parameters) docs for all of the available options when running the pipeline.
-
 ## Documentation

-The nf-core/rnaseq pipeline comes with documentation about the pipeline: [usage](https://nf-co.re/rnaseq/usage) and [output](https://nf-co.re/rnaseq/output).
+The nf-core/rnaseq pipeline comes with documentation about the pipeline [usage](https://nf-co.re/rnaseq/usage), [parameters](https://nf-co.re/rnaseq/parameters) and [output](https://nf-co.re/rnaseq/output).

 ## Credits

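Reviewer note: the README change above describes auto-creating a samplesheet from a directory of FastQ files. The sketch below illustrates that idea only; it is not the contents of `bin/fastq_dir_to_samplesheet.py`. The function names, the default read suffixes and the default `unstranded` value are assumptions made for illustration, and the column order follows the `sample, fastq_1, fastq_2, strandedness` unpacking visible in `bin/check_samplesheet.py`.

```python
import csv
from pathlib import Path


def build_rows(fastq_dir, read1_ext="_R1_001.fastq.gz", read2_ext="_R2_001.fastq.gz",
               strandedness="unstranded"):
    """Return one [sample, fastq_1, fastq_2, strandedness] row per read 1 file.

    Hypothetical helper: the suffix defaults and the strandedness default are
    assumptions, not values taken from the pipeline's script.
    """
    rows = []
    # Sort the matches so the output does not depend on filesystem order.
    for read1 in sorted(Path(fastq_dir).glob(f"*{read1_ext}")):
        # The sample name is the file name with the read 1 suffix removed.
        sample = read1.name[: -len(read1_ext)]
        read2 = read1.with_name(sample + read2_ext)
        # Leave fastq_2 empty for single-end data (no matching read 2 file).
        rows.append([sample, str(read1), str(read2) if read2.exists() else "", strandedness])
    return rows


def write_samplesheet(rows, handle):
    """Write the header and rows as CSV to an open text handle."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2", "strandedness"])
    writer.writerows(rows)
```

A caller would open the output file and pass it in, for example `write_samplesheet(build_rows("fastq/"), open("samplesheet.csv", "w", newline=""))`, where `fastq/` is a placeholder directory. Merging a sample's files across multiple lanes, which the commit list mentions, is not handled in this sketch.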
15 changes: 0 additions & 15 deletions assets/schema_public_data_ids.json

This file was deleted.

6 changes: 2 additions & 4 deletions bin/check_samplesheet.py
@@ -67,10 +67,8 @@ def check_samplesheet(file_in, file_out):

             ## Check sample name entries
             sample, fastq_1, fastq_2, strandedness = lspl[:len(HEADER)]
-            if sample:
-                if sample.find(" ") != -1:
-                    print_error("Sample entry contains spaces!", "Line", line)
-            else:
+            sample = sample.replace(' ', '_')
+            if not sample:
                 print_error("Sample entry has not been specified!", "Line", line)

             ## Check FastQ file extension
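Reviewer note: the `bin/check_samplesheet.py` hunk changes behaviour from rejecting sample names that contain spaces to silently replacing the spaces with underscores, while still rejecting empty names. A minimal standalone sketch of the new behaviour follows. The function name `normalise_sample_name` is hypothetical, and it raises `ValueError` where the script calls its own `print_error` helper, which is not reproduced here.

```python
def normalise_sample_name(sample):
    """Mirror the new check: spaces become underscores, empty names are rejected."""
    # New behaviour in this PR: rewrite spaces instead of raising an error.
    sample = sample.replace(" ", "_")
    if not sample:
        # Stand-in for print_error("Sample entry has not been specified!", ...).
        raise ValueError("Sample entry has not been specified!")
    return sample


if __name__ == "__main__":
    print(normalise_sample_name("control rep 1"))
```

One consequence worth noting for users: two samplesheet entries that differ only by a space versus an underscore would map to the same sample name after this change, which is presumably why the commit list includes a warning about spaces being replaced by underscores.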