Add subsample mt #508

Merged: 18 commits, Feb 2, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -25,6 +25,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Add FOUND_IN tag, which mentions the variant caller that found the mutation, in the INFO column of the vcf files [#471](https://github.com/nf-core/raredisease/pull/471)
- A new parameter `vep_plugin_files` to supply files required by vep plugins [#482](https://github.com/nf-core/raredisease/pull/482)
- New workflow for annotating mobile elements [#483](https://github.com/nf-core/raredisease/pull/483)
- Added functionality to subsample mitochondrial alignments, and a new parameter `skip_mt_subsample` to skip the subworkflow [#508](https://github.com/nf-core/raredisease/pull/508)
- Chromograph to plot coverage across chromosomes [#507](https://github.com/nf-core/raredisease/pull/507)

### `Changed`
42 changes: 42 additions & 0 deletions conf/modules/subsample_mt.config
@@ -0,0 +1,42 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Config file for defining DSL2 per module options and publishing paths
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Available keys to override module options:
ext.args = Additional arguments appended to command in module.
ext.args2 = Second set of arguments appended to command in module (multi-tool modules).
ext.args3 = Third set of arguments appended to command in module (multi-tool modules).
ext.prefix = File name prefix for output files.
ext.when = Conditional clause
----------------------------------------------------------------------------------------
*/

//
// Subsample MT
//

process {
    withName: '.*SUBSAMPLE_MT:BEDTOOLS_GENOMECOV' {
        ext.args = { "-dz" }
        ext.prefix = { "${meta.id}" }
    }

    withName: '.*SUBSAMPLE_MT:SAMTOOLS_VIEW' {
        ext.args = { "--output-fmt BAM -h -F 4 -s ${meta.seedfrac}" }
        ext.prefix = { "${meta.id}_mt_subsample" }
        publishDir = [
            path: { "${params.outdir}/alignment" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }

    withName: '.*SUBSAMPLE_MT:SAMTOOLS_INDEX' {
        publishDir = [
            path: { "${params.outdir}/alignment" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }

}
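
Note on `-s ${meta.seedfrac}`: `samtools view -s` takes a single number whose integer part seeds the random number generator and whose fractional part is the fraction of templates to keep. The Python sketch below only illustrates how such a value decomposes. The helper name `split_subsample_arg` is hypothetical, and how `meta.seedfrac` is populated (presumably from the `seedfrac.csv` written by `CALCULATE_SEED_FRACTION`) is an assumption, since that wiring is not shown in this excerpt.

```python
from typing import Tuple


def split_subsample_arg(value: str) -> Tuple[int, float]:
    """Split a samtools `-s` style value (SEED.FRACTION) into its two parts.

    Hypothetical helper for illustration only; it is not part of the pipeline.
    """
    seed_text, _, frac_text = value.partition(".")
    seed = int(seed_text) if seed_text else 0
    # Re-attach "0." so the digits after the dot are read as a fraction.
    fraction = float("0." + frac_text) if frac_text else 0.0
    return seed, fraction


# A value of 30.050000 means: seed 30, keep about 5% of templates.
print(split_subsample_arg("30.050000"))
```

Because the seed and the fraction share one number, the fraction has to stay below 1 for the integer part to remain the intended seed.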
13 changes: 13 additions & 0 deletions docs/output.md
@@ -21,6 +21,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Duplicate marking](#duplicate-marking)
- [Picard's MarkDuplicates](#picards-markduplicates)
- [Sentieon Dedup](#sentieon-dedup)
- [Subsample mitochondrial alignments](#subsample-mitochondrial-alignments)
- [Quality control and reporting](#quality-control-and-reporting)
- [Quality control](#quality-control)
- [FastQC](#fastqc)
@@ -116,6 +117,18 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- `*.metrics`: Text file containing the dedup metrics.
</details>

#### Subsample mitochondrial alignments

[Samtools view](https://www.htslib.org/doc/samtools-view.html) is used by the pipeline to subsample mitochondrial alignments to a user-specified coverage. The subsampled file is mainly intended for visualizing MT alignments in IGV. The non-subsampled BAM file is used for variant calling and other downstream analysis steps.

<details markdown="1">
<summary>Output files from Alignment</summary>

- `{outputdir}/alignment/`
- `<sampleid>_mt_subsample.bam`: Alignment file in bam format.
- `<sampleid>_mt_subsample.bam.bai`: Index of the corresponding bam file.
</details>

### Quality control and reporting

#### Quality control
5 changes: 5 additions & 0 deletions modules.json
@@ -50,6 +50,11 @@
"git_sha": "44096c08ffdbc694f5f92ae174ea0f7ba0f37e09",
"installed_by": ["modules"]
},
"bedtools/genomecov": {
"branch": "master",
"git_sha": "575e1bc54b083fb15e7dd8b5fcc40bea60e8ce83",
"installed_by": ["modules"]
},
"bwa/index": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
47 changes: 47 additions & 0 deletions modules/local/calculate_seed_fraction.nf
@@ -0,0 +1,47 @@
process CALCULATE_SEED_FRACTION {
    tag "$meta.id"
    label 'process_low'

    conda "conda-forge::python=3.8.3"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/python:3.8.3' :
        'biocontainers/python:3.8.3' }"

    input:
    tuple val(meta), path(cov)
    val rd
    val seed

    output:
    tuple val(meta), path("seedfrac.csv"), emit: csv
    path "versions.yml" , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    export MT_COVERAGE=`awk '{cov += \$3}END{ if (NR > 0) print cov / NR }' $cov`

    python -c "import os;print('%0.6f' % ($seed+ $rd/float(os.environ['MT_COVERAGE'])))" >seedfrac.csv

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        calculate_seed_fraction: v1.0
        python: \$(python --version | sed 's/Python //g')
    END_VERSIONS
    """

    stub:
    """
    touch seedfrac.csv

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        calculate_seed_fraction: v1.0
        python: \$(python --version | sed 's/Python //g')
    END_VERSIONS
    """
}
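
Note on the calculation in `CALCULATE_SEED_FRACTION`: the awk step averages column 3 (depth) of the `bedtools genomecov -dz` output, which lists only positions with non-zero depth, and the Python one-liner then prints `seed + rd / mean_depth` with six decimals. The standalone sketch below restates that arithmetic for readability. The function names and the sample depths are illustrative assumptions, not pipeline code, and unlike the awk step it skips blank lines rather than counting them.

```python
from typing import Iterable


def mean_depth(lines: Iterable[str]) -> float:
    """Mean of column 3 over the non-empty lines of `genomecov -dz` output."""
    depths = [float(line.split()[2]) for line in lines if line.strip()]
    if not depths:
        # The module's awk step prints nothing in this case; raising here is
        # an assumption made for the sketch.
        raise ValueError("no coverage records")
    return sum(depths) / len(depths)


def seed_fraction(seed: int, target_depth: float, observed_depth: float) -> str:
    """Format seed + target/observed the same way the module's one-liner does."""
    return "%0.6f" % (seed + target_depth / observed_depth)


# Illustrative input: two covered positions with a mean depth of 3000.
coverage = ["chrM\t0\t2000", "chrM\t1\t4000"]
print(seed_fraction(30, 150, mean_depth(coverage)))
```

With a target depth of 150 and a mean depth of 3000, the ratio is 0.05, so the value passed to `samtools view -s` would be 30.050000. If the target depth is equal to or greater than the mean depth, the ratio reaches 1 or more and spills into the integer part; how the pipeline guards against that case is not visible in this excerpt.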
7 changes: 7 additions & 0 deletions modules/nf-core/bedtools/genomecov/environment.yml
70 changes: 70 additions & 0 deletions modules/nf-core/bedtools/genomecov/main.nf
59 changes: 59 additions & 0 deletions modules/nf-core/bedtools/genomecov/meta.yml
118 changes: 118 additions & 0 deletions modules/nf-core/bedtools/genomecov/tests/main.nf.test

The four bedtools/genomecov module files listed above are generated files; their diffs are not rendered.