Adding tbp-parser to TheiaProk #184

Closed
wants to merge 12 commits into from
4 changes: 2 additions & 2 deletions README.md
@@ -4,7 +4,7 @@ Bioinformatics workflows for characterization, epidemiology and sharing of patho

**More information about the steps undertaken in these workflows is available via the [Theiagen Public Resources Documentation](https://theiagen.notion.site/Theiagen-Public-Health-Resources-a4bd134b0c5c4fe39870e21029a30566).**

-Support for running these workflows can be sought by raising a [GitHub issue](https://github.com/theiagen/public_health_bioinformatics/issues/new) or by contacting Theiagen at support@theiagen.com.
+Support for running these workflows can be sought by raising a [GitHub issue](https://github.com/theiagen/public_health_bioinformatics/issues/new/choose) or by contacting Theiagen at support@theiagen.com.

These workflows are written in [WDL](https://github.com/openwdl/wdl), a language for specifying data processing workflows with a human-readable and writeable syntax. They have been developed by [Theiagen Genomics](https://theiagen.com/) to primarily run on the [Terra.bio](https://terra.bio/) platform but can be run locally or on an HPC system at the command-line with Cromwell or miniWDL.

@@ -13,7 +13,7 @@ These workflows are written in [WDL](https://github.com/openwdl/wdl), a language
* Workflows and task development influenced by The Broad's [Viral Pipes](https://github.com/broadinstitute/viral-pipelines)
* TheiaCoV workflows for viral genomic characterization influenced by UPHL's [Cecret](https://github.com/UPHL-BioNGS/Cecret) & StaPH-B's [Monroe](https://staph-b.github.io/staphb_toolkit/workflow_docs/monroe/)
* TheiaProk workflows for bacterial genomic characterization influenced by Robert Petit's [bactopia](https://github.com/bactopia/bactopia)
-* The PHB workflow user community. To provide feedback, please raise a [GitHub issue](https://github.com/theiagen/public_health_vioinformatics/issues/new).
+* The PHB workflow user community. To provide feedback, please raise a [GitHub issue](https://github.com/theiagen/public_health_bioinformatics/issues/new/choose).

### Contributing to the PHB workflows
Contributions to the workflows contained in this repository are warmly welcomed. Our style guide may be found [here](https://theiagen.notion.site/Style-Guide-WDL-Workflow-Development-bb456f34322d4f4db699d4029050481c) for convenience of formatting.
210 changes: 0 additions & 210 deletions tasks/species_typing/task_tb_gene_coverage.wdl

This file was deleted.

56 changes: 56 additions & 0 deletions tasks/species_typing/task_tbp_parser.wdl
@@ -0,0 +1,56 @@
version 1.0

task tbp_parser {
  input {
    File tbprofiler_json
    File tbprofiler_bam
    File tbprofiler_bai
    String samplename

    String? sequencing_method
    String? operator
    Int min_depth = 10
    Int coverage_threshold = 100
    Boolean tbp_parser_debug = false

    String docker = "us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:0.0.9"
    Int disk_size = 100
    Int memory = 4
    Int cpu = 1
  }
  command <<<
    # get version
    python3 /tbp-parser/tbp_parser/tbp_parser.py --version | tee VERSION

    # run tbp-parser
    python3 /tbp-parser/tbp_parser/tbp_parser.py ~{tbprofiler_json} ~{tbprofiler_bam} \
      ~{"--sequencing_method " + sequencing_method} \
      ~{"--operator " + operator} \
      ~{"--min_depth " + min_depth} \
      ~{"--coverage_threshold " + coverage_threshold} \
      --output_prefix ~{samplename} \
      ~{true="--debug" false="--verbose" tbp_parser_debug}

    # percent of the H37Rv reference genome (4,411,532 bp) covered at depth >= min_depth
    genome=$(samtools depth -J ~{tbprofiler_bam} | awk -F "\t" '{if ($3 >= ~{min_depth}) print;}' | wc -l )
    python3 -c "print ( ($genome / 4411532 ) * 100 )" | tee GENOME_PC
  >>>
  output {
    File tbp_parser_looker_report_csv = "~{samplename}.looker_report.csv"
    File tbp_parser_laboratorian_report_csv = "~{samplename}.laboratorian_report.csv"
    File tbp_parser_lims_report_csv = "~{samplename}.lims_report.csv"
    File tbp_parser_coverage_report = "~{samplename}.percent_gene_coverage.csv"
    Float tbp_parser_genome_percent_coverage = read_float("GENOME_PC")
    String tbp_parser_version = read_string("VERSION")
    String tbp_parser_docker = docker
  }
  runtime {
    docker: docker
    memory: memory + " GB"
    cpu: cpu
    disks: "local-disk " + disk_size + " SSD"
    disk: disk_size + " GB"
    maxRetries: 3
    preemptible: 1
  }
}
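The genome-coverage step at the end of the `tbp_parser` command counts the `samtools depth` rows whose depth is at or above `min_depth` and divides by the hard-coded reference length. The Python below is a minimal sketch of that arithmetic only, not part of tbp-parser itself. It assumes the default three tab-separated `samtools depth` columns (reference, position, depth); the function name `genome_percent_coverage` and the three-position example are hypothetical.

```python
# Sketch of the genome percent-coverage arithmetic in task_tbp_parser.wdl.
# Assumes `samtools depth` default output: reference, 1-based position, depth (tab-separated).

H37RV_LENGTH = 4411532  # reference length hard-coded in the task


def genome_percent_coverage(depth_lines, min_depth=10, genome_length=H37RV_LENGTH):
    """Percent of `genome_length` positions reported with depth >= min_depth."""
    covered = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        if int(fields[2]) >= min_depth:  # same test as the awk filter
            covered += 1  # same count that `wc -l` produces
    return (covered / genome_length) * 100


# Hypothetical three-position "genome" to show the arithmetic
example = ["ref\t1\t12\n", "ref\t2\t9\n", "ref\t3\t30\n"]
print(genome_percent_coverage(example, min_depth=10, genome_length=4))  # 50.0
```

Because `samtools depth` without `-a` omits zero-depth positions, those positions are simply absent from the count. This gives the intended result for any `min_depth` of 1 or more, but with `min_depth = 0` uncovered positions would still be missing from the numerator.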
15 changes: 12 additions & 3 deletions tasks/species_typing/task_tbprofiler.wdl
@@ -9,7 +9,7 @@ task tbprofiler {
String tbprofiler_docker_image = "us-docker.pkg.dev/general-theiagen/staphb/tbprofiler:4.4.2"
Int disk_size = 100
String mapper = "bwa"
-String caller = "bcftools"
+String caller = "freebayes"
Int min_depth = 10
Float min_af = 0.1
Float min_af_pred = 0.1
@@ -22,7 +22,7 @@ task tbprofiler {
date | tee DATE

# Print and save version
-tb-profiler --version > VERSION && sed -i -e 's/^/TBProfiler version /' VERSION
+tb-profiler version > VERSION && sed -i -e 's/TBProfiler version //' VERSION && sed -n -i '$p' VERSION

if [ -z "~{read2}" ] ; then
INPUT_READS="-1 ~{read1}"
@@ -89,6 +89,12 @@ task tbprofiler {
res_genes.append(tsv_dict[i])
res_genes_string=';'.join(res_genes)
Resistance_Genes.write(res_genes_string)
+with open ("MEDIAN_COVERAGE", 'wt') as Median_Coverage:
+median_coverage=tsv_dict['median_coverage']
+Median_Coverage.write(median_coverage)
+with open ("PCT_READS_MAPPED", 'wt') as Pct_Reads_Mapped:
+pct_reads_mapped=tsv_dict['pct_reads_mapped']
+Pct_Reads_Mapped.write(pct_reads_mapped)
CODE
>>>
output {
@@ -104,13 +110,16 @@ task tbprofiler {
String tbprofiler_num_dr_variants = read_string("NUM_DR_VARIANTS")
String tbprofiler_num_other_variants = read_string("NUM_OTHER_VARIANTS")
String tbprofiler_resistance_genes = read_string("RESISTANCE_GENES")
+Int tbprofiler_median_coverage = read_int("MEDIAN_COVERAGE")
+Float tbprofiler_pct_reads_mapped = read_float("PCT_READS_MAPPED")
}
runtime {
docker: "~{tbprofiler_docker_image}"
memory: "16 GB"
cpu: cpu
disks: "local-disk " + disk_size + " SSD"
disk: disk_size + " GB"
-maxRetries: 3
+maxRetries: 3
preemptible: 1
}
}
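The tbprofiler changes write `median_coverage` and `pct_reads_mapped` from `tsv_dict` into files that WDL then reads with `read_int` and `read_float`, and trim the `tb-profiler version` output down to the bare version string. The diff does not show how `tsv_dict` is built, so the sketch below assumes a TSV with one header row and one data row. The helper names, the `sample` column, and the example values are all hypothetical. It also checks up front that the median-coverage string parses as an integer, because a value such as `52.0` would likely be rejected by `read_int`.

```python
import csv
import io


def parse_tbprofiler_tsv(tsv_text):
    """Map header names to values; assumes one header row and one data row."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    return dict(zip(rows[0], rows[1]))


def coverage_outputs(tsv_dict):
    """Return the strings written to MEDIAN_COVERAGE and PCT_READS_MAPPED,
    after checking they parse as an Int and a Float respectively."""
    median_coverage = tsv_dict["median_coverage"]
    pct_reads_mapped = tsv_dict["pct_reads_mapped"]
    int(median_coverage)  # raises ValueError on e.g. "52.0"
    float(pct_reads_mapped)  # raises ValueError on non-numeric text
    return median_coverage, pct_reads_mapped


def clean_version(stdout_text):
    """Mimic `sed -e 's/TBProfiler version //'` then `sed -n '$p'`:
    drop the first prefix on each line and keep only the last line."""
    lines = [line.replace("TBProfiler version ", "", 1) for line in stdout_text.splitlines()]
    return lines[-1] if lines else ""


example_tsv = "sample\tmedian_coverage\tpct_reads_mapped\nS1\t87\t99.1\n"
print(coverage_outputs(parse_tbprofiler_tsv(example_tsv)))  # ('87', '99.1')
print(clean_version("TBProfiler version 4.4.2\n"))  # 4.4.2
```

The `count=1` argument to `str.replace` mirrors `sed`'s default of replacing only the first match per line (no `g` flag).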