Adding tbp_parser and clockwork to TheiaProk #192

frankambrosio3 · 2023-09-19T21:12:26Z

Closes #183 and closes #96

🛠️ Changes Being Made

Added a task that uses the CDC Varpipe Docker image to run Clockwork for read decontamination.
Added this task to merlin_magic in TheiaProk_Illumina_PE_PHB; it is triggered when the GAMBIT taxon ID is "Mycobacterium tuberculosis".
Added decontaminated reads as an optional output of TheiaProk_Illumina_PE_PHB.

And:

Created the tbp-parser repository, which implements the expert rules for interpreting TBProfiler outputs, and added it as a new task.
Changed the default variant caller in TBProfiler from bcftools to freebayes.
Added the changes Ash O'Farrell introduced in "TBProfiler: Fix minor bug; output median cov, % unmapped" (#176).

🧠 Context and Rationale

Mycobacteria are highly clonal and difficult to distinguish with broadly generalizable bioinformatics techniques. Clockwork, specifically the CDC's Varpipe implementation of it, removes reads that map better to the reference genomes of similar species than to the H37Rv M. tuberculosis reference.

and

This refactors and improves the very large single-task Python code, which exceeded 1,000 lines. Unit tests have been implemented, functionality has been tested, and many bugs have been squashed.

📋 Workflow/Task Steps

Map reads to all decontamination references and to H37Rv.
Remove reads whose mapping score to a non-H37Rv genome is higher than their score to H37Rv.
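
The two steps above can be sketched in Python as follows. This is only an illustration of the filtering rule, not the Clockwork or Varpipe implementation: the per-read score dictionary, the reference label `"H37Rv"`, and the function names are hypothetical, and a read with no alignment to H37Rv is assumed to count as scoring lower than any real alignment.

```python
from typing import Dict, List

# Hypothetical label for the M. tuberculosis reference in the score table.
TARGET_REF = "H37Rv"


def keep_read(scores: Dict[str, float], target: str = TARGET_REF) -> bool:
    """Return True unless a non-target reference scores strictly higher.

    `scores` maps reference name -> mapping score for one read.
    A missing target score is treated as negative infinity (assumption).
    """
    target_score = scores.get(target, float("-inf"))
    return all(
        score <= target_score
        for ref, score in scores.items()
        if ref != target
    )


def decontaminate(read_scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the IDs of reads that survive the filter, in input order."""
    return [read_id for read_id, scores in read_scores.items() if keep_read(scores)]


example = {
    "r1": {"H37Rv": 60, "M_avium": 20},  # maps best to H37Rv: kept
    "r2": {"H37Rv": 10, "M_avium": 55},  # maps better elsewhere: removed
    "r3": {"M_avium": 30},               # no H37Rv alignment: removed
    "r4": {"H37Rv": 40, "M_avium": 40},  # tie: kept, score is not higher
}
kept = decontaminate(example)
```

Ties are kept here because the rule as stated only removes reads that score higher against a non-H37Rv genome.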

and

Removes the following tasks:

tasks/species_typing/task_tb_gene_coverage.wdl
tasks/species_typing/task_tbprofiler_output_parsing.wdl

Adds the following task:

tasks/species_typing/task_tbp_parser.wdl

This task runs the tbp-parser Python tool.

Inputs

Dirty reads

and

New optional inputs:

tbprofiler_output_seq_method_type -> tbp_parser_output_seq_method_type: default="WGS"
tbprofiler_operator -> tbp_parser_operator
tbp_parser_min_depth: default=10
tbp_parser_coverage_threshold: default=100
tbp_parser_debug: default=false
tbp_parser_docker_image: default="us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:0.0.5"
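
As a rough illustration of how the two numeric defaults above might interact, the Python sketch below assumes that `tbp_parser_min_depth` is a per-position depth cutoff and that `tbp_parser_coverage_threshold` is the minimum percentage of positions in a region that must meet that cutoff. Both interpretations, and the function names, are assumptions made for this example; the actual tbp-parser logic may differ.

```python
from typing import Sequence


def percent_covered(depths: Sequence[int], min_depth: int = 10) -> float:
    """Percentage of positions whose depth is at least `min_depth`."""
    if not depths:
        return 0.0
    covered = sum(1 for depth in depths if depth >= min_depth)
    return 100.0 * covered / len(depths)


def passes_coverage(
    depths: Sequence[int],
    min_depth: int = 10,
    coverage_threshold: float = 100.0,
) -> bool:
    """True when the region meets the (assumed) coverage threshold."""
    return percent_covered(depths, min_depth) >= coverage_threshold


# Hypothetical per-position depths for one region.
region = [12, 15, 9, 30]
region_percent = percent_covered(region)
region_ok = passes_coverage(region)
```

Under these assumptions, a default threshold of 100 means a single position below the minimum depth is enough for the region to fail.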

Outputs

Decontaminated reads, which are then passed to TBProfiler

and

New outputs:

File? tbprofiler_lims_report_csv -> tbp_parser_lims_report_csv
File? tbprofiler_looker_csv -> tbp_parser_looker_report_csv
File? tbprofiler_laboratorian_report_csv -> tbp_parser_laboratorian_report_csv
File? tbprofiler_resistance_genes_percent_coverage -> tbp_parser_coverage_report
Float? tbp_parser_genome_percent_coverage
String? tbp_parser_version
String? tbp_parser_docker

🧪 Testing

Locally

Terra

Successful run

🔬 Quality checks

Pull Request (PR) checklist:

Include a description of what is in this pull request in this message.
The workflow/task has been tested locally and on Terra
The CI/CD has been adjusted and tests are passing
Everything follows the style guide

cimendes · 2023-09-21T09:41:16Z

Tested with TheiaProk_Illumina_PE, TheiaProk_Illumina_SE and TheiaProk_ONT. The workflow for SE data is currently broken because Clockwork is not set up to run with this type of data, causing a failure.

If Clockwork is not compatible with single-end data, I would add a conditional to only run it if read2 is defined. I don't know if this is the best solution; maybe @sage-wright has an alternative.

cimendes · 2023-09-21T10:06:12Z

Ran additional tests with TheiaProk_ONT and TheiaProk_Illumina_PE on the same set of samples (sequenced on different instruments). Illumina_PE worked without issues, but ONT failed for the same reason Illumina_SE is failing: Clockwork is not compatible with SE data.
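
The guard discussed in the two comments above (run Clockwork only for paired-end, non-ONT data) could look roughly like the sketch below. It is written in Python for illustration rather than in WDL, and the function and parameter names are hypothetical; it is not the condition that was committed to the workflow.

```python
from typing import Optional


def should_run_clockwork(
    taxon: str,
    read2: Optional[str],
    ont_data: bool = False,
) -> bool:
    """Decide whether the Clockwork decontamination step should run.

    taxon    -- organism name predicted upstream (e.g. by GAMBIT)
    read2    -- path to the reverse reads, or None for single-end input
    ont_data -- True when the reads come from an ONT instrument
    """
    is_tb = taxon == "Mycobacterium tuberculosis"
    is_paired = read2 is not None
    return is_tb and is_paired and not ont_data


# Hypothetical inputs for the three workflows that were tested.
run_pe = should_run_clockwork("Mycobacterium tuberculosis", "sample_R2.fastq.gz")
run_se = should_run_clockwork("Mycobacterium tuberculosis", None)
run_ont = should_run_clockwork("Mycobacterium tuberculosis", None, ont_data=True)
```

When the guard is false, downstream tasks would receive the original reads instead of decontaminated ones.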

cimendes · 2023-09-21T16:09:42Z

Retested on TheiaProk_ONT and everything looks good. Two samples failed due to not passing tbprofiler qc threshold, which is not surprising with ONT data. On TheiaProk_Illumina_SE everything passed with no issues, as expected! Approving! Great job everyone! 🌟

sage-wright and others added 15 commits September 11, 2023 20:17

add to theiaprok

b0c2a0e

add root

5c769b6

update docker, add spaces

cc2d611

update docker

a743fef

update container

0065d25

update docker

2f9e80a

Add preemptible, shorter version string (#185)

5227d77

update docker

8fcfefa

update docker

7d00801

update docker

51ddccd

Added clockwork task to theiaprok illumina pe

c924d88

64gb ram

2c5fee9

remove sam file at the end

9d4235e

v8

d617a9f

Merge branch 'smw-tbprofiler-dev' into im-fja-tb-clockwork

304d6ba

frankambrosio3 requested review from cimendes and sage-wright and removed request for cimendes September 19, 2023 21:12

update docker

ac4fd39

sage-wright mentioned this pull request Sep 20, 2023

Adding tbp-parser to TheiaProk #184

Closed

4 tasks

sage-wright changed the title ~~Im fja tb clockwork~~ Adding tbp_parser and clockwork to TheiaProk Sep 20, 2023

sage-wright and others added 3 commits September 20, 2023 16:07

apply style guidelines

c84ce51

apply style guidelines

7c1a4f1

update md5sum

759180c

cimendes and others added 3 commits September 21, 2023 12:11

only run clockwork is paired_end and not ont data

57e3ef6

try fix

751621b

update docker

e279d1d

cimendes added 3 commits September 21, 2023 14:59

add potential fix for the select_first null issue

d5cd4c7

Merge branch 'im-fja-tb-clockwork' of https://github.com/theiagen/pub…

9ddfabb

…lic_health_bioinformatics into im-fja-tb-clockwork

update md5sum

74d11fd

cimendes approved these changes Sep 21, 2023

cimendes merged commit df342b9 into main Sep 21, 2023
