Adding tbp_parser and clockwork to TheiaProk #192

Merged
merged 25 commits into from
Sep 21, 2023

Conversation

Contributor

@frankambrosio3 frankambrosio3 commented Sep 19, 2023

Closes #183 and closes #96

🛠️ Changes Being Made

  1. Added a task that uses the CDC Varpipe Docker image to run Clockwork for read decontamination.
  2. Added this task to merlin magic in TheiaProk_Illumina_PE_PHB to be triggered by GAMBIT taxon ID of "Mycobacterium tuberculosis"
  3. Added decontaminated reads as an optional output to TheiaProk_Illumina_PE_PHB

And:

🧠 Context and Rationale

Mycobacteria are highly clonal and very difficult to distinguish using broadly generalizable bioinformatics techniques. Clockwork, specifically the CDC's Varpipe implementation of Clockwork, does a great job of removing reads that map better to the reference genomes of similar species than to the H37Rv Mtb reference.

and

This adjusts and improves the very large single-task Python code, which exceeded 1,000 lines. Unit tests have been implemented, functionality has been tested, and many bugs have been squashed.

📋 Workflow/Task Steps

  1. Map reads to all decontamination references and to H37Rv
  2. Remove reads whose mapping score to a non-H37Rv genome is higher than their score to H37Rv
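
The filtering rule in the two steps above can be sketched in Python. This is a toy illustration of the rule as worded here, not Clockwork's or Varpipe's actual implementation. The function names, the `H37Rv` label, and the per-read score dictionary are hypothetical, and the sketch assumes that a read with no alignment to a reference scores 0 against it and that ties are kept.

```python
from typing import Dict, List

TARGET_REF = "H37Rv"  # hypothetical label for the M. tuberculosis reference


def keep_read(scores: Dict[str, int], target: str = TARGET_REF) -> bool:
    """Keep a read unless some non-target reference scores strictly higher.

    Assumption: a missing entry means the read did not map to that
    reference and is treated as a score of 0.
    """
    target_score = scores.get(target, 0)
    best_other = max(
        (score for ref, score in scores.items() if ref != target),
        default=0,
    )
    return best_other <= target_score


def decontaminate(read_scores: Dict[str, Dict[str, int]]) -> List[str]:
    """Return the IDs of reads that survive the filter, in input order."""
    return [read_id for read_id, scores in read_scores.items() if keep_read(scores)]


if __name__ == "__main__":
    example = {
        "r1": {"H37Rv": 60, "M_avium": 20},  # maps best to H37Rv: kept
        "r2": {"H37Rv": 10, "M_avium": 50},  # maps best elsewhere: removed
        "r3": {"H37Rv": 30, "M_avium": 30},  # tie: kept under this sketch
        "r4": {"M_avium": 40},               # never mapped to H37Rv: removed
    }
    print(decontaminate(example))
```

Keeping ties follows the wording of step 2, which removes only reads that score higher against a non-H37Rv genome.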

and

Removes the following tasks:

  • tasks/species_typing/task_tb_gene_coverage.wdl
  • tasks/species_typing/task_tbprofiler_output_parsing.wdl

Adds the following task:

  • tasks/species_typing/task_tbp_parser.wdl

This task runs the tbp-parser Python tool.

Inputs

Dirty reads

and

New optional inputs:

  • tbprofiler_output_seq_method_type -> tbp_parser_output_seq_method_type: default="WGS"
  • tbprofiler_operator -> tbp_parser_operator
  • tbp_parser_min_depth: default=10
  • tbp_parser_coverage_threshold: default=100
  • tbp_parser_debug: default=false
  • tbp_parser_docker_image: default="us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:0.0.5"
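
This PR does not spell out how `tbp_parser_min_depth` and `tbp_parser_coverage_threshold` interact. The sketch below shows one plausible reading, stated as an assumption: a position counts as covered when its depth is at least `min_depth`, and a gene passes when the percentage of covered positions reaches `coverage_threshold`. The function names and the per-position depth list are hypothetical and are not tbp-parser's API.

```python
from typing import Sequence


def percent_coverage(depths: Sequence[int], min_depth: int = 10) -> float:
    """Percentage of positions whose depth is at least min_depth."""
    if not depths:
        return 0.0  # assumption: an empty region counts as uncovered
    covered = sum(1 for depth in depths if depth >= min_depth)
    return 100.0 * covered / len(depths)


def passes_coverage(
    depths: Sequence[int],
    min_depth: int = 10,
    coverage_threshold: float = 100.0,
) -> bool:
    """True when the covered percentage reaches coverage_threshold."""
    return percent_coverage(depths, min_depth) >= coverage_threshold


if __name__ == "__main__":
    gene_depths = [10, 12, 9, 30]  # one position falls below the default depth
    print(percent_coverage(gene_depths))  # 3 of 4 positions are covered
    print(passes_coverage(gene_depths))   # fails the default threshold of 100
```

Under this reading, the defaults (depth 10, threshold 100) require every position in a gene to reach the minimum depth.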

Outputs

Decontaminated reads, which are then passed to TBProfiler

and

New outputs:

  • File? tbprofiler_lims_report_csv -> tbp_parser_lims_report_csv
  • File? tbprofiler_looker_csv -> tbp_parser_looker_report_csv
  • File? tbprofiler_laboratorian_report_csv -> tbp_parser_laboratorian_report_csv
  • File? tbprofiler_resistance_genes_percent_coverage -> tbp_parser_coverage_report
  • Float? tbp_parser_genome_percent_coverage
  • String? tbp_parser_version
  • String? tbp_parser_docker
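
Downstream code that reads the old output names will need updating. A minimal Python sketch of the renames listed above follows; the helper name `rename_columns` and the example header are hypothetical, while the name pairs are copied from this PR.

```python
from typing import Dict, List

# Old output name -> new output name, as listed in this PR.
OUTPUT_RENAMES: Dict[str, str] = {
    "tbprofiler_lims_report_csv": "tbp_parser_lims_report_csv",
    "tbprofiler_looker_csv": "tbp_parser_looker_report_csv",
    "tbprofiler_laboratorian_report_csv": "tbp_parser_laboratorian_report_csv",
    "tbprofiler_resistance_genes_percent_coverage": "tbp_parser_coverage_report",
}


def rename_columns(header: List[str]) -> List[str]:
    """Map old column names to new ones, leaving other names untouched."""
    return [OUTPUT_RENAMES.get(name, name) for name in header]


if __name__ == "__main__":
    old_header = ["sample_id", "tbprofiler_looker_csv", "tbp_parser_version"]
    print(rename_columns(old_header))
```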

🧪 Testing

Locally

Terra

Successful run

🔬 Quality checks

Pull Request (PR) checklist:

  • Include a description of what is in this pull request in this message.
  • The workflow/task has been tested locally and on Terra
  • The CI/CD has been adjusted and tests are passing
  • Everything follows the style guide

@frankambrosio3 frankambrosio3 requested review from cimendes and sage-wright and removed request for cimendes September 19, 2023 21:12
@sage-wright sage-wright mentioned this pull request Sep 20, 2023
4 tasks
@sage-wright sage-wright changed the title from "Im fja tb clockwork" to "Adding tbp_parser and clockwork to TheiaProk" Sep 20, 2023
@cimendes
Member

Tested with TheiaProk_Illumina_PE, TheiaProk_Illumina_SE and TheiaProk_ONT. The workflow for SE data is currently broken because clockwork is not set up to run with this type of data, causing a failure.

If clockwork is not compatible with single-end data, I would add a conditional to only run it if read2 is defined. I don't know if this is the best solution; maybe @sage-wright has another alternative.
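
The suggested guard can be sketched in Python. This only illustrates the logic and is not the fix that was merged. The function name and arguments are hypothetical, and the taxon string is the GAMBIT trigger named in the PR description. In WDL the same check would typically use `defined(read2)`.

```python
from typing import Optional

MTB_TAXON = "Mycobacterium tuberculosis"  # GAMBIT trigger from the PR description


def should_run_clockwork(predicted_taxon: str, read2: Optional[str]) -> bool:
    """Run clockwork only for M. tuberculosis samples with paired-end reads."""
    return predicted_taxon == MTB_TAXON and read2 is not None


if __name__ == "__main__":
    # Paired-end Mtb sample: clockwork runs.
    print(should_run_clockwork(MTB_TAXON, "sample_R2.fastq.gz"))
    # Single-end (or ONT) Mtb sample: clockwork is skipped.
    print(should_run_clockwork(MTB_TAXON, None))
```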

@cimendes
Member

cimendes commented Sep 21, 2023

I ran additional tests with TheiaProk_ONT and TheiaProk_Illumina_PE on the same set of samples (just different instruments). Illumina_PE worked without issues, but ONT failed for the same reason Illumina_SE is failing (clockwork is not compatible with SE data).

@cimendes
Member

Retested on TheiaProk_ONT and everything looks good. Two samples failed due to not passing tbprofiler qc threshold, which is not surprising with ONT data. On TheiaProk_Illumina_SE everything passed with no issues, as expected! Approving! Great job everyone! 🌟

@cimendes cimendes merged commit df342b9 into main Sep 21, 2023
Development

Successfully merging this pull request may close these issues.

  • Add clockwork to TBProfiler
  • CDPH TbProfiler expert rule implementation - v2
4 participants