Adding tbp_parser and clockwork to TheiaProk #192
Tested with TheiaProk_Illumina_PE, TheiaProk_Illumina_SE and TheiaProk_ONT. The workflow for SE data is currently broken because clockwork is not set up to run with this type of data, which causes a failure. If clockwork is not compatible with single-end data, I would add a conditional so that it only runs if read2 is defined. I don't know if this is the best solution; maybe @sage-wright has an alternative.
I ran additional tests with TheiaProk_ONT and TheiaProk_Illumina_PE on the same set of samples (just different instruments). Illumina_PE worked without issues, but ONT failed for the same reason Illumina_SE is failing (clockwork is not compatible with SE data).
Retested on TheiaProk_ONT and everything looks good. Two samples failed because they did not pass the tbprofiler QC threshold, which is not surprising with ONT data. On TheiaProk_Illumina_SE everything passed with no issues, as expected! Approving! Great job everyone! 🌟
Closes #183 and closes #96
🛠️ Changes Being Made
And:
🧠 Context and Rationale
Mycobacteria are highly clonal and terribly difficult to distinguish using broadly generalizable bioinformatics techniques. Clockwork, specifically the CDC's Varpipe implementation of Clockwork, does a great job of removing reads that map better to the reference genomes of similar species than to the Mtb H37Rv reference.
and
This is an adjustment and improvement to the very large single-task Python code, which exceeded 1,000 lines. Unit tests have been implemented, functionality has been tested, and many bugs have been squashed.
📋 Workflow/Task Steps
and
Removes the following tasks:
Adds the following task:
This task runs the tbp-parser Python tool.
Inputs
Dirty reads
and
New optional inputs:
tbprofiler_output_seq_method_type -> tbp_parser_output_seq_method_type: default="WGS"
tbprofiler_operator -> tbp_parser_operator
tbp_parser_min_depth: default=10
tbp_parser_coverage_threshold: default=100
tbp_parser_debug: default=false
tbp_parser_docker_image: default="us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:0.0.5"
Outputs
Decontaminated reads, which are then passed to TBProfiler
and
New outputs:
File? tbprofiler_lims_report_csv -> tbp_parser_lims_report_csv
File? tbprofiler_looker_csv -> tbp_parser_looker_report_csv
File? tbprofiler_laboratorian_report_csv -> tbp_parser_laboratorian_report_csv
File? tbprofiler_resistance_genes_percent_coverage -> tbp_parser_coverage_report
Float? tbp_parser_genome_percent_coverage
String? tbp_parser_version
String? tbp_parser_docker
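For reviewers who want a feel for how the depth and coverage settings might interact, here is a minimal Python sketch. It is not taken from tbp-parser: it assumes that tbp_parser_min_depth is a per-position depth cutoff and that tbp_parser_coverage_threshold is the minimum percentage of positions that must meet that cutoff. The function names and the example depths are hypothetical.

```python
from typing import Sequence


def percent_coverage(depths: Sequence[int], min_depth: int = 10) -> float:
    """Return the percentage of positions with depth >= min_depth.

    Assumption: this mirrors the idea behind a percent-coverage value,
    not the actual tbp-parser implementation.
    """
    if not depths:
        return 0.0
    covered = sum(1 for depth in depths if depth >= min_depth)
    return 100.0 * covered / len(depths)


def passes_coverage(
    depths: Sequence[int],
    min_depth: int = 10,
    coverage_threshold: float = 100.0,
) -> bool:
    """Return True if the region meets the (assumed) coverage threshold."""
    return percent_coverage(depths, min_depth) >= coverage_threshold


# Hypothetical per-position depths for a short region.
region_depths = [12, 15, 9, 30, 22]
print(percent_coverage(region_depths))  # 80.0 (4 of 5 positions reach depth 10)
print(passes_coverage(region_depths))   # False with the default threshold of 100
```

Under these assumptions, the default threshold of 100 means a single position below the depth cutoff is enough to fail a region, so lowering tbp_parser_coverage_threshold would be the way to tolerate small gaps.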
🧪 Testing
Locally
Terra
Successful run
🔬 Quality checks
Pull Request (PR) checklist: