Add Snippy_Variants QC outputs to Snippy_Tree and Snippy_Streamline workflow outputs #592
Please confirm that any documentation updates you made were incorporated into the new docs; if not, could you please add them? Thanks!
Summary of updates
The branch was merged with the main branch to allow the docs to be updated.
In snippy_tree.wdl
Updated Outputs Section in snippy_streamline and snippy_variants Documentation:
- Added snippy_combined_qc_metrics
In snippy_variants.wdl
Updated code to handle Insufficient Data:
- Added a condition to handle cases where the coverage file may be empty or have insufficient data.
- Ensured percent_reads_aligned is included in the QC output for individual samples and in the combined QC metrics TSV file.
Sample-Level Summary Table
Below is an example of the combined QC metrics output:
See terra output for combined qc metrics here - https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/6b595a20-3ed0-40e3-889d-655dc70d07c8/e99c632b-e07a-4afa-a46c-86726144d72d
This looks great! Could you please:
- Add the QC TSV output to Snippy_Streamline_FASTA as well?
- If so, add the output to the Snippy_Streamline_FASTA documentation
- Add the output description in the Snippy_Variants documentation (snippy_variants.md)
I also want to hear back from Andrew on line 89 in task_snippy_variants.wdl before making a final review!
Great work!
Updated with changes.
⭐
This PR closes #353
🗑️ This dev branch should be deleted after merging to main.
🧠 Summary
This PR adds the following Snippy_Variants QC output metrics to the Snippy_Tree and Snippy_Streamline workflows. They are useful for assessing the quality of the samples included in the phylogenetic tree, as well as the alignment quality.
⚡ Impacted Workflows/Tasks
This PR may lead to different results in pre-existing outputs: No
This PR uses an element that could cause duplicate runs to have different results: No
🛠️ Changes
⚙️ Algorithm
The Snippy_Variants workflow produces the following QC outputs:
Int snippy_variants_num_reads_aligned
Int snippy_variants_num_variants
File snippy_variants_coverage_tsv
Float snippy_variants_percent_ref_coverage
The goal was to have all these values on a single line per sample, included in Snippy_Tree or Snippy_Streamline.
The snippy_variants_coverage_tsv file contains contents like:
And therefore, a typical output of all the Snippy_Variants QC metrics would be like:
However, for a pathogen like V. cholerae with two chromosomes, the snippy_variants_coverage_tsv output is like:
In such cases, the mapping information for the second chromosome is appended after that for the first chromosome; the implementation handles as many chromosomes as there are in the reference FASTA file used for read mapping:
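A minimal sketch of how per-chromosome coverage rows might be flattened onto one line per sample. This is not the code in task_snippy_variants.wdl: the function name `flatten_coverage`, the `contigN_` prefix, and the assumption that the coverage TSV has a `#`-prefixed header row followed by one row per reference sequence are all hypothetical. An empty or header-only file yields an empty result, mirroring the insufficient-data case.

```python
import csv
import io


def flatten_coverage(tsv_text):
    """Flatten per-contig coverage rows into a single dict for one sample.

    Assumed (hypothetical) layout: a tab-separated header row, optionally
    starting with '#', then one row per chromosome/contig in the reference.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    flat = {}
    for index, row in enumerate(reader, start=1):
        for column, value in row.items():
            if column is None or value is None:
                # Skip malformed rows with extra or missing fields.
                continue
            name = column.lstrip("#")
            # Columns for contig 2 follow those for contig 1, and so on.
            flat[f"contig{index}_{name}"] = value
    return flat


example = (
    "#rname\tnumreads\tcoverage\n"
    "chrom_one\t1000\t98.5\n"
    "chrom_two\t400\t97.1\n"
)
print(flatten_coverage(example))
```

Because the prefix is derived from the row position, the same function covers references with one, two, or more sequences without special cases.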
As the Snippy_Tree and Snippy_Streamline workflows are set-level, the QC results are combined into a single TSV file with one sample per row, allowing for comparisons across samples. An example is below:
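As a hedged illustration of the set-level combination step, the sketch below merges per-sample metric dictionaries into one TSV with one row per sample. The function name `combine_qc`, the `samplename` column, and the `NA` placeholder for missing values are assumptions made for this example, not the workflow's actual implementation.

```python
import csv
import io


def combine_qc(samples):
    """Combine per-sample QC metrics into one TSV string, one row per sample.

    `samples` maps a sample name to a dict of metric name -> value.
    Samples may have different metrics (e.g. different contig counts);
    missing values are written as 'NA'.
    """
    # Collect the union of metric names, preserving first-seen order.
    columns = []
    for metrics in samples.values():
        for key in metrics:
            if key not in columns:
                columns.append(key)

    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["samplename"] + columns,
        delimiter="\t",
        restval="NA",
        lineterminator="\n",
    )
    writer.writeheader()
    for name, metrics in samples.items():
        writer.writerow({"samplename": name, **metrics})
    return out.getvalue()


print(combine_qc({
    "sample_a": {"num_reads_aligned": 1000, "num_variants": 12},
    "sample_b": {"num_reads_aligned": 950},
}))
```

Taking the union of columns keeps the table rectangular even when one sample lacks a metric, which is what makes row-by-row comparison across samples possible.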
➡️ Inputs
⬅️ Outputs
A new output file, snippy_variants_qc_metrics, for the Snippy_Variants workflow, and snippy_combined_qc_metrics for the Snippy_Tree and Snippy_Streamline workflows.
🧪 Testing
Snippy_Variants: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/2d851d39-78cd-444f-952d-60ab77e7db77
Snippy_Streamline: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/bfd8044f-1cfa-4945-bc09-383cfa6bc2d8
Suggested Scenarios for Reviewer to Test
Single chromosome pathogen
🔬 Final Developer Checklist
🎯 Reviewer Checklist