IRMA bug fixes & improvements; theiacov_illumina_pe wf updates for Flu #468
…appropriate disk space prior to assembly
… image on our GAR. tested successfully w miniwdl
… seeing disk space, added many comments throughout, added line to insert samplename into all-segment FASTA file headers, adjusted loop for file renaming to use mv verbose flag instead, fixed cmds for renaming of HA FASTA which was previously resulting in 0 byte FASTA for Flu B HA segment, updated renaming of NA FASTA file, updated sed command for inserting samplename into NA FASTA file, updated if/elif/else block for printing subtype. ran successfully with 2 Flu A samples and 1 Flu B sample
…pe predicted by IRMA, added new string output to state that IRMA doesn't differentiate btwn yamagata and victoria, added mv -v flag for renaming BAM files that have subtype included in filename. tested successfully w miniwdl with 1 Flu A and 1 Flu B
…ypes_notes to theiacov wf outputs
…. Added these 3 outputs as well as the remaining segment assembly files
…m abricate instead of IRMA for determining flu nextclade values
…A unless it cannot predict subtype, then use abricate flu subtype instead
…fixed version capture bug & the usage of mafft --thread flag
…ment for custom config settings; added copies of output assemblies (all segs and individual segs) with periods replaced by Ns; tested successfully with Flu A and B samples
…al_substitutions task and VADR task
…utions task and VADR task; deleted 2 duplicate outputs for NA and HA FASTA files
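One of the commits above describes using the IRMA subtype unless IRMA cannot predict one, and falling back to the abricate flu subtype in that case. The following is a minimal sketch of that decision, not the workflow's actual code (which lives in WDL). The function name `choose_flu_subtype` and the example abricate values are hypothetical; the string `No subtype predicted by IRMA` is the one quoted in this PR's description.

```shell
#!/usr/bin/env bash
# Hypothetical helper: illustrates the fallback described in the commit
# messages; it is not taken from the PR itself.
choose_flu_subtype() {
  local irma_subtype="$1"      # e.g. "H1N1", or "No subtype predicted by IRMA"
  local abricate_subtype="$2"  # assumed example values: "H3N2", "Victoria"

  if [ -z "$irma_subtype" ] || [ "$irma_subtype" = "No subtype predicted by IRMA" ]; then
    # IRMA gave no usable subtype: fall back to the abricate call
    echo "$abricate_subtype"
  else
    # IRMA predicted a subtype: prefer it
    echo "$irma_subtype"
  fi
}

choose_flu_subtype "H1N1" "H1N1"
choose_flu_subtype "No subtype predicted by IRMA" "Victoria"
```

Treating an empty IRMA value the same as the "no subtype" string is an extra assumption made here so the sketch behaves sensibly when IRMA produces no output at all.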
# TODO test again and look at .pad.fa files
#echo "ALIGN_AMENDED=1" >> irma_config.sh
#echo "ASSEM_REF=1" >> irma_config.sh
#echo "PADDED_CONSENSUS=1" >> irma_config.sh
leaving these lines commented out in case we want to revisit in the future when working on this task
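For readers following this thread, here is a minimal sketch of how a custom `irma_config.sh` of this kind might be assembled with `echo` redirects. It is an illustration under stated assumptions, not the task's actual code: the function name `write_irma_config`, the `cpu` argument, the variable names `SINGLE_LOCAL_PROC` and `DOUBLE_LOCAL_PROC`, and the halving of the CPU count are all assumptions. The PR only states that `$TMP` is set to the present working directory and that two multi-threading variables are included.

```shell
#!/usr/bin/env bash
# Sketch only: write_irma_config and its cpu argument are hypothetical names.
write_irma_config() {
  local cpu="$1"
  local config="irma_config.sh"

  # Point IRMA's temporary directory at the present working directory,
  # as the PR describes for samples with large FASTQ files.
  echo "TMP=$(pwd)" > "$config"

  # ASSUMED names and values for the two multi-threading variables;
  # verify against the IRMA configuration documentation before use.
  echo "SINGLE_LOCAL_PROC=${cpu}" >> "$config"
  echo "DOUBLE_LOCAL_PROC=$(( cpu / 2 ))" >> "$config"

  # Left commented out, mirroring the diff lines under review.
  #echo "ALIGN_AMENDED=1" >> "$config"
  #echo "ASSEM_REF=1" >> "$config"
  #echo "PADDED_CONSENSUS=1" >> "$config"
}

# Demo in a throwaway directory so the current directory is left untouched.
workdir="$(mktemp -d)"
( cd "$workdir" && write_irma_config 4 && cat irma_config.sh )
```

Per the PR description, the resulting file is passed to IRMA with `--external-config irma_config.sh`.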
OK PR is ready for review. I'll update the documentation and check the above box when I'm finished. @sage-wright FYI there was one additional commit made after you forked your branch, you may want to merge that in if you are working on theiacov_ont workflow
docs have been updated (mostly updates to theiacov inputs and outputs table)
… to make VADR happy
…larified comment lines
…not dashes in the all segment FASTA file
I relaunched Sage's tests since they were run prior to the last 2 commits. Same samples, call caching off, both run on the
Code changes look solid! New outputs present as expected: Launching a new set of tests as I don't have access to Sage's workspace:
My tests were successful and the workflow is working as expected. @jrotieno you're the main reviewer here but you got my okay!
Starting a draft while doing testing in Terra. Will update this message periodically.
This PR closes #412, closes #437, and closes #457.
This dev branch should be deleted after merging to main.
Aim, Context and Functionality
The aim of this PR is to resolve several issues/bugs and make general improvements and upgrades to the TheiaCoV workflows for Flu analysis. Many of these changes affect the IRMA task used in the TheiaCoV_Illumina_PE wf, but some affect other workflows, such as TheiaCoV_ONT for Flu analysis.
Impacted Workflows/Tasks & Changes Being Made
This will affect the behavior of the workflow(s) even if users don't change any workflow inputs relative to the last version: Yes
Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing: No
Workflow/Task Step Changes
Data Processing
- … `DEBUG` statements throughout
- … `irma_config.sh` config file by setting `$TMP` used by IRMA to the present working directory. This impacted samples where FASTQ files were large and is no longer an issue
Docker/software or software versions changed:
- `cdcgov/irma:v1.1.3` → `"us-docker.pkg.dev/general-theiagen/cdcgov/irma:v1.1.5"`
Databases or database versions changed: N/A
Data processing/commands changed: lots of changes to IRMA task & data processing
- Fixed usage of the `mafft --thread` flag and updated how version is captured
- … `--external-config irma_config.sh` to `IRMA` command to use custom config file created in beginning of task.
File processing changed: lots of changes to output files & file renaming & FASTA header replacement
- … `~{samplename}`
- … `sed` command to modify file in place instead of redirecting into file.
Compute resources changed:
- … `4`
- … `irma_config.sh` file created in task to include 2 variables for multi-threading IRMA
Inputs
Outputs
New IRMA task outputs:
- `seg_np_assembly` & `seg_ns_assembly`, which are the FASTA files corresponding to the NP (nucleoprotein) and NS (nonstructural) segments
- `String irma_docker`
- `String irma_subtype_notes`, which is either `IRMA does not differentiate Victoria and Yamagata Flu B lineages. See abricate_flu_subtype output column` for Flu B samples, or blank/empty for non-Flu B samples.
- `String irma_subtype`, which will say, for example, `H1N1` (or whatever the Flu A subtype is) for Flu A; for Flu B it will say `No subtype predicted by IRMA`.
- … have been replaced with `N`s.
New TheiaCoV_Illumina_PE & ONT outputs:
String irma_docker = irma.irma_docker
String? irma_subtype_notes = irma.irma_subtype_notes
File? irma_ha_segment_fasta = irma.seg_ha_assembly
File? irma_na_segment_fasta = irma.seg_na_assembly
File? irma_pa_segment_fasta = irma.seg_pa_assembly
File? irma_pb1_segment_fasta = irma.seg_pb1_assembly
File? irma_pb2_segment_fasta = irma.seg_pb2_assembly
File? irma_mp_segment_fasta = irma.seg_mp_assembly
File? irma_np_segment_fasta = irma.seg_np_assembly
File? irma_ns_segment_fasta = irma.seg_ns_assembly
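The FASTA handling described in this PR (inserting the sample name into headers with an in-place `sed`, and writing copies of the assemblies with periods replaced by `N`s) can be sketched as follows. This is a minimal illustration, not the task's actual commands: the sample name, the example header `A_HA_H3`, the toy sequence, and the `.padded.fasta` output name are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: names and file contents below are made up for illustration.
samplename="sample01"
workdir="$(mktemp -d)"
fasta="${workdir}/${samplename}_HA.fasta"
padded="${fasta%.fasta}.padded.fasta"

# Toy single-segment assembly containing periods in the sequence line
printf '>A_HA_H3\nATG..CGT.A\n' > "$fasta"

# Insert the sample name into every header, editing the file in place
# rather than redirecting into a new file. The -i.bak form works with
# both GNU and BSD sed and leaves a backup copy next to the original.
sed -i.bak "s/^>/>${samplename}_/" "$fasta"

# Write a copy with periods replaced by Ns on sequence lines only;
# header lines (starting with ">") are skipped so names stay intact.
sed '/^>/!s/\./N/g' "$fasta" > "$padded"

cat "$fasta" "$padded"
```

Skipping header lines matters because a sample name or segment label may itself contain a period. A sample name containing `/` or `&` would need escaping before being used in the `sed` expression.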
Testing
Test Dataset
Described below.
Commandline Testing with MiniWDL or Cromwell (optional)
tested lots with miniwdl locally, but don't have output saved.
Terra Testing
Suggested Scenarios for Reviewer to Test
Would be good to test as many types/subtypes as possible and even test with poor quality data to see how the workflow behaves when IRMA cannot produce an assembly.
Need to test ONT as I've primarily been testing the IRMA WDL task updates with Illumina data. I tested 5 ONT samples, but would be good to do more if data is available
Theiagen Version Release Testing (optional)
Final Developer Checklist
Reviewer Checklist
Associated Documentation (to be completed by Theiagen developer)