update BUSCO to v5.7.1 and small tweaks to WDL task #401
…pdated parsing code to account for adjustments to BUSCO final output summary txt file; added docker as String output. tested successfully w miniwdl
…mina_pe. have not tested yet
@kapsakcj any hesitation in taking this out of draft state? Changes are looking pretty solid to me.
I can mark it ready for review, but I haven't finished testing & reviewing outputs. Only ran TheiaProk_FASTA workflow linked above, haven't tested the other workflows yet. I would recommend testing TheiaEuk to confirm it still works as intended for eukaryotes before merging.
@kapsakcj BUSCO keeps failing on TheiaEuk 😢
I just retried the TheiaEuk workflow with the memory set to 16GB: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/98834755-7d1c-43a3-aab3-fc0e66c53c03
…k_illumina_pe workflow to account for higher RAM required for larger genomes
Testing TheiaEuk with 3 Candida auris genomes here, now that the default RAM is set to 24GB for TheiaEuk specifically: https://app.terra.bio/#workspaces/theiagen-validations/PHB_Validation_nextcladeV3testing/job_history/b03b3a98-99a5-443e-b15a-9fd879f56b6d
BUSCO ran successfully (without memory failure) with the new default of 24GB. I think we are good to merge?
This PR closes #345
🗑️ This dev branch should be deleted after merging to main.
🧠 Aim, Context and Functionality
Update BUSCO to the latest available version
🛠️ Impacted Workflows/Tasks & Changes Being Made
This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version: Yes. The BUSCO task auto-downloads its database at runtime, and that database is periodically updated (not sure how often, but the last update for the enterobacteriales db was 2024-01-08).
Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing: Yes
Impacted workflows:
📋 Workflow/Task Step Changes
🔄 Data Processing
Docker/software or software versions changed: upgraded to use a Theiagen-hosted copy of the BUSCO authors' (ezlabgva) docker image: us-docker.pkg.dev/general-theiagen/ezlabgva/busco:v5.7.1_cv1
Databases or database versions changed: the BUSCO database is downloaded at runtime and can change without warning
Data processing/commands changed: added --cpu option to main busco command
File processing changed: adjustments to parsing of output files; see code for details
Compute resources changed: none
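For reviewers who want a quick mental model of the two changes above (passing a CPU count to busco and parsing the final summary txt file), here is a rough Python sketch. It is not the code in this PR: the function names, the choice of -m genome, the --auto-lineage fallback, and the one-line summary format (C:..%[S:..%,D:..%],F:..%,M:..%,n:..) are all assumptions for illustration, so see the WDL task itself for the real parsing.

```python
import re
from typing import Optional

# Assumed shape of the one-line result in the BUSCO summary txt file, e.g.
#   C:98.6%[S:98.2%,D:0.4%],F:0.5%,M:0.9%,n:440
# Anything after n:<int> is ignored so that extra trailing fields do not break parsing.
SUMMARY_RE = re.compile(
    r"C:(?P<complete>[\d.]+)%"
    r"\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,"
    r"M:(?P<missing>[\d.]+)%,"
    r"n:(?P<total>\d+)"
)


def build_busco_command(assembly: str, out_name: str, cpu: int,
                        lineage: Optional[str] = None) -> list:
    """Build a busco argument list (hypothetical helper, not the task's command)."""
    cmd = ["busco", "-i", assembly, "-o", out_name, "-m", "genome",
           "--cpu", str(cpu)]
    # Use an explicit lineage dataset if given, otherwise let BUSCO pick one.
    cmd += ["-l", lineage] if lineage else ["--auto-lineage"]
    return cmd


def parse_busco_summary(text: str) -> Optional[dict]:
    """Return the percentages and marker count from summary text, or None."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    result = {key: float(value) for key, value in fields.items() if key != "total"}
    result["total"] = int(fields["total"])
    return result
```

Returning None when the pattern is not found, rather than raising, lets a caller decide whether a missing summary line should fail the task; that is a design choice of this sketch only.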
➡️ Inputs
⬅️ Outputs
Added String busco_docker output to WDL task
TODO:
🧪 Testing
Test Dataset
Will update later, but will likely test across a diverse set of bacterial species and at least one eukaryotic pathogen (Candida auris?)
Commandline Testing with MiniWDL or Cromwell (optional)
Tested the WDL task changes locally:
Will test workflows in Terra after code has been updated
Terra Testing
Suggested Scenarios for Reviewer to Test
Theiagen Version Release Testing (optional)
🔬 Final Developer Checklist
🎯 Reviewer Checklist
🗂️ Associated Documentation (to be completed by Theiagen developer)