Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update BUSCO to v5.7.1 and small tweaks to WDL task #401

Merged
merged 5 commits into from
Apr 9, 2024

Conversation

kapsakcj
Copy link
Contributor

@kapsakcj kapsakcj commented Apr 4, 2024

This PR closes #345

🗑️ This dev branch should be deleted after merging to main.

🧠 Aim, Context and Functionality

Update BUSCO to the latest available version

🛠️ Impacted Workflows/Tasks & Changes Being Made

This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version : Yes, BUSCO task auto-downloads their database at runtime and it is periodically updated (not sure how often but last update for enterobacteriales db was 2024-01-08

Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : Yes

Impacted workflows:

  • All TheiaProk workflows (ILMN PE, ILMN SE, ONT, FASTA)
  • TheiaEuk_illumina_pe_phb

📋 Workflow/Task Step Changes

🔄 Data Processing

Docker/software or software versions changed: upgraded to use a Theiagen-hosted copy of the ezlabgva (authors) docker image us-docker.pkg.dev/general-theiagen/ezlabgva/busco:v5.7.1_cv1

Databases or database versions changed: Database changes without warning

Data processing/commands changed: added -cpu option to main busco command

File processing changed: adjustments to parsing of output files; see code for details

Compute resources changed: none

➡️ Inputs

⬅️ Outputs

Added String busco_docker output to WDL task

TODO:

  • add busco_docker output to all impacted workflows
  • theiaprok_illumina_pe
  • theiaprok_illumina_se
  • theiaprok_illumina_fasta
  • theiaprok_ont
  • theiaeuk_illumina_pe

🧪 Testing

Test Dataset

Will update later, but will likely test across a diverse set of bacterial species and at least one eukaryotic pathogen (candida auris?)

image

Commandline Testing with MiniWDL or Cromwell (optional)

Tested the WDL task changes locally:

2024-04-04 17:12:04.718 wdl.t:busco done
2024-04-04 17:12:04.719 miniwdl-run.CallCache call cache insert :: cache_file: "/home/curtis_kapsak/.cache/miniwdl/busco/x3vxz4e57qtwr25ajw5hutw77eakiukd/vbsofqftg2typeltmjrgqa5cvw26m27k.json"
{
  "outputs": {
    "busco.busco_report": "/home/curtis_kapsak/github/public_health_bioinformatics/20240404_170838_busco/out/busco_report/03-98DDCS_busco-summary.txt",
    "busco.busco_results": "C:99.8%[S:99.3%,D:0.5%],F:0.2%,M:0.0%,n:440",
    "busco.busco_database": "enterobacterales_odb10 (2024-01-08)",
    "busco.busco_docker": "us-docker.pkg.dev/general-theiagen/ezlabgva/busco:v5.7.1_cv1",
    "busco.busco_version": "BUSCO 5.7.1"
  },
  "dir": "/home/curtis_kapsak/github/public_health_bioinformatics/20240404_170838_busco"
}

Will test workflows in Terra after code has been updated

Terra Testing

Suggested Scenarios for Reviewer to Test

Theiagen Version Release Testing (optional)

🔬 Final Developer Checklist

  • The workflow/task has been tested locally and results, including file contents, are as anticipated
  • The workflow/task has been tested on Terra and results, including file contents, are as anticipated
  • The CI/CD has been adjusted and tests are passing (to be completed by Theiagen developer)
  • Code changes follow the style guide

🎯 Reviewer Checklist

  • All impacted workflows/tasks have been tested on Terra with a different dataset than used for development
  • All reviewer-suggested scenarios have been tested and any additional
  • All changed results have been confirmed to be accurate
  • All workflows/tasks impacted by change/s have been tested using a standard validation dataset to ensure no unintended change of functionality
  • All code adheres to the style guide
  • MD5 sums have been updated
  • The PR author has addressed all comments

🗂️ Associated Documentation (to be completed by Theiagen developer)

  • Relevant documentation on the Public Health Resources "PHB Main" has been updated
  • Workflow diagrams have been updated to reflect changes

kapsakcj added 4 commits April 4, 2024 17:13
…pdated parsing code to account for adjustments to BUSCO final output summary txt file; added docker as String output. tested successfully w miniwdl
@kevinlibuit kevinlibuit requested review from cimendes and kevinlibuit and removed request for cimendes April 8, 2024 21:35
@kevinlibuit
Copy link
Contributor

@kapsakcj any hesitation in taking this out of draft state? Changes are looking pretty solid to me.

@kapsakcj
Copy link
Contributor Author

kapsakcj commented Apr 9, 2024

I can mark it ready for review, but I haven't finished testing & reviewing outputs. Only ran TheiaProk_FASTA workflow linked above, haven't tested the other workflows yet.

I would recommend testing TheiaEuk to confirm it still works as intended for eukaryotes before merging.

@kapsakcj kapsakcj marked this pull request as ready for review April 9, 2024 01:07
@cimendes
Copy link
Member

cimendes commented Apr 9, 2024

@kapsakcj BUSCO keeps failing on TheiaEuk 😢
It "fails successfully" so it's hard for me to understand why it's so unhappy. I'll try to dig a bit and I shall report back!

@cimendes
Copy link
Member

cimendes commented Apr 9, 2024

I just did a retry on the workflow for theiaeuk, setting the memory for 16GB -> https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/98834755-7d1c-43a3-aab3-fc0e66c53c03

…k_illumina_pe workflow to account for higher RAM required for larger genomes
@kapsakcj
Copy link
Contributor Author

kapsakcj commented Apr 9, 2024

Testing TheiaEuk with 3 Candida auris genomes here, now that the default RAM is set to 24GB for TheiaEuk specifically: https://app.terra.bio/#workspaces/theiagen-validations/PHB_Validation_nextcladeV3testing/job_history/b03b3a98-99a5-443e-b15a-9fd879f56b6d

@kapsakcj
Copy link
Contributor Author

kapsakcj commented Apr 9, 2024

BUSCO ran successfully (without memory failure) with the new default of 24GB.

I think we are good to merge?

@kevinlibuit kevinlibuit merged commit bf5a5a3 into main Apr 9, 2024
12 checks passed
@kapsakcj kapsakcj deleted the cjk-busco-update branch April 9, 2024 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUSCO] Update to latest release - v5.6.1
3 participants