Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[theiaprok] amrfinderplus: add support for Vibrio parahaemolyticus, Vibrio vulnificus, Enterobacter asburiae. Fix C diff bug #542

Merged
merged 3 commits into from
Jul 15, 2024

Conversation

kapsakcj
Copy link
Contributor

@kapsakcj kapsakcj commented Jul 12, 2024

This PR closes #129

🗑️ This dev branch should be deleted after merging to main.

🧠 Aim, Context and Functionality

This PR adds support for using the amrfinder --organism <organism_name> option for 3 new organisms (Vv, Vp, and Enterobacter asburiae) and corrects a typo for C. diff.

These are organisms that were recently added to NCBI AMRFinderPlus' list of supported organisms that have taxon-specific point mutations or other organism-specific handling of annotation: https://github.com/ncbi/amr/wiki/Running-AMRFinderPlus#--organism-option

When users analyze these samples via the TheiaProk workflows (or the standalone amrfinderplus workflow), AMRFinderPlus will now use the --organism flag and be able to detect organism-specific point mutations.

The usage of the amrfinder --organism option depends on GAMBIT accurately predicting the genus and species correctly, but the user also has the ability to override GAMBIT's results for gambit_predicted_taxon by manually entering an optional String input expected_taxon for the various theiaprok workflows.

⚠️ This manual override is currently required for C. diff as GAMBIT with the 1.3.0 database does not have the ability to predict Clostridioides difficile as it is not included in the database (AFAIK)

🛠️ Impacted Workflows/Tasks & Changes Being Made

This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version : Yes

Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : No

Workflows impacted:

  • theiaprok_illumina_pe
  • theiaprok_illumina_se
  • theiaprok_ONT
  • theiaprok_FASTA
  • NCBI-AMRFinderPlus standalone wf

📋 Workflow/Task Step Changes

🔄 Data Processing

Docker/software or software versions changed: N/A

Databases or database versions changed: N/A

Data processing/commands changed: added use of --organism flag for 3 organism, fixed for 1 (C. diff)

File processing changed: N/A

Compute resources changed: N/A

➡️ Inputs

N/A

⬅️ Outputs

Outputs may now include organism-specific point mutations if detected by amrfinderplus.

🧪 Testing

Test Dataset

I tested with either assemblies or Illumina PE datasets for samples that are known to have organism-specific point mutations as reported by NCBI pathogen detection browser.

I also tested with my full amrfinderplus testing dataset that includes diverse taxa, most of which have organism-specific point mutations/acquired genes, to ensure functionality is not lost for these datasets.

Commandline Testing with MiniWDL or Cromwell (optional)

Tested the amrfinderplus task successfully on the command line, but not including output here.

Terra Testing

Will add tests once they are complete

Suggested Scenarios for Reviewer to Test

Test any of the TheiaProk workflows with data from these organisms, or others, to ensure new functionality is working as intended and other organisms are unimpacted by this change

Theiagen Version Release Testing (optional)

  • Will changes require functional or validation testing (checking outputs etc) during the release? No
  • Do new samples need to be added to validation datasets? If so, upload these to the appropriate validation workspace Google bucket (). Please describe the new samples here and why these have been chosen. We could add the datasets that I've tested with to the validation datasets, though it's not critically important to do.
  • Are there any output files that should be checked after running the version release testing? No

🔬 Final Developer Checklist

  • The workflow/task has been tested locally and results, including file contents, are as anticipated
  • The workflow/task has been tested on Terra and results, including file contents, are as anticipated
  • The CI/CD has been adjusted and tests are passing (to be completed by Theiagen developer)
  • Code changes follow the style guide

🎯 Reviewer Checklist

  • All impacted workflows/tasks have been tested on Terra with a different dataset than used for development
  • All reviewer-suggested scenarios have been tested and any additional
  • All changed results have been confirmed to be accurate
  • All workflows/tasks impacted by change/s have been tested using a standard validation dataset to ensure no unintended change of functionality
  • All code adheres to the style guide
  • MD5 sums have been updated
  • The PR author has addressed all comments

🗂️ Associated Documentation (to be completed by Theiagen developer)

  • Relevant documentation on the Public Health Resources "PHB Main" has been updated
  • Workflow diagrams have been updated to reflect changes

kapsakcj added 3 commits July 12, 2024 17:08
…s, correct typo for Cdiff organism option. tested successfully w miniwdl
…n Markdown when html comment indicators are deleted by PR author
@kapsakcj kapsakcj marked this pull request as ready for review July 15, 2024 16:35
@kapsakcj kapsakcj requested a review from sage-wright July 15, 2024 16:35
@kapsakcj
Copy link
Contributor Author

FYI I've checked over the documentation and diagrams and there's nothing to update. Would be good to announce to users upon the next version release

Copy link
Member

@sage-wright sage-wright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fantastic, as always. Well done, Curtis!

@sage-wright sage-wright merged commit 42a06ef into main Jul 15, 2024
7 checks passed
@sage-wright sage-wright deleted the cjk-amrfinderplus-orgs branch July 15, 2024 19:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[TheiaProk] add support for Enterobacter_asburiae, Vp, and Vv for NCBI amrfinder task; update default docker
2 participants