Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add an action to search for run IDs using a text query #136

Merged
merged 9 commits into from
Aug 19, 2022

Conversation

misialq
Copy link
Collaborator

@misialq misialq commented Jul 29, 2022

This PR adds a new action allowing retrieval of SRA run IDs using a text search query which will be executed on the BioSample database. The output of the action will then be a single NCBIAccessionIDs artifact that can be used to fetch corresponding sequences and/or metadata.

Additionally, a little refactor is included in the Entrezpy pipelines submodule: the ESearch part is separated from the rest of the pipeline to allow fetching more than 10000 UIDs (particularly important for the action introduced in this PR) - see the expanded comments in the code.

Testing:
To test, you can run something like:

qiime fondue get-ids-from-query --p-email <your email> --p-n-jobs 4 --p-query "txid410656[Organism] AND \"public\"[Filter] AND (chicken OR poultry)" --o-ids ids-from-query.qza --verbose 

Please also verify that fetching metadata using run and aggregate IDs works correctly as these parts are influenced by changes in this PR (so, ideally: run, BioProject and study/experiment/sample IDs).

Note:
This PR should be merged after #138 - it already has its changes incorporated (otherwise the CI is broken). Also, it would be easier to review after the other one.

@codecov
Copy link

codecov bot commented Jul 29, 2022

Codecov Report

Merging #136 (1efba49) into main (4d9fc11) will decrease coverage by 0.00%.
The diff coverage is 98.73%.

❗ Current head 1efba49 differs from pull request most recent head 88d2680. Consider uploading reports for the commit 88d2680 to get more accurate results

@@            Coverage Diff             @@
##             main     #136      +/-   ##
==========================================
- Coverage   98.58%   98.57%   -0.01%     
==========================================
  Files          27       29       +2     
  Lines        2895     2955      +60     
==========================================
+ Hits         2854     2913      +59     
- Misses         41       42       +1     
Impacted Files Coverage Δ
q2_fondue/metadata.py 98.63% <ø> (ø)
q2_fondue/sequences.py 98.55% <ø> (ø)
q2_fondue/tests/test_sequences.py 98.09% <ø> (ø)
q2_fondue/tests/test_query.py 94.11% <94.11%> (ø)
q2_fondue/entrezpy_clients/_esearch.py 97.77% <100.00%> (+0.05%) ⬆️
q2_fondue/entrezpy_clients/_pipelines.py 100.00% <100.00%> (ø)
q2_fondue/plugin_setup.py 100.00% <100.00%> (ø)
q2_fondue/query.py 100.00% <100.00%> (ø)
q2_fondue/tests/test_metadata.py 99.68% <100.00%> (+0.01%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@misialq misialq marked this pull request as ready for review August 17, 2022 12:42
@misialq misialq requested review from adamovanja and lina-kim August 17, 2022 12:42
Copy link
Contributor

@adamovanja adamovanja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this neatly implemented addition @misialq 🥇.

I added my code comments inline. Could you also add a description of the new action to the ReadMe and maybe also the tutorial?

I tested the new get-ids-from-query action with one query [1] and it worked like a charm. Also, fetching metadata for run, bioproject and study IDs still worked as expected [2].


[1] Command tested: qiime fondue get-ids-from-query --p-email my@mail.com --p-n-jobs 4 --p-query "human gut metagenome[Organism] AND public[Filter] AND (infant OR baby) " --o-ids ids-from-query.qza --verbose

[2] Tested run ID = ERR184806, study ID = SRP132205 and bioproject ID = PRJEB16321

q2_fondue/entrezpy_clients/_esearch.py Show resolved Hide resolved
q2_fondue/entrezpy_clients/_pipelines.py Outdated Show resolved Hide resolved
q2_fondue/entrezpy_clients/_pipelines.py Show resolved Hide resolved
q2_fondue/entrezpy_clients/_pipelines.py Outdated Show resolved Hide resolved
q2_fondue/plugin_setup.py Outdated Show resolved Hide resolved
q2_fondue/plugin_setup.py Outdated Show resolved Hide resolved
q2_fondue/plugin_setup.py Outdated Show resolved Hide resolved
q2_fondue/plugin_setup.py Outdated Show resolved Hide resolved
q2_fondue/plugin_setup.py Outdated Show resolved Hide resolved
Copy link
Contributor

@adamovanja adamovanja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the changes. this looks great to me 🚀

@misialq misialq merged commit 5afa120 into bokulich-lab:main Aug 19, 2022
@misialq misialq deleted the esearch-test branch August 19, 2022 08:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants