ENH: add an action to search for run IDs using a text query #136

misialq · 2022-07-29T09:43:59Z

This PR adds a new action that retrieves SRA run IDs using a text search query executed against the BioSample database. The action outputs a single NCBIAccessionIDs artifact that can be used to fetch the corresponding sequences and/or metadata.

Additionally, a small refactor of the Entrezpy pipelines submodule is included: the ESearch step is separated from the rest of the pipeline so that more than 10000 UIDs can be fetched (particularly important for the action introduced in this PR). See the expanded comments in the code.
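
To illustrate why the ESearch step benefits from being handled separately, here is a minimal sketch of paging through a large result set. It is not the code from this PR: it assumes the search endpoint returns at most 10000 UIDs per request and that results are paged with the E-utilities `retstart` and `retmax` parameters. The names `esearch_batches` and `MAX_UIDS_PER_REQUEST` are hypothetical and exist only for this example.

```python
from typing import Iterator, Tuple

# Assumption: a single ESearch request returns at most 10000 UIDs.
MAX_UIDS_PER_REQUEST = 10_000


def esearch_batches(
    total_count: int, batch_size: int = MAX_UIDS_PER_REQUEST
) -> Iterator[Tuple[int, int]]:
    """Yield (retstart, retmax) pairs that together cover total_count UIDs.

    Hypothetical helper: each pair describes one paged search request.
    """
    if total_count < 0:
        raise ValueError("total_count must be non-negative")
    if not 0 < batch_size <= MAX_UIDS_PER_REQUEST:
        raise ValueError("batch_size must be between 1 and 10000")
    for retstart in range(0, total_count, batch_size):
        # The final batch may be smaller than batch_size.
        yield retstart, min(batch_size, total_count - retstart)
```

A query matching 25000 records would then be split into three requests, the last of which asks for the remaining 5000 UIDs. Running the search as its own step means the total count is known before any downstream requests are planned.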

Testing:
To test, you can run something like:

qiime fondue get-ids-from-query --p-email <your email> --p-n-jobs 4 --p-query "txid410656[Organism] AND \"public\"[Filter] AND (chicken OR poultry)" --o-ids ids-from-query.qza --verbose

Please also verify that fetching metadata using run and aggregate IDs works correctly as these parts are influenced by changes in this PR (so, ideally: run, BioProject and study/experiment/sample IDs).
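
For reviewers unsure which kind of ID they are testing with, the sketch below classifies an accession by its prefix. It is only an illustration and not part of q2-fondue: the patterns assume the usual INSDC conventions (S/E/D prefixes for NCBI, ENA and DDBJ; `PRJ` for BioProjects), and `classify_accession` is a hypothetical helper name.

```python
import re

# Assumption: accession formats follow the common INSDC prefix conventions.
_ACCESSION_PATTERNS = {
    "run": re.compile(r"^[SED]RR\d+$"),
    "experiment": re.compile(r"^[SED]RX\d+$"),
    "sample": re.compile(r"^[SED]RS\d+$"),
    "study": re.compile(r"^[SED]RP\d+$"),
    "bioproject": re.compile(r"^PRJ[EDN][A-Z]\d+$"),
}


def classify_accession(accession: str) -> str:
    """Return the ID type for an accession, or 'unknown' if nothing matches."""
    cleaned = accession.strip()
    for kind, pattern in _ACCESSION_PATTERNS.items():
        if pattern.match(cleaned):
            return kind
    return "unknown"
```

Everything other than "run" is an aggregate ID in the sense used above, so a thorough check covers at least one run ID, one BioProject ID and one study, experiment or sample ID.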

Note:
This PR should be merged after #138, whose changes are already incorporated here (otherwise the CI is broken). It would also be easier to review after that one.

codecov · 2022-07-29T09:47:46Z

Codecov Report

Merging #136 (1efba49) into main (4d9fc11) will decrease coverage by 0.00%.
The diff coverage is 98.73%.

❗ Current head 1efba49 differs from pull request most recent head 88d2680. Consider uploading reports for the commit 88d2680 to get more accurate results

@@            Coverage Diff             @@
##             main     #136      +/-   ##
==========================================
- Coverage   98.58%   98.57%   -0.01%     
==========================================
  Files          27       29       +2     
  Lines        2895     2955      +60     
==========================================
+ Hits         2854     2913      +59     
- Misses         41       42       +1

Impacted Files	Coverage Δ
q2_fondue/metadata.py	`98.63% <ø> (ø)`
q2_fondue/sequences.py	`98.55% <ø> (ø)`
q2_fondue/tests/test_sequences.py	`98.09% <ø> (ø)`
q2_fondue/tests/test_query.py	`94.11% <94.11%> (ø)`
q2_fondue/entrezpy_clients/_esearch.py	`97.77% <100.00%> (+0.05%)`	⬆️
q2_fondue/entrezpy_clients/_pipelines.py	`100.00% <100.00%> (ø)`
q2_fondue/plugin_setup.py	`100.00% <100.00%> (ø)`
q2_fondue/query.py	`100.00% <100.00%> (ø)`
q2_fondue/tests/test_metadata.py	`99.68% <100.00%> (+0.01%)`	⬆️

adamovanja

Thanks for this neatly implemented addition @misialq 🥇.

I added my code comments inline. Could you also add a description of the new action to the README and maybe also to the tutorial?

I tested the new get-ids-from-query action with one query [1] and it worked like a charm. Also, fetching metadata for run, bioproject and study IDs still worked as expected [2].

[1] Command tested: qiime fondue get-ids-from-query --p-email my@mail.com --p-n-jobs 4 --p-query "human gut metagenome[Organism] AND public[Filter] AND (infant OR baby) " --o-ids ids-from-query.qza --verbose

[2] Tested run ID = ERR184806, study ID = SRP132205 and bioproject ID = PRJEB16321

q2_fondue/entrezpy_clients/_esearch.py

q2_fondue/entrezpy_clients/_pipelines.py

q2_fondue/plugin_setup.py

adamovanja

thanks for the changes. this looks great to me 🚀

misialq marked this pull request as ready for review August 17, 2022 12:42

misialq requested review from adamovanja and lina-kim August 17, 2022 12:42

adamovanja requested changes Aug 18, 2022

misialq added 8 commits August 18, 2022 16:41

ENH: add an action to search for run IDs using a text query

bb71e89

Add query test

487f1c2

Remove the http request files

d40af43

Add missing test case

75ebdb5

Add more comments

d139776

Adjust action input descriptions

3b7a2e6

Fix fetching run ids from studies and alike

9ca7d39

Review suggestions

1efba49

misialq force-pushed the esearch-test branch from 7b3ac90 to 1efba49 Compare August 18, 2022 14:44

misialq requested a review from adamovanja August 18, 2022 14:46

adamovanja approved these changes Aug 19, 2022

Update README

88d2680

misialq merged commit 5afa120 into bokulich-lab:main Aug 19, 2022

misialq deleted the esearch-test branch August 19, 2022 08:43

misialq mentioned this pull request Sep 5, 2022

Fetch SRA IDs based on a search query #128

Closed
