ENH: add an action to search for run IDs using a text query #136
Codecov Report
@@ Coverage Diff @@
## main #136 +/- ##
==========================================
- Coverage 98.58% 98.57% -0.01%
==========================================
Files 27 29 +2
Lines 2895 2955 +60
==========================================
+ Hits 2854 2913 +59
- Misses 41 42 +1
Thanks for this neatly implemented addition @misialq 🥇.
I added my code comments inline. Could you also add a description of the new action to the ReadMe and maybe also the tutorial?
I tested the new get-ids-from-query action with one query [1] and it worked like a charm. Also, fetching metadata for run, bioproject and study IDs still worked as expected [2].
[1] Command tested: qiime fondue get-ids-from-query --p-email my@mail.com --p-n-jobs 4 --p-query "human gut metagenome[Organism] AND public[Filter] AND (infant OR baby) " --o-ids ids-from-query.qza --verbose
[2] Tested run ID = ERR184806, study ID = SRP132205 and bioproject ID = PRJEB16321
thanks for the changes. this looks great to me 🚀
This PR adds a new action that retrieves SRA run IDs using a text search query, which is executed on the BioSample database. The output of the action is a single NCBIAccessionIDs artifact that can be used to fetch the corresponding sequences and/or metadata.

Additionally, a small refactor is included in the Entrezpy pipelines submodule: the ESearch part is separated from the rest of the pipeline to allow fetching more than 10000 UIDs (particularly important for the action introduced in this PR); see the expanded comments in the code.
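For readers unfamiliar with the 10000-UID limit mentioned above: one common way to get past a per-request cap on search results is to issue several requests and advance the start offset each time. The sketch below only computes those offsets; it is not the code from this PR, and the names `esearch_pages` and `ESEARCH_MAX_PER_REQUEST` are hypothetical. How the PR itself handles the limit is described in its code comments.

```python
from typing import Iterator, Tuple

# Assumed per-request cap, taken from the 10000 figure in the PR description.
ESEARCH_MAX_PER_REQUEST = 10_000


def esearch_pages(
    total: int, page_size: int = ESEARCH_MAX_PER_REQUEST
) -> Iterator[Tuple[int, int]]:
    """Yield (start_offset, batch_size) pairs that together cover `total` UIDs.

    Hypothetical helper: each pair would correspond to one search request
    that asks for `batch_size` results starting at `start_offset`.
    """
    if total < 0:
        raise ValueError("total must be non-negative")
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, total, page_size):
        # The last batch may be smaller than page_size.
        yield start, min(page_size, total - start)


# Example: a query matching 25000 UIDs needs three requests.
pages = list(esearch_pages(25_000))
```

With the assumed cap, `pages` holds three batches: two full ones and a final batch of 5000.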
Testing:
To test, you can run something like:
Please also verify that fetching metadata using run and aggregate IDs works correctly as these parts are influenced by changes in this PR (so, ideally: run, BioProject and study/experiment/sample IDs).
Note:
This PR should be merged after #138 - it already has its changes incorporated (otherwise the CI is broken). Also, it would be easier to review after the other one.