FIX: correct E-Direct param setting when fetching metadata #135
Codecov Report
@@ Coverage Diff @@
## main #135 +/- ##
==========================================
+ Coverage 95.62% 95.71% +0.08%
==========================================
Files 15 14 -1
Lines 1165 1096 -69
Branches 216 213 -3
==========================================
- Hits 1114 1049 -65
+ Misses 26 22 -4
Partials 25 25
Thanks @misialq for the clean refactor 🚀. I added some comments inline. I also tested it on the BioProject ID PRJEB11419, which is linked with 39'504 run IDs, and it fetched all the metadata correctly in 1:20 hrs 🎉.
Thanks for addressing the comments. Could you remove issue #91 from the PR description? Then we are ready to merge 😊.
This PR looks great to me. I'm glad you found this clean solution 🥇.
This PR changes how we use some of the `entrezpy` functionalities in order to make better use of their built-in ways of handling E-Direct pipelines and multi-threaded requests.

The big change is in the inheritance of our ESearch/EFetch classes (both `Result` and `Analyzer`): instead of inheriting from `EutilsResult`/`EutilsAnalyzer`, they will now inherit from `EsearchResult`/`EsearchAnalyzer`/`EfetchAnalyzer` (please mind that `EFetchResult` still inherits from `EutilsResult`, as most of the functionality needs to be implemented from scratch there). The reason for this change is that those "new" base classes provide more functionality out of the box, allowing us to use pipelines more easily and to handle large requests (> 10000 run IDs) automatically, without the need for batching/looping/other workarounds.

To test this change, try fetching metadata with `get-metadata` using any BioProject ID and a list of run IDs. Importantly, those should contain more than 10000 (run) IDs to check that the "new" batching works properly.

Closes #132.
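For context, the kind of manual batching workaround that this change is meant to make unnecessary could look roughly like the sketch below. It is illustrative only and is not code from this repository or from `entrezpy`: the helper `batch_ids`, the constant `MAX_IDS_PER_REQUEST`, and the generated run IDs are all hypothetical, and the limit of 10000 is assumed from the "> 10000 run IDs" figure above.

```python
from typing import Iterator, List

# Assumption: 10000 IDs per request, taken from the "> 10000 run IDs"
# threshold mentioned in the PR description.
MAX_IDS_PER_REQUEST = 10_000


def batch_ids(ids: List[str], size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Hypothetical run IDs, sized like the PRJEB11419 test from the review (39504 runs).
run_ids = [f"ERR{i:07d}" for i in range(39_504)]

# Each batch would have been sent as a separate request in a manual loop.
batches = list(batch_ids(run_ids))
batch_sizes = [len(batch) for batch in batches]
```

With 39504 IDs this produces three full batches and one partial batch, which is the looping that the inherited base classes are expected to take over.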