Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: correct E-Direct param setting when fetching metadata #135

Merged
merged 6 commits into from
Jul 29, 2022

Conversation

misialq
Copy link
Collaborator

@misialq misialq commented Jul 28, 2022

This PR changes how we are using some of the entrezpy functionalities in order to make better use of their built-in ways of handling E-Direct pipelines and multi-threaded requests.

The big change is in the inheritance of our ESearch/EFetch classes (both Result and Analyzer): instead of inheriting from EutilsResult/EutilsAnalyzer, they will now inherit from EsearchResult/EsearchAnalyzer/EfetchAnalyzer (please mind that EFetchResult still inherits from the EutilsResult as most of the functionality needs to be implemented from scratch there). The reason for this change is that those "new" base classes provide more handy functionalities out-of-the-box, allowing us to better/easier use pipelines and automatically handle large requests (> 10000 run IDs) without the need for batching/looping/other workarounds.

To test this change, try fetching metadata with get-metadata using any BioProject ID and a list of run IDs. Importantly, those should contain more than 10000 (run) IDs to check that the "new" batching works properly.

Closes #132.

@codecov
Copy link

codecov bot commented Jul 28, 2022

Codecov Report

Merging #135 (7998f21) into main (89a59dd) will increase coverage by 0.08%.
The diff coverage is 91.66%.

@@            Coverage Diff             @@
##             main     #135      +/-   ##
==========================================
+ Coverage   95.62%   95.71%   +0.08%     
==========================================
  Files          15       14       -1     
  Lines        1165     1096      -69     
  Branches      216      213       -3     
==========================================
- Hits         1114     1049      -65     
+ Misses         26       22       -4     
  Partials       25       25              
Impacted Files Coverage Δ
q2_fondue/utils.py 92.72% <ø> (ø)
q2_fondue/metadata.py 96.96% <86.66%> (-3.04%) ⬇️
q2_fondue/entrezpy_clients/_esearch.py 95.45% <93.33%> (+1.25%) ⬆️
q2_fondue/entrezpy_clients/_efetch.py 89.91% <100.00%> (-0.05%) ⬇️
q2_fondue/entrezpy_clients/_pipelines.py 100.00% <100.00%> (ø)
q2_fondue/sequences.py 96.15% <100.00%> (-0.04%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us.

q2_fondue/sequences.py Outdated Show resolved Hide resolved
@misialq misialq linked an issue Jul 28, 2022 that may be closed by this pull request
@misialq misialq requested review from adamovanja and lina-kim July 28, 2022 09:41
@misialq misialq marked this pull request as ready for review July 28, 2022 13:02
Copy link
Contributor

@adamovanja adamovanja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @misialq for clean refactor 🚀. I added some comments inline. Also, I tested it on the BioProject ID PRJEB11419 that is linked with 39'504 run IDs and it fetched all the metadata correctly in 1:20 hrs 🎉 .

q2_fondue/entrezpy_clients/_esearch.py Outdated Show resolved Hide resolved
q2_fondue/entrezpy_clients/_pipelines.py Show resolved Hide resolved
q2_fondue/metadata.py Show resolved Hide resolved
q2_fondue/metadata.py Show resolved Hide resolved
q2_fondue/metadata.py Show resolved Hide resolved
q2_fondue/sequences.py Outdated Show resolved Hide resolved
q2_fondue/tests/test_sequences.py Outdated Show resolved Hide resolved
Copy link
Contributor

@adamovanja adamovanja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the comments. Could you remove issue #91 from the PR description? Then we are ready to merge 😊.

This PR looks great to me. I'm glad you found this clean solution 🥇.

@misialq misialq merged commit 5d2f02b into bokulich-lab:main Jul 29, 2022
@misialq misialq deleted the issue-132-threads branch July 29, 2022 07:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Process runs out of threads when fetching large amounts of metadata
2 participants