Pipeline is working with SRA run IDs, but failing with corresponding Biosample IDs #129
Comments
Hi @amizeranschi, thank you for the report. I looked into this and it comes down to a request to NCBI failing. @drpatelh, the corresponding request to ENA succeeds. Should we just always resolve SRA identifiers using the ENA API? Or was there a reason not to?
Did a bit more testing. I do get a hit using the NCBI esearch: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=sra&term=SAMN11619542 One could parse that output for the identifier and then use it in an efetch request, which succeeds. However, that means extra requests, while ENA provides direct access to this information.
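For readers who want to try the two-step NCBI route described above, here is a minimal sketch of the esearch-then-efetch idea. It is not the code fetchngs uses. The function names are hypothetical, and it assumes the esearch XML response lists matching record UIDs under `IdList/Id` and that efetch accepts those UIDs through its `id` parameter.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="sra"):
    """Build the esearch URL for an accession such as a BioSample ID."""
    return f"{EUTILS}/esearch.fcgi?" + urlencode({"db": db, "term": term})


def parse_esearch_ids(xml_payload):
    """Extract the record UIDs from an esearch XML response (str or bytes)."""
    root = ET.fromstring(xml_payload)
    return [el.text for el in root.findall("./IdList/Id")]


def efetch_url(uids, db="sra"):
    """Build the efetch URL for the UIDs returned by esearch."""
    return f"{EUTILS}/efetch.fcgi?" + urlencode({"db": db, "id": ",".join(uids)})


def resolve_via_ncbi(term):
    """Two requests: esearch for the UIDs, then efetch for the full records."""
    with urlopen(esearch_url(term)) as resp:
        uids = parse_esearch_ids(resp.read())
    if not uids:
        raise LookupError(f"esearch returned no hits for {term}")
    with urlopen(efetch_url(uids)) as resp:
        return resp.read().decode()
```

Calling `resolve_via_ncbi("SAMN11619542")` would issue both requests in sequence, which illustrates the extra round trip mentioned above.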
Hi @Midnighter and @drpatelh, has anyone had a chance to look into this issue?
Well, I've been working on a tool that can resolve these correctly. There's no clear roadmap for when that tool will end up integrated into the pipeline, though. My best suggestion at the moment is: use the ENA Portal yourself to transform the BioSample IDs into run identifiers, then use the run identifiers as input to fetchngs.
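The manual workaround suggested above could be scripted roughly as follows. This is a sketch under assumptions, not an official recipe: it assumes the ENA Portal API `filereport` endpoint with `result=read_run` and a `run_accession` field returns one tab-separated row per run, and the helper names are hypothetical.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def ena_filereport_url(accession):
    """Build an ENA Portal filereport URL asking only for run accessions."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urlencode(params)


def parse_run_accessions(tsv_text):
    """Read the run_accession column from a TSV filereport response."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["run_accession"] for row in reader if row.get("run_accession")]


def biosample_to_runs(accession):
    """Resolve a BioSample ID to its run identifiers with a single request."""
    with urlopen(ena_filereport_url(accession)) as resp:
        return parse_run_accessions(resp.read().decode())
```

The list returned by `biosample_to_runs("SAMN11619542")` can then be written out one identifier per line and supplied to fetchngs in place of the BioSample ID.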
This has been fixed in #149
We could do, but the problem has always been getting the identifier resolution to work as expected when multiple run IDs are associated with a given "meta" ID. In this case, I found an id. I will close this for now, but if you observe the same behaviour for other IDs then please feel free to re-open or create a new issue.
Hi @drpatelh, thanks for working on this issue. I have tested the current dev code using the command from the original post above, but it's still not working for me. Here is the output:
Wait, never mind. I see now that the fix hasn't been merged into the dev branch yet.
Merged. You will have to pull the latest code before re-running.
I tested the fix now and it works as expected. Thanks again for addressing this issue!
Awesome! Thanks for testing so quickly 🚀
Description of the bug
The pipeline works when supplying SRA run IDs (SRR***), but fails with corresponding Biosample IDs (SAMN***).
For instance, the following works:
and the following fails:
Command used and terminal output
Relevant files
nextflow.log
System information
N E X T F L O W ~ version 22.10.2
Local; Docker; Ubuntu 20.04
nf-core/fetchngs v1.8