Sourmash in MGnify #1577
Comments
Hi @gustavo-salazar! I don't see any issues with this picture; it's pretty much what I did in greyhound.sourmash.bio (code), but using a Rust backend for step 3. As is, greyhound will not scale easily to MGnify levels, but here are some things to consider for a more scalable solution:
Part 2: creating the query signature
Part 3:
Hey @luizirber
Part 2: Indeed, a web worker makes a lot of sense here. I'll consider it once I get to that part. I have some experience with web components, so I'll definitely try to create this as one. TBH, I don't think I'll be using any advanced features of the sourmash NPM package, and I'll probably follow logic similar to what you have in your blog post. But of course I will let you know if I have any feedback.
Part 3: This is where I just started prototyping. To be honest, my plan was to either copy most of the code you have in the gather command, mostly to format the output of the result for our purposes, or to mock the … I don't have a repo yet, as for now I'm just playing around to define our architecture for it, but once I have something I will share the links here. Thanks again for your help.
Hey y'all, After some tweaking and tuning with webpack 5, I managed to generate my first set of signatures in the client. I'm basically copying what @luizirber did for his blog here, and so far I'm only processing uncompressed FASTA files. Here is the repo if anyone is interested: https://github.com/EBI-Metagenomics/mgnify-sourmash-component
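For readers unfamiliar with what a "signature" holds, here is a minimal sketch of a scaled MinHash over canonical k-mers, in plain Python. It is an illustration under stated assumptions, not the code used in the component or in sourmash: the function names (`canonical_kmers`, `hash64`, `scaled_sketch`) are hypothetical, and the hash is a SHA-1 stand-in, whereas sourmash uses MurmurHash3, so the values will not match real signatures.

```python
import hashlib

def canonical_kmers(seq, ksize):
    """Yield each k-mer in canonical form: the smaller of the k-mer and its reverse complement."""
    comp = str.maketrans("ACGT", "TGCA")
    seq = seq.upper()
    for i in range(len(seq) - ksize + 1):
        kmer = seq[i:i + ksize]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers containing ambiguous bases such as N
        rc = kmer.translate(comp)[::-1]
        yield min(kmer, rc)

def hash64(kmer):
    # ASSUMPTION: stand-in 64-bit hash (first 8 bytes of SHA-1).
    # sourmash itself uses MurmurHash3, so these values differ from real signatures.
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")

def scaled_sketch(seq, ksize=21, scaled=1000):
    """Keep only hashes below 2**64 / scaled, i.e. roughly 1 in `scaled` k-mers."""
    max_hash = 2**64 // scaled
    return {h for h in map(hash64, canonical_kmers(seq, ksize)) if h < max_hash}
```

Because k-mers are canonicalised, a sequence and its reverse complement produce the same sketch, which is why the strand of the uploaded FASTA does not matter.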
This is wonderful 🤩 Something I did in greyhound, and that I think is doable in #1625, is moving the FASTA/Q parsing (including gz compression) into Rust (and use … The function body in Rust would be something like this, but taking the … (the added benefit is that the parsing will be much faster...)
Oh that's excellent news! 🎉 The problem with those dependencies is that they all make their own assumptions about how the stream API works on the Web, but that standard is still under development, and some of the updates are not backwards compatible. This change will save me quite a few headaches. 😅 I'll keep the FASTA parser until you can release #1625, but I won't keep trying to make the …
Hey @luizirber @ctb I just completed a first prototype of the system, which you can play with HERE. If you are interested in the code, it is split into 4 components:
This is awesome!
Hey @luizirber @ctb We have iterated a bit on the feature using sourmash to search our catalogues, and we are close to releasing it live. Among other changes, it now includes the sourmash logo; I just wanted to make sure you are OK with that. BTW @luizirber have you made any progress on including the …
@gustavo-salazar #3047 implements sequence parsing in wasm, and EBI-Metagenomics/mgnify-sourmash-component#4 adds it to the MGnify component.
Addresses #1577 (comment). This PR implements `Read` for `File` in browsers, which allows using `niffler` + `needletail` to parse FASTA/Q, `.gz`-compressed or not, in browsers. I also added error handling, so the browser can print nicer error messages instead of something cryptic to `console.log`.
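The idea of parsing input "compressed or not" from a single byte stream can be sketched in plain Python as follows. This is only an analogy for the sniff-then-parse approach, assuming detection by the gzip magic bytes; it is not the Rust code from the PR, and `open_maybe_gzip` and `read_fasta` are hypothetical helper names.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def open_maybe_gzip(raw):
    """Wrap a binary stream, decompressing transparently if it looks like gzip."""
    buffered = io.BufferedReader(raw)
    # peek() looks at upcoming bytes without consuming them, so no seek is needed
    if buffered.peek(2)[:2] == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=buffered, mode="rb")
    return buffered

def read_fasta(stream):
    """Yield (name, sequence) tuples from a binary FASTA stream."""
    name, chunks = None, []
    for line in io.TextIOWrapper(stream, encoding="ascii"):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)
```

Usage: `list(read_fasta(open_maybe_gzip(open("genome.fa.gz", "rb"))))` returns the same records whether or not the file (here a placeholder name) is gzip-compressed.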
Hello there,
We want to use sourmash to power the search in the MGnify genome catalog.
Right now we are only at a prototyping stage, but I would like to share our high-level plan here, and ask you to let us know if you see any red flags in our approach.
… `sketch` and `index` them.
… `gather` of the query against our catalog using the Python sourmash and will return the results.
Please let me know if you see any issue with this high-level picture.
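As a rough illustration of what a gather-style search over a catalog does, here is a minimal greedy sketch in plain Python. It assumes each genome and the query are already reduced to sets of hashes; the `gather` function and the toy catalog are hypothetical, and the real sourmash command works on signatures and indexes and reports many more statistics.

```python
def gather(query, catalog, threshold=1):
    """Greedy decomposition of a query into catalog matches.

    Repeatedly pick the catalog entry that covers the most remaining query
    hashes, record it, and remove those hashes from the query.
    """
    remaining = set(query)
    results = []
    while remaining:
        name, overlap = max(
            ((n, len(remaining & hashes)) for n, hashes in catalog.items()),
            key=lambda item: item[1],
            default=(None, 0),
        )
        # stop when nothing in the catalog explains enough of what is left
        if name is None or overlap == 0 or overlap < threshold:
            break
        results.append((name, overlap))
        remaining -= catalog[name]
    return results

# Toy data: hash sets standing in for real signatures (illustrative only)
catalog = {"genomeA": {1, 2, 3, 4}, "genomeB": {3, 4, 5}, "genomeC": {9}}
print(gather({1, 2, 3, 4, 5, 6}, catalog))  # [('genomeA', 4), ('genomeB', 1)]
```

Note that genomeB is credited only with the one hash not already explained by genomeA, which is what distinguishes this from a plain containment search.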
Thanks for your help,
Gustavo.