Oxidize parts of LCA_Database? #948
Comments
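I was thinking about doing the `revindex` parts in Rust, and the LCA bits in Python. I took a quick look at #946 and as you pointed out it is almost there, since `_signatures` and `_find_signatures` don't do any LCA/taxonomy calculations. The most complicated part is loading/saving, because parts of it would be in Rust (revindex) and parts in Python (the lineage stuff). It's not terribly bad, but may need to read the JSON file twice on `load`, and save the revindex bits to JSON first and then later adding the lineage data in Python on `save`. |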
On Sat, Apr 18, 2020 at 12:13:52PM -0700, Luiz Irber wrote:
I was thinking about doing the `revindex` parts in Rust, and the LCA bits in Python. I took a quick look at #946 and as you pointed out it is almost there, since `_signatures` and `_find_signatures` don't do any LCA/taxonomy calculations.
The most complicated part is loading/saving, because parts of it would be in Rust (revindex) and parts in Python (the lineage stuff). It's not terribly bad, but may need to read the JSON file twice on `load`, and save the revindex bits to JSON first and then later adding the lineage data in Python on `save`.
I think the lineage stuff is also quite easy in Rust, at least for saving and loading.
|
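The two-pass load and two-step save described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the key names `hashval_to_idx` and `idx_to_lineage`, the file layout, and all function names are hypothetical, and the "Rust side" is simulated here by a plain Python function.

```python
import json

# Hypothetical split of one JSON file into two groups of top-level keys.
REVINDEX_KEYS = {"hashval_to_idx"}  # assumed: part a Rust extension would own
LINEAGE_KEYS = {"idx_to_lineage"}   # assumed: part that stays in Python


def load_revindex_part(path):
    """Stand-in for the Rust side: first pass over the file, revindex keys only."""
    with open(path) as fp:
        data = json.load(fp)
    return {k: v for k, v in data.items() if k in REVINDEX_KEYS}


def load_lineage_part(path):
    """Python side: second pass over the same file, lineage keys only."""
    with open(path) as fp:
        data = json.load(fp)
    return {k: v for k, v in data.items() if k in LINEAGE_KEYS}


def save_two_step(path, revindex_part, lineage_part):
    """Write the revindex bits first, then add the lineage data afterwards."""
    # Step 1: the revindex side writes its part of the file.
    with open(path, "w") as fp:
        json.dump(revindex_part, fp)
    # Step 2: Python re-reads the file, merges in lineage data, and rewrites it.
    with open(path) as fp:
        data = json.load(fp)
    data.update(lineage_part)
    with open(path, "w") as fp:
        json.dump(data, fp)
```

The cost of this design is that the file is parsed twice on load and written twice on save, which is the trade-off mentioned in the comment.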
Then you have my blessing to go implement it 🪄 |
I went and took a look at what
here each signature int id ( Oxidation of this code in save/load would then "just" be saving/loading these lists of rank/name pairs. |
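As a rough sketch of what saving/loading "lists of rank/name pairs" could look like, assuming a mapping from integer ids to lineages: the function names and the exact dictionary shape below are hypothetical and are not taken from the sourmash code.

```python
import json


def lineages_to_json(idx_to_lineage):
    """Serialize {int id: [(rank, name), ...]} to a JSON string."""
    # JSON object keys must be strings, and tuples are written as lists.
    return json.dumps(
        {str(idx): [list(pair) for pair in lineage]
         for idx, lineage in idx_to_lineage.items()}
    )


def lineages_from_json(text):
    """Inverse of lineages_to_json: restore int keys and (rank, name) tuples."""
    raw = json.loads(text)
    return {
        int(idx): [(rank, name) for rank, name in lineage]
        for idx, lineage in raw.items()
    }
```

Because the payload is just nested lists of strings, the same structure should be straightforward to read and write from either language.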
Is this going to be a similar move to the optimization of the MinHash class in rust? |
you could do it that way, and move all the code over at once; OR you could do it piecemeal (which for me is my preferred way with Python) - slowly transfer code over into Rust by writing functioning code chunks and keeping the tests working, and refactor the Rust code regularly and eventually make it into a new class. e.g. you could transfer over something tells me that Rust would reward the all-at-once approach more, but @luizirber should weigh in here. |
Hopefully you can go more piecemeal than the MinHash transition... That was a lot, and took quite some time. But we also have more structure nowadays to support the oxidation =] I agree with Titus: transferring |
Upon closer inspection, I changed my mind: It's probably easier to move the whole |
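(just need to check that we are not accessing any attributes of `LCA_Database` in other parts of sourmash. We could expose them as properties for now, but it would be better to have methods to access any internal structure) |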
On Wed, Jul 22, 2020 at 12:21:24PM -0700, Luiz Irber wrote:
(just need to check that we are not accessing any attributes of `LCA_Database` in other parts of sourmash. We could expose them as properties for now, but it would be better to have methods to access any internal structure)
we, umm, are. at least in the tests. sorry.
|
If they are only used in the tests, we can move the tests to Rust and avoid exposing internals. |
note that #1936 refactors |
An implicit goal of #946 refactoring and testing the `LCA_Database` class is to support optimization of searches by moving them into a Rust extension module. I think the function `_find_signatures(...)` would be the main target (along with `_signatures(...)`). Could be good to look at some of the other LCA submodule commands to see if the hashvalue gathering done in classify and summarize could be pulled back to an `LCA_Database` method as well.
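To make the search target concrete, here is a minimal sketch of a reverse-index lookup of the kind a Rust extension might take over. It is not the actual `_find_signatures(...)` implementation: the function name `find_matches`, the `hashval_to_idx` mapping, and the containment-style score are all assumptions made for illustration.

```python
from collections import Counter


def find_matches(query_hashes, hashval_to_idx, threshold=0.0):
    """Return (score, idx) pairs for signatures sharing hashes with the query.

    hashval_to_idx is an assumed reverse index: hash value -> list of
    integer signature ids containing that hash.
    """
    query_hashes = set(query_hashes)
    if not query_hashes:
        return []

    # Count, per signature id, how many query hashes it contains.
    counts = Counter()
    for hashval in query_hashes:
        for idx in hashval_to_idx.get(hashval, ()):
            counts[idx] += 1

    # Score = fraction of the query's hashes found in each signature.
    results = [
        (n / len(query_hashes), idx)
        for idx, n in counts.items()
        if n / len(query_hashes) >= threshold
    ]
    # Best score first; ties broken by the smaller id.
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return results
```

The inner loop over hash values is the hot path in a lookup like this, which is why it is a natural candidate for moving out of Python.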