-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FM index cursor enable node access #2061
Comments
Core Meeting 2020-08-24: Enrico wanted to talk with you on Wednesday, to understand your use case. The initial thought was to make |
We can implement getter functions for what you need. Do you use the What do you use and what kind of access (do you want to modify them) would you need on which members? |
Thanks for your responses and the feedback. I am currently using the fm_index_cursor, though it is possible that I switch to the bidirectional index at some point. I don't need to modify the values, and I think it generally makes sense that the members can't be modified from outside to prevent corrupt state of a cursor. Therefore I would propose to return by value or const ref. |
I added the node access to fm_index_cursors in #2076 |
I'm sorry, but I still don't understand what you need. It seems that you don't need the node itself, but some value that can be computed / retrieved from the node. Can you try out @eseiler Pr and write us exactly what members of that node you need? As we talked about this in the core meeting it seemed that the (suffix?) interval, i.e. lower- and upper-bound, would suffice for your use case. If we can, I would prefer to keep the node an internal implementation detail and give a stronger interface for values that are needed. |
If you just want the position, you only need the node that encapsulates all the information. The fm_index_cursor also stores parent_lb and parent_rb but that is only needed when cycling back, i.e. going up in the tree - in this case you need to know where you were before. |
I can actually tell you without checking out the PR: |
We can also add 3 getters get the members of the node. |
2020-08-31 Resolution: We discussed what a cursor and a node of a cursor would be, and we came to the conclusion that a cursor is already the "user" abstraction for a node and that we should add the bounds and "depth" to the fm-cursor. We should find a different name for depth because it is tied to a tree. |
Could be something like suffix_length / suffix_size!? |
The |
I updated my PR #2076 |
@tloka I think, the usual use case would be to look at both bounds anyways, so returning a pair may be a better interface. |
This is a pure design decision. I am personally not the biggest fan of using pairs in general as it often is not directly obvious from the code what the meaning of first and second is. Though, in this case it would be pretty intuitive that first is left and second is right. However, design decisions should imho be made by the core team, so just go for whatever is the best / most consistent from your perspective. |
Hey,
I have one more request concerning the FM index cursor. While with #2048 you provide cerealisation for FM index cursors (which is great and clearly resolves #2044 ), this unfortunately does not resolve a different issue I have with the current implementation of the cursor.
Need
For my FM index algorithm, I need to compare the locations of FM index cursors. I use this for sorting, removing duplicates, detecting nodes that are "contained" in other nodes, etc.. Therefore, I need access to the cursor members to compare cursors with each other, either manually or by enabling the implementation of operators such as < to use stl functions.
Solution
To be able to do this with SeqAn3, I would need to access at least the node of FM index cursors which is currently not possible (private member). Would it be possible to provide access to the node of FM index cursors? As these are not too big, I guess a getter returning a copy would in principle be fine in case you want to ensure that corruption of a cursor is not possible, though returning a reference or a getter to access each node member individually would probably be more efficient.
Notes
I am aware that
==
and!=
operators for FM index cursors are defined. Still, this is not sufficient for my purpose as I need very special types of comparisons which would probably not be appropriate for general purpose (e.g., in some of my comparisons the depth should not be included in the comparison, which it currently is for the == operator which generally also makes sense). This is why I would like to have access to the members and define my own comparisons instead of asking for operators directly.The text was updated successfully, but these errors were encountered: