Can FST read bytes forward? #12355
Comments
I think |
You remember correctly. I refrained from mentioning |
Maybe we could de-scare (de-fang?) it upon resurrection? E.g. don't try to renumber the nodes for compression ... keep the nodes existing numbers (or maybe, reversed order, so forward only properly still holds), while reversing the bytes ... it might make it just non-scary enough to be worthwhile. Not sure. Reading backwards is also bad even for |
I just stumbled on this, and I agree that reading backward is not cache-friendly. Is there a reason why we write it backward in the first place? We are specifically reversing the byte order in addNode so that the bytes can later be read backward.
+1 to find a way to reverse the bytes at compilation time. The reversal of bytes during FST compilation is so hard to think about! It happens because the FST is logically append-only, and sort of grows backwards (from the suffixes, inwards onto prefixes), and the newly written nodes always point backwards to the already written (appended to growing But logically we ought to be able to write all the bytes backwards, then reverse them, but then when resolving absolute or relative node addresses at FST read time, we'd need to re-reverse those addresses. Or, we could try to rewrite the embedded node address references during/after reversal so we don't need to re-reverse on each node read? The pointers will necessarily be different (take different number of So maybe for starters we do the simple "reverse |
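To make the "reverse the bytes, then re-reverse the addresses at read time" idea concrete, here is a minimal sketch. It is not Lucene code: the class `ReverseAddressSketch` and all of its methods are hypothetical, and it assumes a flat `byte[]` and an address that names the first byte of a backward read.

```java
// Hypothetical illustration only; none of these names exist in Lucene.
final class ReverseAddressSketch {

    /** Returns a reversed copy of src (the "reverse after compilation" step). */
    static byte[] reverse(byte[] src) {
        byte[] out = new byte[src.length];
        for (int i = 0; i < src.length; i++) {
            out[i] = src[src.length - 1 - i];
        }
        return out;
    }

    /** Maps an address in the backward-read layout to the reversed, forward-read layout. */
    static int toForwardAddress(int backwardAddress, int totalLength) {
        return totalLength - 1 - backwardAddress;
    }

    /** Reads count bytes starting at address and moving toward index 0. */
    static byte[] readBackward(byte[] bytes, int address, int count) {
        byte[] out = new byte[count];
        for (int i = 0; i < count; i++) {
            out[i] = bytes[address - i];
        }
        return out;
    }

    /** Reads count bytes starting at address and moving toward the end. */
    static byte[] readForward(byte[] bytes, int address, int count) {
        byte[] out = new byte[count];
        for (int i = 0; i < count; i++) {
            out[i] = bytes[address + i];
        }
        return out;
    }
}
```

With these helpers, `readBackward(bytes, a, n)` and `readForward(reverse(bytes), toForwardAddress(a, bytes.length), n)` return the same sequence. The address translation is a single subtraction per node read. Rewriting the embedded addresses during reversal would avoid that subtraction, but, as noted above, it could change how many bytes each pointer takes.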
Looking at https://github.com/BurntSushi/fst/blob/master/src/raw/node.rs, it seems Tantivy also reads bytes backward. However, this Node class only works with a byte array; I think the array was memory-mapped from a file before that.
Interestingly, we specifically reverse the byte[] after the write to make it backward. To make it forward, we can simply skip the reversal.
Description
Forked from https://lists.apache.org/thread/1fskhmz84pp60o41txsxj2193vt9txod: the fact that FST reads bytes backwards doesn't play well with BufferedIndexInput, which triggers a refill on every byte read. Could we reverse these sequences of bytes at index time so that they would be read in forward order at search time?
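A toy model can illustrate why backward reads are costly for a buffered input. This is not Lucene's BufferedIndexInput: the class `ToyBufferedInput` is hypothetical, and it assumes that a read outside the current buffer refills the buffer starting at the requested position and extending forward.

```java
// Hypothetical toy model; it only mimics a "refill forward from the requested position" policy.
final class RefillSketch {

    static final class ToyBufferedInput {
        private final byte[] file;
        private final int bufferSize;
        private int bufferStart = -1; // -1 means the buffer is empty
        private int bufferLength = 0;
        int refills = 0;

        ToyBufferedInput(byte[] file, int bufferSize) {
            this.file = file;
            this.bufferSize = bufferSize;
        }

        /** Reads one byte, refilling the buffer if pos falls outside it. */
        byte readByteAt(int pos) {
            boolean miss = bufferStart < 0
                || pos < bufferStart
                || pos >= bufferStart + bufferLength;
            if (miss) {
                // The refill always covers [pos, pos + bufferSize), capped at end of file.
                bufferStart = pos;
                bufferLength = Math.min(bufferSize, file.length - pos);
                refills++;
            }
            return file[pos];
        }
    }

    /** Counts refills when every byte is read in increasing position order. */
    static int countRefillsForward(int fileLength, int bufferSize) {
        ToyBufferedInput in = new ToyBufferedInput(new byte[fileLength], bufferSize);
        for (int pos = 0; pos < fileLength; pos++) {
            in.readByteAt(pos);
        }
        return in.refills;
    }

    /** Counts refills when every byte is read in decreasing position order. */
    static int countRefillsBackward(int fileLength, int bufferSize) {
        ToyBufferedInput in = new ToyBufferedInput(new byte[fileLength], bufferSize);
        for (int pos = fileLength - 1; pos >= 0; pos--) {
            in.readByteAt(pos);
        }
        return in.refills;
    }
}
```

Under this model, a 4096-byte input with a 1024-byte buffer needs 4 refills when read forward and 4096 when read backward, because each backward step lands just before the start of the buffer.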