LUCENE-9830: Hunspell: store word length for faster dictionary lookup/enumeration #3
Solr references probably should be removed from the checklist above.
@@ -65,6 +65,7 @@
TrigramAutomaton automaton = new TrigramAutomaton(word);

dictionary.words.processAllWords(
    Math.max(1, word.length() - 4),
We now pass minLength in addition to maxLength
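A minimal sketch of why a length window helps, assuming a store that knows each entry's length before its text is decoded. The class `LengthFilteredWords`, its `Entry` type, and the `materialized` counter are hypothetical stand-ins, not Lucene's actual word storage; the lower bound mirrors the hunk above, while the upper bound of `length + 4` is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical stand-in for a word store; not Lucene's actual implementation. */
class LengthFilteredWords {
  /** An entry that knows its length up front and decodes its text only on demand. */
  static final class Entry {
    final int length;
    final Supplier<String> text;

    Entry(int length, Supplier<String> text) {
      this.length = length;
      this.text = text;
    }
  }

  private final List<Entry> entries = new ArrayList<>();
  int materialized = 0; // how many words were actually decoded

  void add(String word) {
    entries.add(new Entry(word.length(), () -> {
      materialized++;
      return word;
    }));
  }

  /** Passes only words with minLength <= length <= maxLength to the consumer. */
  void processAllWords(int minLength, int maxLength, Consumer<String> consumer) {
    for (Entry e : entries) {
      if (e.length < minLength || e.length > maxLength) {
        continue; // rejected by the stored length alone, nothing is decoded
      }
      consumer.accept(e.text.get());
    }
  }

  public static void main(String[] args) {
    LengthFilteredWords words = new LengthFilteredWords();
    for (String w : List.of("a", "cat", "house", "houses", "extraordinarily")) {
      words.add(w);
    }
    String misspelled = "housses";
    List<String> candidates = new ArrayList<>();
    words.processAllWords(
        Math.max(1, misspelled.length() - 4), // lower bound as in the hunk above
        misspelled.length() + 4,              // upper bound: an assumption
        candidates::add);
    System.out.println(candidates + ", decoded " + words.materialized + " entries");
  }
}
```

Entries outside the window are skipped without their text ever being built, which is the saving the lower bound adds on top of the existing upper bound.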
@@ -63,17 +75,14 @@
 * <li>VINT: a delta pointer to the entry for the same word without the last character.
 * Precisely, it's the difference of this entry's start and the prefix's entry start. 0 for
 * single-character entries
 * <li>Optional, for non-leaf entries only:
 * <li>(Optional, for hash-colliding entries only)
I've moved collision info before the forms data to avoid skipping over vInts on mismatches
Add a utility task to list all existing package names
Add HNSW building to search tests
Description
Word length could be checked before materializing the whole word
Solution
Use the spare bits in the hash table int and collision byte to encode word length, if it's short enough (almost always).
Tests
No new tests, 10-15% speedup in TestPerformance.de_suggest.
Checklist
Please review the following and check all that apply:
master branch.
./gradlew check.
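The idea from the Solution section, keeping a word length in otherwise unused bits of an int, can be sketched as follows. The 5-bit/27-bit split, the class `PackedEntry`, and all method names are assumptions chosen for illustration; the layout actually used by the patch is not shown on this page. The largest length value is treated as "this long or longer", so long words are never wrongly rejected.

```java
/** Hypothetical bit layout; the actual split used in the patch may differ. */
final class PackedEntry {
  static final int LENGTH_BITS = 5;                            // assumed width of the length field
  static final int OFFSET_BITS = Integer.SIZE - LENGTH_BITS;   // remaining 27 bits
  static final int OFFSET_MASK = (1 << OFFSET_BITS) - 1;
  static final int MAX_STORED_LENGTH = (1 << LENGTH_BITS) - 1; // 31 means "31 or longer"

  private PackedEntry() {}

  /** Packs an entry offset and a (possibly saturated) word length into one int. */
  static int pack(int offset, int length) {
    if (offset < 0 || offset > OFFSET_MASK) {
      throw new IllegalArgumentException("offset does not fit: " + offset);
    }
    int stored = Math.min(length, MAX_STORED_LENGTH);
    return (stored << OFFSET_BITS) | offset;
  }

  static int offset(int packed) {
    return packed & OFFSET_MASK;
  }

  static int storedLength(int packed) {
    return packed >>> OFFSET_BITS; // unsigned shift: the top bit may be set
  }

  /**
   * Returns false only when the stored length proves the word cannot have a
   * length in [minLength, maxLength]; saturated lengths are handled conservatively.
   */
  static boolean mayMatch(int packed, int minLength, int maxLength) {
    int stored = storedLength(packed);
    if (stored == MAX_STORED_LENGTH) {
      return maxLength >= MAX_STORED_LENGTH; // real length is unknown but at least 31
    }
    return stored >= minLength && stored <= maxLength;
  }

  public static void main(String[] args) {
    int packed = pack(12345, 7);
    System.out.println(offset(packed) + " " + storedLength(packed));
    System.out.println(mayMatch(packed, 8, 11));
  }
}
```

A lookup or enumeration can call `mayMatch` on the packed value and follow the offset only for entries that pass, which avoids decoding words whose length already rules them out.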