LUCENE-9850: Use PFOR encoding for doc IDs (instead of FOR) #69
I left a few comments but this looks great in general.
lucene/core/src/java/org/apache/lucene/codecs/lucene90/PForUtil.java
lucene/core/src/java/org/apache/lucene/codecs/lucene90/ForUtil.java
…l.java Co-authored-by: Adrien Grand <jpountz@gmail.com>
(Sorry, replying here because GitHub prevents me from replying on the existing thread.) +1. Let's go with whichever of
Perfect, thanks! I'm changing this back to
@jpountz I think I've addressed all of your feedback at this point. No rush if you've got other work occupying your time right now, of course; I just wanted to check in and make sure you're not waiting on me to make some additional changes. Thanks again for all your feedback!
This looks great. I'll merge soon.
Thanks @gsmiller! I'm not really competent to review this stuff, just here for moral support. But the numbers look good to me.
We are still keeping PFOR for positions only. This is a partial revert of apache#69 which brings back ForDeltaUtil.
* Change Postings back to using FOR in Lucene99PostingsFormat
  We are still keeping PFOR for positions only. This is a partial revert of #69 which brings back ForDeltaUtil.
* fix merge commit
* Add forgotten forDeltaUtil calls to reader
* Addressing comments: adding Lucene90RWPostingsFormat + more
  Also:
  * Change to Changes.txt
  * Removal of dead code which was only used in unit tests
  * Removal of test code from PForUtil
* Changes.txt edit in right place now
* Apply suggestions from code review: `90 -> 99 refactoring`
  Co-authored-by: gf2121 <52390227+gf2121@users.noreply.github.com>
* Remove decodeTo32 from ForUtil and regenerate

---------

Co-authored-by: gf2121 <52390227+gf2121@users.noreply.github.com>
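The delta + FOR idea behind ForDeltaUtil can be sketched roughly as follows. This is a minimal illustration, assuming a sorted block of doc IDs and a `base` equal to the last doc ID of the previous block; the class `DeltaForSketch` and its methods are hypothetical and are not Lucene's actual ForDeltaUtil API.

```java
import java.util.Arrays;

/**
 * Toy delta + FOR (frame of reference) encoding of one sorted block of doc IDs.
 * Hypothetical helper for illustration only; not Lucene's ForDeltaUtil.
 */
final class DeltaForSketch {

  /** Bits needed to represent a non-negative value (0 needs 0 bits). */
  static int bitsRequired(long v) {
    return 64 - Long.numberOfLeadingZeros(v);
  }

  /** Store each doc ID as the gap from the previous one; base is the doc ID before this block. */
  static long[] toDeltas(long[] docIds, long base) {
    long[] deltas = new long[docIds.length];
    long prev = base;
    for (int i = 0; i < docIds.length; i++) {
      deltas[i] = docIds[i] - prev;
      prev = docIds[i];
    }
    return deltas;
  }

  /** A running prefix sum over the gaps restores the original doc IDs. */
  static long[] fromDeltas(long[] deltas, long base) {
    long[] docIds = new long[deltas.length];
    long acc = base;
    for (int i = 0; i < deltas.length; i++) {
      acc += deltas[i];
      docIds[i] = acc;
    }
    return docIds;
  }

  /** FOR: every value in the block is packed with the width of the widest value. */
  static int forBitsPerValue(long[] values) {
    long or = 0;
    for (long v : values) {
      or |= v; // the OR has the same highest set bit as the maximum
    }
    return bitsRequired(or);
  }

  public static void main(String[] args) {
    long[] docIds = {3, 7, 8, 15, 16, 20, 1000, 1003};
    long[] deltas = toDeltas(docIds, 0);
    System.out.println(Arrays.toString(deltas));                      // [3, 4, 1, 7, 1, 4, 980, 3]
    System.out.println(forBitsPerValue(deltas) + " bits per value");  // 10 bits per value
    System.out.println(Arrays.equals(docIds, fromDeltas(deltas, 0))); // true
  }
}
```

In this toy block, a single large gap (980) forces FOR to spend 10 bits on every value even though all other gaps fit in 3 bits; that skew is the case PFOR is meant to handle.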
Description
Switch over to PFOR encoding for doc IDs (instead of FOR) to achieve better index compression.
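To illustrate why PFOR can shrink doc ID blocks, here is a rough per-block cost model comparing the two encodings. It assumes a cap of 7 exceptions per block, each exception stored as a 1-byte position plus 1 byte of high bits (so the base width can drop at most 8 bits below the maximum's width), and no per-block header. These parameters and the `PforCostSketch` class are illustrative assumptions, not necessarily what Lucene's PForUtil does.

```java
import java.util.Arrays;

/**
 * Hypothetical cost model comparing FOR and PFOR for one non-empty block of
 * non-negative values (e.g. doc ID deltas). Not Lucene's PForUtil.
 */
final class PforCostSketch {
  // Assumption: at most 7 patched values per block, each costing 2 bytes
  // (1-byte position + 1 byte of high bits).
  static final int MAX_EXCEPTIONS = 7;

  static int bitsRequired(long v) {
    return 64 - Long.numberOfLeadingZeros(v);
  }

  /** FOR: every value uses the bit width of the block maximum. */
  static int forBytes(long[] block) {
    long max = Arrays.stream(block).max().orElse(0);
    return (block.length * bitsRequired(max) + 7) / 8;
  }

  /** PFOR: pick a narrower base width; at most MAX_EXCEPTIONS values overflow it. */
  static int pforBitsPerValue(long[] block) {
    long[] sorted = block.clone();
    Arrays.sort(sorted);
    long max = sorted[sorted.length - 1];
    // The (MAX_EXCEPTIONS + 1)-th largest value must fit without patching.
    long top = sorted.length > MAX_EXCEPTIONS ? sorted[sorted.length - 1 - MAX_EXCEPTIONS] : 0;
    // Exceptions carry at most 8 extra high bits in this model.
    return Math.max(bitsRequired(top), bitsRequired(max) - 8);
  }

  static int pforBytes(long[] block) {
    int bits = pforBitsPerValue(block);
    int exceptions = 0;
    for (long v : block) {
      if ((v >>> bits) != 0) {
        exceptions++; // value does not fit in the base width
      }
    }
    return (block.length * bits + 7) / 8 + 2 * exceptions;
  }

  public static void main(String[] args) {
    long[] block = new long[128];
    for (int i = 0; i < block.length; i++) {
      block[i] = i % 8; // mostly small gaps
    }
    block[100] = 980; // one large gap
    System.out.println("FOR bytes:  " + forBytes(block));  // 160
    System.out.println("PFOR bytes: " + pforBytes(block)); // 50
  }
}
```

The byte counts come from this synthetic block only; they are not the luceneutil results referenced in the Jira issue.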
Solution
Details are in the Jira issue, but I explored the index size vs. decompression speed tradeoffs using luceneutil benchmarks and found ~3.3% index size reduction with no significant OPS impact.
Tests
In addition to benchmarks, I ported over the PForDeltaUtil unit tests to ensure unit test coverage.
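Unit tests for codecs like these mostly check that decoding restores exactly what was encoded. The following is a minimal sketch of such a randomized round-trip check against a toy PFOR codec. `ToyPfor`, its `Encoded` layout, and the 7-exception cap are hypothetical assumptions for illustration; this is not Lucene's PForUtil or its test suite.

```java
import java.util.Arrays;
import java.util.Random;

/** Toy PFOR codec used to illustrate a round-trip test. Hypothetical; not Lucene's PForUtil. */
final class ToyPfor {
  static final int MAX_EXCEPTIONS = 7; // assumed cap on patched values per block

  static final class Encoded {
    final int length, bits;
    final long[] packed;  // low `bits` bits of every value, bit-packed
    final int[] excPos;   // positions of values that did not fit in `bits`
    final long[] excHigh; // remaining high bits of those values

    Encoded(int length, int bits, long[] packed, int[] excPos, long[] excHigh) {
      this.length = length;
      this.bits = bits;
      this.packed = packed;
      this.excPos = excPos;
      this.excHigh = excHigh;
    }
  }

  static int bitsRequired(long v) {
    return 64 - Long.numberOfLeadingZeros(v);
  }

  static void writeBits(long[] buf, int bitPos, int bits, long v) {
    if (bits == 0) return; // nothing stored for 0-bit values
    int idx = bitPos >>> 6, shift = bitPos & 63;
    buf[idx] |= v << shift;
    if (shift + bits > 64) buf[idx + 1] |= v >>> (64 - shift); // value straddles two longs
  }

  static long readBits(long[] buf, int bitPos, int bits) {
    if (bits == 0) return 0;
    int idx = bitPos >>> 6, shift = bitPos & 63;
    long v = buf[idx] >>> shift;
    if (shift + bits > 64) v |= buf[idx + 1] << (64 - shift);
    return v & ((1L << bits) - 1);
  }

  /** Encodes a non-empty block of non-negative values. */
  static Encoded encode(long[] values) {
    int n = values.length;
    long[] sorted = values.clone();
    Arrays.sort(sorted);
    long top = n > MAX_EXCEPTIONS ? sorted[n - 1 - MAX_EXCEPTIONS] : 0;
    int bits = Math.max(bitsRequired(top), bitsRequired(sorted[n - 1]) - 8);
    long mask = (1L << bits) - 1;
    long[] packed = new long[(n * bits + 63) / 64];
    int[] pos = new int[MAX_EXCEPTIONS];
    long[] high = new long[MAX_EXCEPTIONS];
    int numExc = 0;
    for (int i = 0; i < n; i++) {
      writeBits(packed, i * bits, bits, values[i] & mask);
      if ((values[i] >>> bits) != 0) { // at most MAX_EXCEPTIONS values exceed `top`
        pos[numExc] = i;
        high[numExc++] = values[i] >>> bits;
      }
    }
    return new Encoded(n, bits, packed, Arrays.copyOf(pos, numExc), Arrays.copyOf(high, numExc));
  }

  static long[] decode(Encoded e) {
    long[] out = new long[e.length];
    for (int i = 0; i < e.length; i++) out[i] = readBits(e.packed, i * e.bits, e.bits);
    for (int j = 0; j < e.excPos.length; j++) out[e.excPos[j]] |= e.excHigh[j] << e.bits; // patch
    return out;
  }

  public static void main(String[] args) {
    Random rnd = new Random(42);
    for (int iter = 0; iter < 1000; iter++) {
      long[] block = new long[128];
      int bpv = rnd.nextInt(20);
      for (int i = 0; i < block.length; i++) block[i] = rnd.nextInt(1 << bpv);
      // Inject a few large outliers, like big gaps between doc IDs.
      for (int k = rnd.nextInt(10); k > 0; k--) block[rnd.nextInt(128)] = rnd.nextInt(Integer.MAX_VALUE);
      if (!Arrays.equals(block, decode(encode(block)))) throw new AssertionError("mismatch at " + iter);
    }
    System.out.println("round trip OK");
  }
}
```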
Checklist
Please review the following and check all that apply:
* Developed against the `main` branch.
* Ran `./gradlew check`.