Implement ID ranges instead of relying on OFFSET when doing pagination. #2171
OFFSET gets progressively slower on larger datasets; using ID ranges provides consistent, fast results.
I did some tests and it's working. However, the --nobulk option is not working as expected and appears to be ignored.
Other than that, I think --start-object-id and --end-object-id need more documentation because they are still confusing.
Thanks @Rahmon. Can you share what you have in mind for:
@Rahmon did you fix the problem with the --nobulk parameter? Also, can you please fix the conflict with the Command.php file? Thanks
Looks good to me. I agree we should document it better, as it took a few reads for me to understand what it does.
This is a "take over" of #1955 just to address some minor things.
Description of the Change
We noticed that indexing speed gets much slower than expected as the data set grows. We found the issue to be a combination of factors, but this PR is focused on an issue present in both versions of ElasticPress.
Essentially, changing from offsets to ID ranges provides a very noticeable speed boost as the data set grows. We've seen significant gains when object IDs are extremely high.
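To illustrate the difference between the two pagination styles, here is a minimal sketch in Python with SQLite. It is not the ElasticPress code: the `posts` table, the helper names, and the `start_id` / `end_id` parameters are all assumptions made up for this example, and the ascending ID order is also just a choice for the demo. The `start_id` and `end_id` bounds are only loosely modeled on the idea behind --start-object-id and --end-object-id.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory table with sparse, increasing IDs (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO posts (id) VALUES (?)",
        ((i,) for i in range(10, 1000, 7)),
    )
    return conn


def fetch_by_offset(conn, batch_size):
    """Page with LIMIT/OFFSET: the database must skip `offset` rows on every query."""
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT id FROM posts ORDER BY id LIMIT ? OFFSET ?",
            (batch_size, offset),
        ).fetchall()
        if not rows:
            break
        yield [r[0] for r in rows]
        offset += batch_size


def fetch_by_id_range(conn, batch_size, start_id=None, end_id=None):
    """Page by ID range: each query seeks directly past the last ID already seen."""
    last_id = None
    while True:
        clauses, params = [], []
        if last_id is not None:
            # Continue strictly after the last row of the previous batch.
            clauses.append("id > ?")
            params.append(last_id)
        elif start_id is not None:
            # First batch: honor the optional inclusive lower bound.
            clauses.append("id >= ?")
            params.append(start_id)
        if end_id is not None:
            # Optional inclusive upper bound applies to every batch.
            clauses.append("id <= ?")
            params.append(end_id)
        where = ("WHERE " + " AND ".join(clauses)) if clauses else ""
        rows = conn.execute(
            f"SELECT id FROM posts {where} ORDER BY id LIMIT ?",
            (*params, batch_size),
        ).fetchall()
        if not rows:
            break
        ids = [r[0] for r in rows]
        yield ids
        last_id = ids[-1]
```

Both generators yield the same batches for the same table. The difference is that the ID-range version lets the primary key index jump straight to the next batch, so the cost per batch should not depend on how far into the table the iteration is.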
Props @WPprodigy
cc: @rinatkhaziev @nickdaugherty @pschoffer @netsuso @parkcityj
Alternate Designs
We didn't have any. Just needed some speed enhancements in short order.
Benefits
It makes indexing faster at high object IDs.
Possible Drawbacks
Adds additional CLI params.
Verification Process
We ran the indexing commands on large multisites in excess of 10 million posts and noticed drastic increases in indexing speed compared to previous indexing operations.
Queries run without the PR (fetch 500 rows):
Queries run with the PR take around 0.28 seconds consistently, regardless of the ID (fetch 500 rows).
Applicable Issues
Closes #1955
Changelog Entry
Added performance boost to querying posts from database. Props @WPprodigy