-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement ID ranges instead of relying on OFFSET when doing pagination. #1955
Implement ID ranges instead of relying on OFFSET when doing pagination. #1955
Conversation
OFFSET gets progressively slower on larger datasets, using ID ranges provides consistent and fast results.
$range = [ | ||
'start' => "{$GLOBALS['wpdb']->posts}.ID <= {$start_range_post_id}", | ||
'end' => "{$GLOBALS['wpdb']->posts}.ID >= {$requested_end_id}", | ||
]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @brandon-m-skinner ! Any reason why the upper limit is the start and the lower number is the end?
I wonder if it wouldn't be easier for users to run
wp elasticpress index --start-object-id=1100 --end-object-id=1200
rather than
wp elasticpress index --start-object-id=1200 --end-object-id=1100
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the indexing works from latest to oldest, this matches that behavior since it's starting with the highest ID and then goes to the lowest. I do agree that it's a bit confusing for users that may not be aware of this.
If it's a concern, I think perhaps could support both ways? Doesn't matter which way it's input, behind the scenes it could be organized as needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felipeelia @WPprodigy let's keep the ordering (I think it makes sense from an indexing perspective to start with the most recent content) but document this heavily because it's somewhat arbitrary, not intuitive.
@@ -84,25 +86,104 @@ public function query_db( $args ) { | |||
* @param {array} $args Database arguments | |||
* @return {array} New arguments | |||
*/ | |||
$args = apply_filters( 'ep_post_query_db_args', wp_parse_args( $args, $defaults ) ); | |||
$args = apply_filters( 'ep_index_posts_args', apply_filters( 'ep_post_query_db_args', wp_parse_args( $args, $defaults ) ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we split this into two lines, so we can add the necessary documentation for both filters @brandon-m-skinner ?
Description of the Change
We noticed that indexing speed gets much slower than expected as the data set grows. We found the issue to be a combination of factors, but this PR is focused on an issue present in both versions of ElasticPress.
Essentially the change from using offsets to using ID ranges provides a very noticeable speed boost as the data set grows. We've seen significant gains when object IDs are extremely high.
Props @WPprodigy
cc: @rinatkhaziev @nickdaugherty @pschoffer @netsuso @parkcityj
Alternate Designs
We didn't have any. Just needed some speed enhancements in short order.
Benefits
It makes indexing faster at high object IDs.
Possible Drawbacks
Adds additional CLI params.
Verification Process
We ran the indexing commands on large multisites in excess of 10 million posts and noticed drastic increases in indexing speed compared to previous indexing operations.
Queries run without the PR(fetch 500 rows):
Queries run with the PR are around 0.28 secs consistently regardless of the ID(fetch 500 rows)
Checklist:
Applicable Issues
Changelog Entry
Added performance boost to querying posts from database. Props @WPprodigy