Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ID ranges instead of relying on OFFSET when doing pagination. #1955

Closed
wants to merge 1 commit into from
Closed

Implement ID ranges instead of relying on OFFSET when doing pagination. #1955

wants to merge 1 commit into from

Conversation

brandon-m-skinner
Copy link
Contributor

Description of the Change

We noticed that indexing speed gets much slower than expected as the data set grows. We found the issue to be a combination of factors, but this PR is focused on an issue present in both versions of ElasticPress.

Essentially the change from using offsets to using ID ranges provides a very noticeable speed boost as the data set grows. We've seen significant gains when object IDs are extremely high.

Props @WPprodigy
cc: @rinatkhaziev @nickdaugherty @pschoffer @netsuso @parkcityj

Alternate Designs

We didn't have any. Just needed some speed enhancements in short order.

Benefits

It makes indexing faster at high object IDs.

Possible Drawbacks

Adds additional CLI params.

Verification Process

We ran the indexing commands on large multisites in excess of 10 million posts and noticed drastic increases in indexing speed compared to previous indexing operations.

Queries run without the PR(fetch 500 rows):

Offset 2500: 0.35 secs
Offset 25000: 1.54 secs
Offset 250000: 23.42 secs
Offset 2500000: 35.66 sec

Queries run with the PR are around 0.28 secs consistently regardless of the ID(fetch 500 rows)

Checklist:

  • I have read the CONTRIBUTING document.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my change.
  • All new and existing tests passed.

Applicable Issues

Changelog Entry

Added performance boost to querying posts from database. Props @WPprodigy

OFFSET gets progressively slower on larger datasets, using ID ranges provides consistent and fast results.
@brandwaffle brandwaffle added this to the 3.6.0 milestone Feb 5, 2021
@mckdemps mckdemps requested review from felipeelia and Rahmon February 5, 2021 18:39
@mckdemps mckdemps added the qa label Feb 5, 2021
Comment on lines +147 to +150
$range = [
'start' => "{$GLOBALS['wpdb']->posts}.ID <= {$start_range_post_id}",
'end' => "{$GLOBALS['wpdb']->posts}.ID >= {$requested_end_id}",
];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @brandon-m-skinner ! Any reason why the upper limit is the start and the lower number is the end?

I wonder if it wouldn't be easier for users to run
wp elasticpress index --start-object-id=1100 --end-object-id=1200
rather than
wp elasticpress index --start-object-id=1200 --end-object-id=1100

Copy link

@WPprodigy WPprodigy Mar 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the indexing works from latest to oldest, this matches that behavior since it's starting with the highest ID and then goes to the lowest. I do agree that it's a bit confusing for users that may not be aware of this.

If it's a concern, I think perhaps could support both ways? Doesn't matter which way it's input, behind the scenes it could be organized as needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felipeelia @WPprodigy let's keep the ordering (I think it makes sense from an indexing perspective to start with the most recent content) but document this heavily because it's somewhat arbitrary, not intuitive.

@@ -84,25 +86,104 @@ public function query_db( $args ) {
* @param {array} $args Database arguments
* @return {array} New arguments
*/
$args = apply_filters( 'ep_post_query_db_args', wp_parse_args( $args, $defaults ) );
$args = apply_filters( 'ep_index_posts_args', apply_filters( 'ep_post_query_db_args', wp_parse_args( $args, $defaults ) ) );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we split this into two lines, so we can add the necessary documentation for both filters @brandon-m-skinner ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants