Implement ID ranges instead of relying on OFFSET when doing pagination. #2171

Merged 12 commits on Jun 24, 2021

felipeelia
Member

This is a "take over" of #1955 just to address some minor things.

Description of the Change

We noticed that indexing speed gets much slower than expected as the data set grows. We found the issue to be a combination of factors, but this PR is focused on an issue present in both versions of ElasticPress.

Essentially the change from using offsets to using ID ranges provides a very noticeable speed boost as the data set grows. We've seen significant gains when object IDs are extremely high.
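
To illustrate the difference, here is a minimal sketch in Python with SQLite. It is not the ElasticPress code (which is PHP running against MySQL); the `posts` table and the function names `make_db`, `pages_by_offset`, and `pages_by_id_range` are made up for this example. Both generators return the same pages, but the second one filters on the last ID seen instead of asking the database to skip rows.

```python
import sqlite3


def make_db(ids):
    """Build an in-memory `posts` table (hypothetical schema) with the given IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO posts (id, title) VALUES (?, ?)",
        [(i, f"post {i}") for i in ids],
    )
    return conn


def pages_by_offset(conn, per_page):
    """OFFSET pagination: every query has to walk past `offset` rows first."""
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT id FROM posts ORDER BY id LIMIT ? OFFSET ?",
            (per_page, offset),
        ).fetchall()
        if not rows:
            return
        yield [row[0] for row in rows]
        offset += per_page


def pages_by_id_range(conn, per_page):
    """ID-range pagination: every query seeks straight to the last ID seen."""
    last_id = 0  # assumes IDs are positive integers
    while True:
        rows = conn.execute(
            "SELECT id FROM posts WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, per_page),
        ).fetchall()
        if not rows:
            return
        page = [row[0] for row in rows]
        yield page
        last_id = page[-1]  # the next page starts right after this ID
```

Calling `list(pages_by_id_range(make_db(range(1, 2001, 3)), 100))` yields the same pages as `pages_by_offset`, gaps in the ID sequence included, because the range filter only depends on the primary key order and not on how many rows came before.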

Props @WPprodigy
cc: @rinatkhaziev @nickdaugherty @pschoffer @netsuso @parkcityj

Alternate Designs

We didn't have any. Just needed some speed enhancements in short order.

Benefits

It makes indexing faster at high object IDs.

Possible Drawbacks

Adds additional CLI params.

Verification Process

We ran the indexing commands on large multisites in excess of 10 million posts and noticed drastic increases in indexing speed compared to previous indexing operations.

Queries run without the PR (fetching 500 rows):

Offset 2500: 0.35 secs
Offset 25000: 1.54 secs
Offset 250000: 23.42 secs
Offset 2500000: 35.66 secs

Queries run with the PR take around 0.28 secs consistently, regardless of the ID (fetching 500 rows).
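
A rough way to reason about these numbers, as a sketch only: assume each OFFSET query has to read and discard every row before the offset, while each ID-range query seeks by primary key and reads one page. This simplified model ignores caching, joins, and the filters in the real query, so it is not meant to reproduce the timings above; the function names are made up for the example.

```python
def rows_examined_with_offset(total_rows, per_page):
    """Rows read over a full pass if each query reads offset + per_page rows."""
    examined = 0
    for offset in range(0, total_rows, per_page):
        # The query walks past `offset` rows, then reads up to one page.
        examined += min(offset + per_page, total_rows)
    return examined


def rows_examined_with_id_ranges(total_rows, per_page):
    """Rows read over a full pass if each query seeks by ID and reads one page."""
    return total_rows
```

For 10,000 rows at 500 per page, the model gives 105,000 rows examined with offsets versus 10,000 with ID ranges, and the gap grows roughly quadratically with the size of the data set.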

Checklist:

  • I have read the CONTRIBUTING document.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my change.
  • All new and existing tests passed.

Applicable Issues

Closes #1955

Changelog Entry

Added performance boost to querying posts from database. Props @WPprodigy

Brandon Skinner and others added 3 commits April 15, 2021 15:16
OFFSET gets progressively slower on larger datasets, using ID ranges provides consistent and fast results.
@felipeelia felipeelia added the qa label Apr 15, 2021
@felipeelia felipeelia added this to the 3.6.0 milestone Apr 15, 2021
@felipeelia felipeelia self-assigned this Apr 15, 2021
Contributor

@Rahmon Rahmon left a comment

I did some tests and it's working. However, the option --nobulk is not working as expected and looks like it is ignored.

Other than that, I think --start-object-id and --end-object-id need more documentation because it is still confusing.

@felipeelia
Member Author

Thanks @Rahmon. Can you share what you have in mind for:

Other than that, I think --start-object-id and --end-object-id need more documentation because it is still confusing.

@Rahmon Rahmon assigned felipeelia and unassigned Rahmon Apr 27, 2021
@felipeelia
Member Author

@Rahmon did you fix the problem with the --nobulk parameter? Also, can you please fix the conflict with the Command.php file? Thanks

@felipeelia felipeelia assigned Rahmon and unassigned felipeelia May 6, 2021
@Rahmon Rahmon assigned felipeelia and unassigned Rahmon May 21, 2021
@oscarssanchez
Contributor

Looks good to me. I agree we should document it better, as it took a few reads for me to understand what it does.

@felipeelia felipeelia self-assigned this Jun 22, 2021
@Rahmon Rahmon merged commit ebe68ed into develop Jun 24, 2021
@Rahmon Rahmon deleted the feature/pr-1955 branch June 24, 2021 19:03