Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving the performance of date histogram aggregation (without any sub-aggregation) #11083

Merged
merged 25 commits into from
Nov 28, 2023

Conversation

jainankitk
Copy link
Collaborator

@jainankitk jainankitk commented Nov 3, 2023

Description

This change preemptively creates point range query filter on each date histogram bucket for quick collection instead of iterating over all the documents. Currently, the optimization is limited to search requests getting rewritten into matchall query or filtering the documents using point range query on same field as date histogram aggregation. This PR will be followed up with below changes:

  • Expand the optimization to match all query by using min/maxPackedValue at segment/leaf level for creating bucket
  • Apply the same optimization to AutoDateHistogram
  • Apply the same optimization to Composite aggregation with source as DateHistogram
  • Tune the number of buckets for which query filter optimization can be applied
  • Apply the optimization to DocValuesFieldExistsQuery over the field (now deprecated in favor of FieldExistsQuery)
  • Apply the optimization if the individual segment is match all query even though the whole shard is not

Related Issues

Resolves #9310

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • [ ] Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Ankit Jain <akjain@amazon.com>
Signed-off-by: Ankit Jain <akjain@amazon.com>
Signed-off-by: Ankit Jain <akjain@amazon.com>
Signed-off-by: Ankit Jain <akjain@amazon.com>
Copy link
Contributor

❌ Gradle check result for b535934: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

✅ Gradle check result for be20dde: SUCCESS

Copy link

codecov bot commented Nov 27, 2023

Codecov Report

Attention: 26 lines in your changes are missing coverage. Please review.

Comparison is base (8c6eb7d) 71.33% compared to head (e225781) 71.47%.
Report is 2 commits behind head on main.

Files Patch % Lines
...egations/bucket/histogram/FilterRewriteHelper.java 79.38% 10 Missing and 10 partials ⚠️
...ions/bucket/histogram/DateHistogramAggregator.java 82.60% 3 Missing and 1 partial ⚠️
.../bucket/histogram/AutoDateHistogramAggregator.java 94.44% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #11083      +/-   ##
============================================
+ Coverage     71.33%   71.47%   +0.14%     
- Complexity    58982    59136     +154     
============================================
  Files          4890     4891       +1     
  Lines        277468   277629     +161     
  Branches      40313    40347      +34     
============================================
+ Hits         197919   198442     +523     
+ Misses        63127    62723     -404     
- Partials      16422    16464      +42     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@msfroh
Copy link
Collaborator

msfroh commented Nov 27, 2023

The test org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=search.aggregation/10_histogram/date_histogram profiler} is failing due to the optimization not being present in the prior versions of OpenSearch. I am wondering if the collect_count validation should be removed from the test or the test itself should be skipped for prior versions. I am in favor of removing the collect_count validation as it does not really matter as long as the result is correct. @msfroh - Thoughts?

I agree -- while I appreciate the value of testing the profiler, I don't think it was ever reasonable to assume backward compatibility in profiler output meaning we must provide the exact same profile shape across different versions.

Signed-off-by: Ankit Jain <akjain@amazon.com>
Copy link
Contributor

✅ Gradle check result for e225781: SUCCESS

@msfroh msfroh merged commit 0ddbd96 into opensearch-project:main Nov 28, 2023
29 checks passed
@@ -98,7 +98,7 @@ long roundFloor(long utcMillis) {
}

@Override
long extraLocalOffsetLookup() {
public long extraLocalOffsetLookup() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jainankitk What is the reason to change the visibility level? this is internal API and should stay as such

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The offset value is needed for creating the correct range buckets in FilterRewriteHelper::createFilterForAggregations. Some of the calendar intervals like month/quarter/year can have varying number of days, which is stored in extraLocalOffsetLookup. Do you see any harm in making just getter public? If yes, do you have a workaround in mind?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jainankitk please take a loot at #11392

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good one, thanks!

@jainankitk jainankitk added the backport 2.x Backport to 2.x branch label Nov 29, 2023
@jainankitk
Copy link
Collaborator Author

This is targeted for v2.12, hence adding backport 2.x label

@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-11083-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 0ddbd96291d4ff05499ec53c5a04a5dda32d36ad
# Push it to GitHub
git push --set-upstream origin backport/backport-11083-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-11083-to-2.x.

jainankitk added a commit to jainankitk/OpenSearch that referenced this pull request Nov 29, 2023
…sub-aggregation) (opensearch-project#11083)

* Adding filter based optimization logic to date histogram aggregation

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Reading the field name for aggregation correctly

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding the limit on number of buckets for filter aggregation

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Applying the optimizations for match all query as well

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Handling the unwrapped match all query

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding logic for recursively unwrapping the query

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Restructuring the code for making it more reusable and unit testable

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding javadocs for fixing build failure

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing minor bugs in refactoring

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding logic for optimizing auto date histogram

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing bugs and passing unit tests for date histogram

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Temporarily reverting auto date histogram changes

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing spotless check bugs

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding back auto date histogram and passing all unit tests

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing the integration tests for reduced collector work

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing the integration test regression

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Addressing code review comments

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing hardbound, missing and script test cases

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Removing collect_count validation to prevent backward compatibility tests from failing

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Finally fixing hardbounds test case

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Refactoring code for reusability

Signed-off-by: Ankit Jain <akjain@amazon.com>

---------

Signed-off-by: Ankit Jain <akjain@amazon.com>
(cherry picked from commit 0ddbd96)
@jainankitk jainankitk deleted the date-histo branch November 29, 2023 14:51
msfroh pushed a commit that referenced this pull request Dec 1, 2023
…n (without any … (#11390)

* Improving the performance of date histogram aggregation (without any sub-aggregation) (#11083)

* Adding filter based optimization logic to date histogram aggregation

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Reading the field name for aggregation correctly

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding the limit on number of buckets for filter aggregation

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Applying the optimizations for match all query as well

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Handling the unwrapped match all query

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding logic for recursively unwrapping the query

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Restructuring the code for making it more reusable and unit testable

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding javadocs for fixing build failure

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing minor bugs in refactoring

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding logic for optimizing auto date histogram

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing bugs and passing unit tests for date histogram

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Temporarily reverting auto date histogram changes

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing spotless check bugs

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding back auto date histogram and passing all unit tests

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing the integration tests for reduced collector work

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing the integration test regression

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Addressing code review comments

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing hardbound, missing and script test cases

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Removing collect_count validation to prevent backward compatibility tests from failing

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Finally fixing hardbounds test case

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Refactoring code for reusability

Signed-off-by: Ankit Jain <akjain@amazon.com>

---------

Signed-off-by: Ankit Jain <akjain@amazon.com>
(cherry picked from commit 0ddbd96)

* Revert Rounding API visibility changes

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Reverting missed rounding API visibility change

Co-authored-by: Andriy Redko <drreta@gmail.com>
Signed-off-by: Ankit Jain <akjain@amazon.com>

---------

Signed-off-by: Ankit Jain <akjain@amazon.com>
Co-authored-by: Andriy Redko <drreta@gmail.com>
fahadshamiinsta pushed a commit to fahadshamiinsta/OpenSearch270 that referenced this pull request Dec 4, 2023
…sub-aggregation) (opensearch-project#11083)

* Adding filter based optimization logic to date histogram aggregation

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Reading the field name for aggregation correctly

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding the limit on number of buckets for filter aggregation

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Applying the optimizations for match all query as well

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Handling the unwrapped match all query

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding logic for recursively unwrapping the query

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Restructuring the code for making it more reusable and unit testable

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding javadocs for fixing build failure

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing minor bugs in refactoring

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding logic for optimizing auto date histogram

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing bugs and passing unit tests for date histogram

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Temporarily reverting auto date histogram changes

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing spotless check bugs

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding back auto date histogram and passing all unit tests

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing the integration tests for reduced collector work

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing the integration test regression

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Addressing code review comments

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing hardbound, missing and script test cases

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Removing collect_count validation to prevent backward compatibility tests from failing

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Finally fixing hardbounds test case

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Refactoring code for reusability

Signed-off-by: Ankit Jain <akjain@amazon.com>

---------

Signed-off-by: Ankit Jain <akjain@amazon.com>
deshsidd pushed a commit to deshsidd/OpenSearch that referenced this pull request Dec 11, 2023
…sub-aggregation) (opensearch-project#11083)

* Adding filter based optimization logic to date histogram aggregation

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Reading the field name for aggregation correctly

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding the limit on number of buckets for filter aggregation

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Applying the optimizations for match all query as well

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Handling the unwrapped match all query

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding logic for recursively unwrapping the query

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Restructuring the code for making it more reusable and unit testable

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding javadocs for fixing build failure

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing minor bugs in refactoring

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding logic for optimizing auto date histogram

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing bugs and passing unit tests for date histogram

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Temporarily reverting auto date histogram changes

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing spotless check bugs

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding back auto date histogram and passing all unit tests

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing the integration tests for reduced collector work

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing the integration test regression

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Addressing code review comments

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing hardbound, missing and script test cases

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Removing collect_count validation to prevent backward compatibility tests from failing

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Finally fixing hardbounds test case

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Refactoring code for reusability

Signed-off-by: Ankit Jain <akjain@amazon.com>

---------

Signed-off-by: Ankit Jain <akjain@amazon.com>
rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Mar 18, 2024
…sub-aggregation) (opensearch-project#11083)

* Adding filter based optimization logic to date histogram aggregation

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Reading the field name for aggregation correctly

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding the limit on number of buckets for filter aggregation

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Applying the optimizations for match all query as well

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Handling the unwrapped match all query

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding logic for recursively unwrapping the query

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Restructuring the code for making it more reusable and unit testable

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding javadocs for fixing build failure

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing minor bugs in refactoring

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding logic for optimizing auto date histogram

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing bugs and passing unit tests for date histogram

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Temporarily reverting auto date histogram changes

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing spotless check bugs

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding back auto date histogram and passing all unit tests

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing the integration tests for reduced collector work

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing the integration test regression

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Addressing code review comments

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing hardbound, missing and script test cases

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Removing collect_count validation to prevent backward compatibility tests from failing

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Finally fixing hardbounds test case

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Refactoring code for reusability

Signed-off-by: Ankit Jain <akjain@amazon.com>

---------

Signed-off-by: Ankit Jain <akjain@amazon.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…sub-aggregation) (opensearch-project#11083)

* Adding filter based optimization logic to date histogram aggregation

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Reading the field name for aggregation correctly

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding the limit on number of buckets for filter aggregation

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Applying the optimizations for match all query as well

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Handling the unwrapped match all query

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding logic for recursively unwrapping the query

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Restructuring the code for making it more reusable and unit testable

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding javadocs for fixing build failure

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing minor bugs in refactoring

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding logic for optimizing auto date histogram

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing bugs and passing unit tests for date histogram

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Temporarily reverting auto date histogram changes

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing spotless check bugs

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Adding back auto date histogram and passing all unit tests

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing the integration tests for reduced collector work

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing the integration test regression

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Addressing code review comments

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Fixing hardbound, missing and script test cases

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Removing collect_count validation to prevent backward compatibility tests from failing

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Finally fixing hardbounds test case

Signed-off-by: Ankit Jain <akjain@amazon.com>

* Refactoring code for reusability

Signed-off-by: Ankit Jain <akjain@amazon.com>

---------

Signed-off-by: Ankit Jain <akjain@amazon.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch backport-failed enhancement Enhancement or improvement to existing feature or request Performance This is for any performance related enhancements or bugs Search:Aggregations v2.12.0 Issues and PRs related to version 2.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Understand/Improve the performance of date histograms
3 participants