Improving the performance of date histogram aggregation (without any sub-aggregation) #11083
Signed-off-by: Ankit Jain <akjain@amazon.com>
❌ Gradle check result for b535934: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Codecov Report
Attention:
Additional details and impacted files
@@ Coverage Diff @@
## main #11083 +/- ##
============================================
+ Coverage 71.33% 71.47% +0.14%
- Complexity 58982 59136 +154
============================================
Files 4890 4891 +1
Lines 277468 277629 +161
Branches 40313 40347 +34
============================================
+ Hits 197919 198442 +523
+ Misses 63127 62723 -404
- Partials 16422 16464 +42
☔ View full report in Codecov by Sentry.
I agree. While I appreciate the value of testing the profiler, I don't think it was ever reasonable to assume backward compatibility in profiler output, in the sense that we must provide the exact same profile shape across different versions.
@@ -98,7 +98,7 @@ long roundFloor(long utcMillis) {
 }

 @Override
-long extraLocalOffsetLookup() {
+public long extraLocalOffsetLookup() {
@jainankitk What is the reason for changing the visibility level? This is an internal API and should stay as such.
The offset value is needed for creating the correct range buckets in FilterRewriteHelper::createFilterForAggregations. Some calendar intervals, like month/quarter/year, can have a varying number of days, which is stored in extraLocalOffsetLookup. Do you see any harm in making just the getter public? If yes, do you have a workaround in mind?
@jainankitk please take a look at #11392
That's a good one, thanks!
This is targeted for v2.12, hence adding |
The backport to
To backport manually, run these commands in your terminal:
# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-11083-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 0ddbd96291d4ff05499ec53c5a04a5dda32d36ad
# Push it to GitHub
git push --set-upstream origin backport/backport-11083-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x
Then, create a pull request where the
…sub-aggregation) (opensearch-project#11083) * Adding filter based optimization logic to date histogram aggregation Signed-off-by: Ankit Jain <akjain@amazon.com> * Reading the field name for aggregation correctly Signed-off-by: Ankit Jain <akjain@amazon.com> * Adding the limit on number of buckets for filter aggregation Signed-off-by: Ankit Jain <akjain@amazon.com> * Applying the optimizations for match all query as well Signed-off-by: Ankit Jain <akjain@amazon.com> * Handling the unwrapped match all query Signed-off-by: Ankit Jain <akjain@amazon.com> * Adding logic for recursively unwrapping the query Signed-off-by: Ankit Jain <akjain@amazon.com> * Restructuring the code for making it more reusable and unit testable Signed-off-by: Ankit Jain <akjain@amazon.com> * Adding javadocs for fixing build failure Signed-off-by: Ankit Jain <akjain@amazon.com> * Fixing minor bugs in refactoring Signed-off-by: Ankit Jain <akjain@amazon.com> * Adding logic for optimizing auto date histogram Signed-off-by: Ankit Jain <akjain@amazon.com> * Fixing bugs and passing unit tests for date histogram Signed-off-by: Ankit Jain <akjain@amazon.com> * Temporarily reverting auto date histogram changes Signed-off-by: Ankit Jain <akjain@amazon.com> * Fixing spotless check bugs Signed-off-by: Ankit Jain <akjain@amazon.com> * Adding back auto date histogram and passing all unit tests Signed-off-by: Ankit Jain <akjain@amazon.com> * Fixing the integration tests for reduced collector work Signed-off-by: Ankit Jain <akjain@amazon.com> * Fixing the integration test regression Signed-off-by: Ankit Jain <akjain@amazon.com> * Addressing code review comments Signed-off-by: Ankit Jain <akjain@amazon.com> * Fixing hardbound, missing and script test cases Signed-off-by: Ankit Jain <akjain@amazon.com> * Removing collect_count validation to prevent backward compatibility tests from failing Signed-off-by: Ankit Jain <akjain@amazon.com> * Finally fixing hardbounds test case Signed-off-by: Ankit Jain 
<akjain@amazon.com> * Refactoring code for reusability Signed-off-by: Ankit Jain <akjain@amazon.com> --------- Signed-off-by: Ankit Jain <akjain@amazon.com> (cherry picked from commit 0ddbd96)
…n (without any … (#11390) * Improving the performance of date histogram aggregation (without any sub-aggregation) (#11083) * Adding filter based optimization logic to date histogram aggregation Signed-off-by: Ankit Jain <akjain@amazon.com> * Reading the field name for aggregation correctly Signed-off-by: Ankit Jain <akjain@amazon.com> * Adding the limit on number of buckets for filter aggregation Signed-off-by: Ankit Jain <akjain@amazon.com> * Applying the optimizations for match all query as well Signed-off-by: Ankit Jain <akjain@amazon.com> * Handling the unwrapped match all query Signed-off-by: Ankit Jain <akjain@amazon.com> * Adding logic for recursively unwrapping the query Signed-off-by: Ankit Jain <akjain@amazon.com> * Restructuring the code for making it more reusable and unit testable Signed-off-by: Ankit Jain <akjain@amazon.com> * Adding javadocs for fixing build failure Signed-off-by: Ankit Jain <akjain@amazon.com> * Fixing minor bugs in refactoring Signed-off-by: Ankit Jain <akjain@amazon.com> * Adding logic for optimizing auto date histogram Signed-off-by: Ankit Jain <akjain@amazon.com> * Fixing bugs and passing unit tests for date histogram Signed-off-by: Ankit Jain <akjain@amazon.com> * Temporarily reverting auto date histogram changes Signed-off-by: Ankit Jain <akjain@amazon.com> * Fixing spotless check bugs Signed-off-by: Ankit Jain <akjain@amazon.com> * Adding back auto date histogram and passing all unit tests Signed-off-by: Ankit Jain <akjain@amazon.com> * Fixing the integration tests for reduced collector work Signed-off-by: Ankit Jain <akjain@amazon.com> * Fixing the integration test regression Signed-off-by: Ankit Jain <akjain@amazon.com> * Addressing code review comments Signed-off-by: Ankit Jain <akjain@amazon.com> * Fixing hardbound, missing and script test cases Signed-off-by: Ankit Jain <akjain@amazon.com> * Removing collect_count validation to prevent backward compatibility tests from failing Signed-off-by: Ankit Jain 
<akjain@amazon.com> * Finally fixing hardbounds test case Signed-off-by: Ankit Jain <akjain@amazon.com> * Refactoring code for reusability Signed-off-by: Ankit Jain <akjain@amazon.com> --------- Signed-off-by: Ankit Jain <akjain@amazon.com> (cherry picked from commit 0ddbd96) * Revert Rounding API visibility changes Signed-off-by: Ankit Jain <akjain@amazon.com> * Reverting missed rounding API visibility change Co-authored-by: Andriy Redko <drreta@gmail.com> Signed-off-by: Ankit Jain <akjain@amazon.com> --------- Signed-off-by: Ankit Jain <akjain@amazon.com> Co-authored-by: Andriy Redko <drreta@gmail.com>
…sub-aggregation) (opensearch-project#11083) * Adding filter based optimization logic to date histogram aggregation Signed-off-by: Ankit Jain <akjain@amazon.com> * Reading the field name for aggregation correctly Signed-off-by: Ankit Jain <akjain@amazon.com> * Adding the limit on number of buckets for filter aggregation Signed-off-by: Ankit Jain <akjain@amazon.com> * Applying the optimizations for match all query as well Signed-off-by: Ankit Jain <akjain@amazon.com> * Handling the unwrapped match all query Signed-off-by: Ankit Jain <akjain@amazon.com> * Adding logic for recursively unwrapping the query Signed-off-by: Ankit Jain <akjain@amazon.com> * Restructuring the code for making it more reusable and unit testable Signed-off-by: Ankit Jain <akjain@amazon.com> * Adding javadocs for fixing build failure Signed-off-by: Ankit Jain <akjain@amazon.com> * Fixing minor bugs in refactoring Signed-off-by: Ankit Jain <akjain@amazon.com> * Adding logic for optimizing auto date histogram Signed-off-by: Ankit Jain <akjain@amazon.com> * Fixing bugs and passing unit tests for date histogram Signed-off-by: Ankit Jain <akjain@amazon.com> * Temporarily reverting auto date histogram changes Signed-off-by: Ankit Jain <akjain@amazon.com> * Fixing spotless check bugs Signed-off-by: Ankit Jain <akjain@amazon.com> * Adding back auto date histogram and passing all unit tests Signed-off-by: Ankit Jain <akjain@amazon.com> * Fixing the integration tests for reduced collector work Signed-off-by: Ankit Jain <akjain@amazon.com> * Fixing the integration test regression Signed-off-by: Ankit Jain <akjain@amazon.com> * Addressing code review comments Signed-off-by: Ankit Jain <akjain@amazon.com> * Fixing hardbound, missing and script test cases Signed-off-by: Ankit Jain <akjain@amazon.com> * Removing collect_count validation to prevent backward compatibility tests from failing Signed-off-by: Ankit Jain <akjain@amazon.com> * Finally fixing hardbounds test case Signed-off-by: Ankit Jain 
<akjain@amazon.com> * Refactoring code for reusability Signed-off-by: Ankit Jain <akjain@amazon.com> --------- Signed-off-by: Ankit Jain <akjain@amazon.com> Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Description
This change preemptively creates a point range query filter for each date histogram bucket, so that bucket counts can be collected quickly instead of iterating over all the documents. Currently, the optimization is limited to search requests that are rewritten into a match all query, or that filter the documents using a point range query on the same field as the date histogram aggregation. This PR will be followed up with the changes below:
- Expand the optimization to match all query by using min/maxPackedValue at segment/leaf level for creating bucket
- Apply the same optimization to AutoDateHistogram
Related Issues
Resolves #9310
Check List
[ ] Public documentation issue/PR created
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
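As a rough illustration of the idea in the Description (one range filter per bucket, counted directly instead of visiting every document), here is a minimal, self-contained sketch. It is not the code from this PR: the class `BucketRangeSketch` and its methods are hypothetical names, a sorted `long[]` stands in for the indexed point values, and only fixed-width intervals are handled (calendar intervals such as month/quarter/year and time zone offsets are ignored).

```java
import java.util.Arrays;

/** Hypothetical sketch; not the OpenSearch implementation. */
public class BucketRangeSketch {

    /** Pre-computes one half-open range [lower, upper) per fixed-width bucket covering min..max. */
    public static long[][] createRanges(long min, long max, long interval) {
        long first = Math.floorDiv(min, interval) * interval; // start of the first bucket
        long last = Math.floorDiv(max, interval) * interval;  // start of the last bucket
        int n = (int) ((last - first) / interval) + 1;
        long[][] ranges = new long[n][2];
        for (int i = 0; i < n; i++) {
            ranges[i][0] = first + i * interval;   // inclusive lower bound
            ranges[i][1] = ranges[i][0] + interval; // exclusive upper bound
        }
        return ranges;
    }

    /** Index of the first element >= key in a sorted array. */
    static int lowerBound(long[] sorted, long key) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /**
     * Counts values per range with two binary searches each,
     * instead of iterating over every value (assumes sortedValues is sorted).
     */
    public static long[] countPerBucket(long[] sortedValues, long[][] ranges) {
        long[] counts = new long[ranges.length];
        for (int i = 0; i < ranges.length; i++) {
            counts[i] = lowerBound(sortedValues, ranges[i][1]) - lowerBound(sortedValues, ranges[i][0]);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] timestamps = {5, 12, 18, 31, 39}; // already sorted
        long[][] ranges = createRanges(timestamps[0], timestamps[timestamps.length - 1], 10);
        System.out.println(Arrays.toString(countPerBucket(timestamps, ranges))); // [1, 2, 0, 2]
    }
}
```

The cost in this sketch grows with the number of buckets rather than the number of documents, which is presumably why the commit list above also mentions a limit on the number of buckets for the filter-based path.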