Fix auto date histogram rounding assertion bug #17023
base: main
❕ Gradle check result for 0ecdf31: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17023      +/-   ##
============================================
- Coverage     72.20%   72.16%   -0.05%
+ Complexity    65289    65211      -78
============================================
  Files          5299     5299
  Lines        303536   303538       +2
  Branches      43941    43941
============================================
- Hits         219180   219055     -125
- Misses        66441    66518      +77
- Partials      17915    17965      +50
@bowenlan-amzn can you take a look when you have a chance? Thank you! |
I remember that the optimization is not applied if the aggregation defines a timezone, so this bug is kind of a surprise. Lines 67 to 68 in ef87b39
The timezone check is inside getInterval OpenSearch/server/src/main/java/org/opensearch/common/Rounding.java Lines 1385 to 1388 in 1e49aa8
The timezone is part of the Rounding object, so we need to have the Rounding first and then check the timezone. However, in auto date histogram, getting the rounding also updates the prepared rounding, which leads to this bug. I recommend we add a simple timezone check right before Lines 67 to 68 in ef87b39
It purely takes in a Rounding (for auto date histogram, this can just be the first Rounding in RoundingInfos, since every Rounding has the same timezone information) and does the check OpenSearch/server/src/main/java/org/opensearch/common/Rounding.java Lines 1385 to 1388 in 1e49aa8
If the check doesn't pass, we don't even bother to go inside the getRounding method. The check you added here (do not shrink the rounding) is still meaningful. I agree the rounding should be ever increasing and should not shrink depending on the next segment processed. |
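As a rough illustration of the two checks discussed in this comment (skip the optimization for non-UTC timezones, and never let the rounding shrink between segments), here is a minimal Java sketch. The class and method names (RoundingGuardSketch, isUtc, updateRoundingIdx) are hypothetical and are not the actual OpenSearch code; the rounding is modeled as a plain index into an ordered list of candidate roundings, which is an assumption made for illustration.

```java
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical sketch, not the OpenSearch implementation.
class RoundingGuardSketch {
    // Assumed model: index into an ordered list of candidate roundings,
    // where a larger index means a coarser (larger) interval.
    private int roundingIdx = 0;

    // True only for zones equivalent to UTC (a fixed zero offset).
    static boolean isUtc(ZoneId zone) {
        return ZoneOffset.UTC.equals(zone.normalized());
    }

    // Accept the index computed for the current segment only if it does not
    // shrink the index already in use; returns the index now in effect.
    int updateRoundingIdx(int segmentRoundingIdx) {
        roundingIdx = Math.max(roundingIdx, segmentRoundingIdx);
        return roundingIdx;
    }

    public static void main(String[] args) {
        System.out.println(isUtc(ZoneId.of("UTC")));
        System.out.println(isUtc(ZoneId.of("America/New_York")));

        RoundingGuardSketch agg = new RoundingGuardSketch();
        System.out.println(agg.updateRoundingIdx(3)); // first segment needs a coarse rounding
        System.out.println(agg.updateRoundingIdx(1)); // a later, narrower segment must not shrink it
    }
}
```

With a guard like isUtc evaluated before any rounding is prepared, a non-UTC aggregation would never reach the code path that mutates the prepared rounding.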
...in/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregator.java
A little more explanation on the root cause
For different date unit, the boundary of the prepared rounding is different. For example, for date unit of |
Ah, I didn't realize we already disallow non-UTC timezones. It makes sense that we may not want to support this optimization for non-UTC timezones in general, as this could produce intervals which are not of fixed size.
I've added this check to |
Now that we disallow non-UTC timezones from the start for this optimization, I've added an additional change to remove this check from |
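To make the point about intervals "not of fixed size" concrete, here is a small Java sketch using only java.time. It measures the length of a calendar day in a given zone; the class and method names (DayLengthSketch, hoursInDay) are made up for this illustration and do not appear in the PR.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical illustration, not part of the PR.
class DayLengthSketch {
    // Elapsed hours between local midnight on `date` and the next local midnight.
    static long hoursInDay(LocalDate date, ZoneId zone) {
        return Duration.between(
            date.atStartOfDay(zone),
            date.plusDays(1).atStartOfDay(zone)
        ).toHours();
    }

    public static void main(String[] args) {
        ZoneId newYork = ZoneId.of("America/New_York");
        LocalDate springForward = LocalDate.of(2024, 3, 10); // US DST starts
        LocalDate fallBack = LocalDate.of(2024, 11, 3);      // US DST ends

        System.out.println(hoursInDay(springForward, newYork));        // 23
        System.out.println(hoursInDay(fallBack, newYork));             // 25
        System.out.println(hoursInDay(springForward, ZoneOffset.UTC)); // 24
    }
}
```

A "1 day" bucket in America/New_York can therefore span 23, 24, or 25 hours, while in UTC it is always 24.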
Signed-off-by: Finn Carroll <carrofin@amazon.com>
…te histo assertion bug per opensearch-project#16932 Signed-off-by: Finn Carroll <carrofin@amazon.com>
… preparedRounding of agg. Signed-off-by: Finn Carroll <carrofin@amazon.com>
a5f823e to d6927fd
❌ Gradle check result for d6927fd: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for d6927fd: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 3a6dcb1: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Description
The auto date histogram agg contains an optimization which skips traditional doc collection and instead examines the "pre-aggregated" doc counts contained in the BKD tree of each segment. Several conditions need to be true before this optimization can execute for a given segment. One such condition is we must be able to determine a set of ranges (or rounding) for the segment under consideration before optimizing.
Normally, ranges for the auto date histogram aggregation are updated as needed over the course of collecting documents, but the filter rewrite optimization will update the preparedRounding of the agg in accordance with the min & max values of the segment under consideration ahead of time, since it skips regular doc collection.
As a result, it is possible for our preparedRounding to shrink, as the rounding built from the segment could easily be smaller than the rounding previously used by our shard level aggregator.
This usually does not pose a problem, as the preparedRounding will be updated accordingly when we collect our next document or reduce our shard level aggs into a single top level agg.
The specific case where this becomes problematic is when our preparedRounding is delegating to a "bounded" structure. When we prepare a rounding we do so for the min & max epoch time of our shard, since this allows us to optimize the structure we delegate rounding to.
For some ranges of epoch time and time zones, rounding will be little more than a modulo operation. However, if our min & max epoch time crosses "transitions" such as daylight saving time changes, we may want to delegate rounding to a linked list or array structure to quickly look up these transitions. This is why the specific occurrence of this bug linked in the initial issue only appears when "time_zone":"America/New_York" is set.
The combination of delegating rounding to these strictly bounded structures and the filter rewrite optimization "replaying" our previous bucket keys fails an assertion within our preparedRounding, as our previous bucket keys are not guaranteed to fit within the strict bounds of the rounding prepared for the current segment being collected.
The changes in this PR resolve this by ensuring the filter rewrite optimization only ever increases, and never shrinks, the granularity of our preparedRounding.
Related Issues
Resolves #16932
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.