[BUG] auto_date_histogram query with time_zone can throw AssertionError #16932

peteralfonsi · 2025-01-02T21:01:10Z

Describe the bug

Running an auto_date_histogram query with time_zone can sometimes cause the cluster to crash with an AssertionError.

I don't think AssertionErrors crash clusters running on a tar install, but the stacktrace seems to show the error comes from checking the doc's timestamp is within certain bounds. So it still might indicate there's a problem with the query results.

The error doesn't happen if time_zone is absent since the error comes from code handling time zone lookup. It seems like the presence/value of buckets doesn't matter.

It seems like this only happens if I index the documents one by one (using opensearchpy). If I bulk index the same number of docs through opensearchpy, I don't see the error. I believe this is why I haven't seen this error when running benchmarks like nyc_taxis which use nearly identical queries.

When indexing one-by-one, if I refresh after each document, or if I manually refresh once at the very end, I've always seen the error. If I don't refresh at all, it succeeds (and there are nonzero doc counts in the returned buckets, so it's not that the docs aren't searched). If I refresh after bulk indexing, it still succeeds.

auto_date_histogram_tz_stacktrace.txt

Python script to set up an index that causes the error (.txt format as github doesn't support .py):
repro_autodatehisto_crash.txt

Related component

Search

To Reproduce

Start the cluster with ./gradlew run

Add some docs to an index with mapping:

"mappings": {
            "properties": {
            "day":    { "type" : "date", "format":"yyyy-MM-dd HH:mm:ss" } 
            }
        }

(I used the script attached in the bug description).
I've seen this crash using any number of docs between 50 and 2000 when indexing one at a time.

Run the query: curl -XGET http://localhost:9200/jan-index/_search -H 'Content-Type: application/json' -d'{"size": 0,"aggs": {"agg_name": {"auto_date_histogram": {"field": "day", "time_zone":"America/New_York"}}}}'

See the cluster crash with an AssertionError.

Expected behavior

No error should be thrown.

Additional Details

No response

The text was updated successfully, but these errors were encountered:

finnegancarroll · 2025-01-08T17:11:30Z

Taking a look!

finnegancarroll · 2025-01-08T23:48:47Z

I think this is a bug in the filter rewrite optimization path for this aggregation type and see the error disappear when the optimization is disabled.

When deciding if we can take this optimization path for a given segment we try to build fixed bucket ranges for a segment before continuing with the optimization. Building these ranges updates the prepared rounding of the aggregator as a side affect.

Since the optimized path only takes into account the bounds of each individual segment when building ranges and updates the shard level preparedRounding, we can end up with a rounding that has fixed bounds smaller than the global min/max of our shard level aggregation.

(Many Rounding objects are just arrays of ranges)

When we fall back to our regular un-optimized path and try to collect documents normally with our new much smaller prepared rounding this out of bounds assertion is thrown.

finnegancarroll · 2025-01-09T00:28:33Z

The error doesn't happen if time_zone is absent since the error comes from code handling time zone lookup. It seems like the presence/value of buckets doesn't matter.

preparedRounding objects for non UTC time zones are represented as arrays to better track skips and offsets such as daylight savings. In contrast UTC rounding is just a modulo operation and does not have fixed bounds. I think this accounts for the difference in behavior between time zones.

finnegancarroll · 2025-01-09T00:34:56Z

It seems like this only happens if I index the documents one by one (using opensearchpy). If I bulk index the same number of docs through opensearchpy, I don't see the error. I believe this is why I haven't seen this error when running benchmarks like nyc_taxis which use nearly identical queries.

I notice that this exception is frequently triggered by small segments since they are most at risk of producing an erroneous preparedRounding with bounds too small to accommodate all documents in the shard. A single segment or multiple segments that produce the same effective preparedRounding shouldn't run into this issue which could explain the above.

…te histo assertion bug per opensearch-project#16932 Signed-off-by: Finn Carroll <carrofin@amazon.com>

… preparedRounding of agg. Signed-off-by: Finn Carroll <carrofin@amazon.com>

…te histo assertion bug per opensearch-project#16932 Signed-off-by: Finn Carroll <carrofin@amazon.com>

… preparedRounding of agg. Signed-off-by: Finn Carroll <carrofin@amazon.com>

…te histo assertion bug per opensearch-project#16932 Signed-off-by: Finn Carroll <carrofin@amazon.com>

… preparedRounding of agg. Signed-off-by: Finn Carroll <carrofin@amazon.com>

peteralfonsi added bug Something isn't working untriaged labels Jan 2, 2025

github-actions bot added the Search Search query, autocomplete ...etc label Jan 2, 2025

github-project-automation bot added this to Search Project Board Jan 2, 2025

github-project-automation bot moved this to 🆕 New in Search Project Board Jan 2, 2025

sandeshkr419 removed the untriaged label Jan 8, 2025

sandeshkr419 assigned finnegancarroll Jan 8, 2025

finnegancarroll added this to Performance Roadmap Jan 9, 2025

github-project-automation bot moved this to Todo in Performance Roadmap Jan 9, 2025

finnegancarroll moved this from Todo to In Progress in Performance Roadmap Jan 9, 2025

finnegancarroll added a commit to finnegancarroll/OpenSearch that referenced this issue Jan 14, 2025

Add testFilterRewriteWithTZRoundingRangeAssert() to reproduce auto da…

b88245a

…te histo assertion bug per opensearch-project#16932 Signed-off-by: Finn Carroll <carrofin@amazon.com>

finnegancarroll added a commit to finnegancarroll/OpenSearch that referenced this issue Jan 14, 2025

Fix opensearch-project#16932. Ensure optimized path can only increase…

e1eabc0

… preparedRounding of agg. Signed-off-by: Finn Carroll <carrofin@amazon.com>

finnegancarroll linked a pull request Jan 14, 2025 that will close this issue

Fix auto date histogram rounding assertion bug #17023

Open

3 tasks

finnegancarroll moved this from In Progress to In-Review in Performance Roadmap Jan 15, 2025

finnegancarroll added a commit to finnegancarroll/OpenSearch that referenced this issue Jan 16, 2025

Add testFilterRewriteWithTZRoundingRangeAssert() to reproduce auto da…

bcc7fc1

…te histo assertion bug per opensearch-project#16932 Signed-off-by: Finn Carroll <carrofin@amazon.com>

finnegancarroll added a commit to finnegancarroll/OpenSearch that referenced this issue Jan 16, 2025

Fix opensearch-project#16932. Ensure optimized path can only increase…

df18fac

… preparedRounding of agg. Signed-off-by: Finn Carroll <carrofin@amazon.com>

finnegancarroll added a commit to finnegancarroll/OpenSearch that referenced this issue Jan 21, 2025

Add testFilterRewriteWithTZRoundingRangeAssert() to reproduce auto da…

f89f68b

…te histo assertion bug per opensearch-project#16932 Signed-off-by: Finn Carroll <carrofin@amazon.com>

finnegancarroll added a commit to finnegancarroll/OpenSearch that referenced this issue Jan 21, 2025

Fix opensearch-project#16932. Ensure optimized path can only increase…

f002266

… preparedRounding of agg. Signed-off-by: Finn Carroll <carrofin@amazon.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] auto_date_histogram query with time_zone can throw AssertionError #16932

[BUG] auto_date_histogram query with time_zone can throw AssertionError #16932

peteralfonsi commented Jan 2, 2025 •

edited

Loading

finnegancarroll commented Jan 8, 2025

finnegancarroll commented Jan 8, 2025

finnegancarroll commented Jan 9, 2025

finnegancarroll commented Jan 9, 2025

[BUG] auto_date_histogram query with time_zone can throw AssertionError #16932

[BUG] auto_date_histogram query with time_zone can throw AssertionError #16932

Comments

peteralfonsi commented Jan 2, 2025 • edited Loading

Describe the bug

Related component

To Reproduce

Expected behavior

Additional Details

finnegancarroll commented Jan 8, 2025

finnegancarroll commented Jan 8, 2025

finnegancarroll commented Jan 9, 2025

finnegancarroll commented Jan 9, 2025

peteralfonsi commented Jan 2, 2025 •

edited

Loading