Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] auto_date_histogram query with time_zone can throw AssertionError #16932

Open
peteralfonsi opened this issue Jan 2, 2025 · 4 comments · May be fixed by #17023
Open

[BUG] auto_date_histogram query with time_zone can throw AssertionError #16932

peteralfonsi opened this issue Jan 2, 2025 · 4 comments · May be fixed by #17023
Assignees
Labels
bug Something isn't working Search Search query, autocomplete ...etc

Comments

@peteralfonsi
Copy link
Contributor

peteralfonsi commented Jan 2, 2025

Describe the bug

Running an auto_date_histogram query with time_zone can sometimes cause the cluster to crash with an AssertionError.

I don't think AssertionErrors crash clusters running on a tar install, but the stacktrace seems to show the error comes from checking the doc's timestamp is within certain bounds. So it still might indicate there's a problem with the query results.

The error doesn't happen if time_zone is absent since the error comes from code handling time zone lookup. It seems like the presence/value of buckets doesn't matter.

It seems like this only happens if I index the documents one by one (using opensearchpy). If I bulk index the same number of docs through opensearchpy, I don't see the error. I believe this is why I haven't seen this error when running benchmarks like nyc_taxis which use nearly identical queries.

When indexing one-by-one, if I refresh after each document, or if I manually refresh once at the very end, I've always seen the error. If I don't refresh at all, it succeeds (and there are nonzero doc counts in the returned buckets, so it's not that the docs aren't searched). If I refresh after bulk indexing, it still succeeds.

auto_date_histogram_tz_stacktrace.txt

Python script to set up an index that causes the error (.txt format as github doesn't support .py):
repro_autodatehisto_crash.txt

Related component

Search

To Reproduce

Start the cluster with ./gradlew run

Add some docs to an index with mapping:

"mappings": {
            "properties": {
            "day":    { "type" : "date", "format":"yyyy-MM-dd HH:mm:ss" } 
            }
        }

(I used the script attached in the bug description).
I've seen this crash using any number of docs between 50 and 2000 when indexing one at a time.

Run the query: curl -XGET http://localhost:9200/jan-index/_search -H 'Content-Type: application/json' -d'{"size": 0,"aggs": {"agg_name": {"auto_date_histogram": {"field": "day", "time_zone":"America/New_York"}}}}'

See the cluster crash with an AssertionError.

Expected behavior

No error should be thrown.

Additional Details

No response

@peteralfonsi peteralfonsi added bug Something isn't working untriaged labels Jan 2, 2025
@github-actions github-actions bot added the Search Search query, autocomplete ...etc label Jan 2, 2025
@finnegancarroll
Copy link
Contributor

Taking a look!

@finnegancarroll
Copy link
Contributor

I think this is a bug in the filter rewrite optimization path for this aggregation type and see the error disappear when the optimization is disabled.

When deciding if we can take this optimization path for a given segment we try to build fixed bucket ranges for a segment before continuing with the optimization. Building these ranges updates the prepared rounding of the aggregator as a side affect.

Since the optimized path only takes into account the bounds of each individual segment when building ranges and updates the shard level preparedRounding, we can end up with a rounding that has fixed bounds smaller than the global min/max of our shard level aggregation.

(Many Rounding objects are just arrays of ranges)

When we fall back to our regular un-optimized path and try to collect documents normally with our new much smaller prepared rounding this out of bounds assertion is thrown.

@finnegancarroll
Copy link
Contributor

The error doesn't happen if time_zone is absent since the error comes from code handling time zone lookup. It seems like the presence/value of buckets doesn't matter.

preparedRounding objects for non UTC time zones are represented as arrays to better track skips and offsets such as daylight savings. In contrast UTC rounding is just a modulo operation and does not have fixed bounds. I think this accounts for the difference in behavior between time zones.

@finnegancarroll
Copy link
Contributor

It seems like this only happens if I index the documents one by one (using opensearchpy). If I bulk index the same number of docs through opensearchpy, I don't see the error. I believe this is why I haven't seen this error when running benchmarks like nyc_taxis which use nearly identical queries.

I notice that this exception is frequently triggered by small segments since they are most at risk of producing an erroneous preparedRounding with bounds too small to accommodate all documents in the shard. A single segment or multiple segments that produce the same effective preparedRounding shouldn't run into this issue which could explain the above.

@finnegancarroll finnegancarroll moved this from Todo to In Progress in Performance Roadmap Jan 9, 2025
finnegancarroll added a commit to finnegancarroll/OpenSearch that referenced this issue Jan 14, 2025
…te histo assertion bug per opensearch-project#16932

Signed-off-by: Finn Carroll <carrofin@amazon.com>
finnegancarroll added a commit to finnegancarroll/OpenSearch that referenced this issue Jan 14, 2025
… preparedRounding of agg.

Signed-off-by: Finn Carroll <carrofin@amazon.com>
@finnegancarroll finnegancarroll linked a pull request Jan 14, 2025 that will close this issue
3 tasks
@finnegancarroll finnegancarroll moved this from In Progress to In-Review in Performance Roadmap Jan 15, 2025
finnegancarroll added a commit to finnegancarroll/OpenSearch that referenced this issue Jan 16, 2025
…te histo assertion bug per opensearch-project#16932

Signed-off-by: Finn Carroll <carrofin@amazon.com>
finnegancarroll added a commit to finnegancarroll/OpenSearch that referenced this issue Jan 16, 2025
… preparedRounding of agg.

Signed-off-by: Finn Carroll <carrofin@amazon.com>
finnegancarroll added a commit to finnegancarroll/OpenSearch that referenced this issue Jan 21, 2025
…te histo assertion bug per opensearch-project#16932

Signed-off-by: Finn Carroll <carrofin@amazon.com>
finnegancarroll added a commit to finnegancarroll/OpenSearch that referenced this issue Jan 21, 2025
… preparedRounding of agg.

Signed-off-by: Finn Carroll <carrofin@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Search Search query, autocomplete ...etc
Projects
Status: In-Review
Status: 🆕 New
Development

Successfully merging a pull request may close this issue.

3 participants