[Bug]: Intermittent UnsupportedOperationException errors with nested queries #17140

Open
lizjackson-toast opened this issue Jan 27, 2025 · 9 comments

@lizjackson-toast

lizjackson-toast commented Jan 27, 2025

Describe the bug

We use a boolean query that involves a nested field, like this:

      "query":{
         "bool":{
            "filter":[
               {
                  "terms":{
                     "name.keyword":["my-test-name"]
                  }
               },
               {
                  "nested":{
                     "path":"foo",
                     "query":{
                        "match_all":{}
                     },
                     "inner_hits":{
                        "size":256,
                        "sort":[
                           {
                              "foo.bar":"desc"
                           }
                        ]
                     }
                  }
               }
            ]
         }
      },

We seem to be hitting the error described in this forum post, where this query gives us intermittent UnsupportedOperationException errors. Have others run into this? Does anybody have more information about how to avoid or debug these errors?

Here is the stack trace from the OpenSearch logs:

Caused by: NotSerializableExceptionWrapper[unsupported_operation_exception: null]
	at org.opensearch.index.fielddata.AbstractNumericDocValues.advance(AbstractNumericDocValues.java:60)
	at org.apache.lucene.search.comparators.NumericComparator$NumericLeafComparator$2.advance(NumericComparator.java:416)
	at org.apache.lucene.search.ConjunctionBulkScorer.score(ConjunctionBulkScorer.java:162)
	at org.opensearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:71)
	at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38)
	at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:338)
	at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:289)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:560)
	at org.opensearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:361)
	at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:468)
	at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:456)
	at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWith(QueryPhase.java:438)
	at org.opensearch.search.query.QueryPhaseSearcherWrapper.searchWith(QueryPhaseSearcherWrapper.java:60)
	at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:61)
	at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:284)
	at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:157)
	at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:643)
	at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:707)
	at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:676)
	at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
	at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.lang.Thread.run(Thread.java:1583)

To reproduce

Use a boolean query with a nested path like the query above.

Expected behavior

No intermittent failures

@lizjackson-toast lizjackson-toast added bug Something isn't working untriaged labels Jan 27, 2025
@lizjackson-toast lizjackson-toast changed the title [Bug]: Intermitten UnsupportedOperationException errors with nested queries [Bug]: Intermittent UnsupportedOperationException errors with nested queries Jan 27, 2025
@gaiksaya gaiksaya transferred this issue from opensearch-project/opensearch-build Jan 27, 2025
@msfroh
Collaborator

msfroh commented Jan 27, 2025

Interesting ... that implementation of advance on AbstractNumericDocValues should theoretically never get called, because every possible subclass should either override it or guarantee that it doesn't get called (by only getting used for "safe" cases).

The Javadoc says:

 * Base implementation that throws an {@link IOException} for the
 * {@link DocIdSetIterator} APIs. This impl is safe to use for sorting and
 * aggregations, which only use {@link #advanceExact(int)} and
 * {@link #longValue()}.
 *
 * In case when optimizations based on point values are used, the {@link #advance(int)}
 * and, optionally, {@link #cost()} have to be implemented as well.

In this case, the doc values are clearly being used in the query (calling advance), related to finding a competitive value.
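
To illustrate the failure mode, here is a minimal sketch using simplified stand-in classes (`BaseValuesSketch` and `FailureModeDemo` are hypothetical names, not OpenSearch or Lucene types). It assumes, based on the stack trace above, that the base `advance` throws `UnsupportedOperationException`. A subclass that only implements `advanceExact` and `longValue` works for callers that stick to those two methods, but fails as soon as a caller invokes `advance`:

```java
/** Hypothetical stand-in for the base class; only the methods relevant here. */
abstract class BaseValuesSketch {
    /** Positions on exactly {@code target}; returns whether a value exists there. */
    abstract boolean advanceExact(int target);

    /** Value for the document selected by the last advanceExact call. */
    abstract long longValue();

    /** Iterator-style advance: unsupported unless a subclass overrides it. */
    int advance(int target) {
        throw new UnsupportedOperationException();
    }
}

final class FailureModeDemo {
    /** A subclass that is "safe" only for callers using advanceExact and longValue. */
    static BaseValuesSketch constantValues(long constant) {
        return new BaseValuesSketch() {
            @Override
            boolean advanceExact(int target) {
                return true; // in this toy example every document has the constant value
            }

            @Override
            long longValue() {
                return constant;
            }
        };
    }

    /** Returns true if calling advance on the given values throws. */
    static boolean advanceThrows(BaseValuesSketch values) {
        try {
            values.advance(0);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```

Sorting and aggregation code paths that only call `advanceExact` never notice the missing override, which may be why the problem only surfaces when a query-time optimization calls `advance`.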

I tried removing the "implementation" of advance from AbstractNumericDocValues to see what fails to compile and got the following output:

server/src/main/java/org/opensearch/search/MultiValueMode.java:555: error: <anonymous org.opensearch.search.MultiValueMode$6> is not abstract and does not override abstract method advance(int) in DocIdSetIterator
            return new AbstractNumericDocValues() {
                                                  ^
server/src/main/java/org/opensearch/search/MultiValueMode.java:609: error: <anonymous org.opensearch.search.MultiValueMode$7> is not abstract and does not override abstract method advance(int) in DocIdSetIterator
        return new AbstractNumericDocValues() {
                                              ^
server/src/main/java/org/opensearch/search/aggregations/bucket/sampler/DiversifiedBytesHashSamplerAggregator.java:128: error: <anonymous org.opensearch.search.aggregations.bucket.sampler.DiversifiedBytesHashSamplerAggregator$DiverseDocsDeferringCollector$ValuesDiversifiedTopDocsCollector$1> is not abstract and does not override abstract method advance(int) in DocIdSetIterator
                return new AbstractNumericDocValues() {
                                                      ^
server/src/main/java/org/opensearch/search/aggregations/bucket/sampler/DiversifiedMapSamplerAggregator.java:138: error: <anonymous org.opensearch.search.aggregations.bucket.sampler.DiversifiedMapSamplerAggregator$DiverseDocsDeferringCollector$ValuesDiversifiedTopDocsCollector$1> is not abstract and does not override abstract method advance(int) in DocIdSetIterator
                return new AbstractNumericDocValues() {
                                                      ^
server/src/main/java/org/opensearch/search/aggregations/bucket/sampler/DiversifiedNumericSamplerAggregator.java:125: error: <anonymous org.opensearch.search.aggregations.bucket.sampler.DiversifiedNumericSamplerAggregator$DiverseDocsDeferringCollector$ValuesDiversifiedTopDocsCollector$1> is not abstract and does not override abstract method advance(int) in DocIdSetIterator
                return new AbstractNumericDocValues() {
                                                      ^
server/src/main/java/org/opensearch/search/aggregations/bucket/sampler/DiversifiedOrdinalsSamplerAggregator.java:123: error: <anonymous org.opensearch.search.aggregations.bucket.sampler.DiversifiedOrdinalsSamplerAggregator$DiverseDocsDeferringCollector$ValuesDiversifiedTopDocsCollector$1> is not abstract and does not override abstract method advance(int) in DocIdSetIterator
                    return new AbstractNumericDocValues() {
                                                          ^
server/src/main/java/org/opensearch/search/aggregations/bucket/sampler/DiversifiedOrdinalsSamplerAggregator.java:141: error: <anonymous org.opensearch.search.aggregations.bucket.sampler.DiversifiedOrdinalsSamplerAggregator$DiverseDocsDeferringCollector$ValuesDiversifiedTopDocsCollector$2> is not abstract and does not override abstract method advance(int) in DocIdSetIterator
                return new AbstractNumericDocValues() {
                                                      ^
server/src/main/java/org/apache/lucene/search/grouping/CollapsingDocValuesSource.java:119: error: <anonymous org.apache.lucene.search.grouping.CollapsingDocValuesSource$Numeric$1> is not abstract and does not override abstract method advance(int) in DocIdSetIterator
                        values = new AbstractNumericDocValues() {
                                                                ^

My hunch is that the problem is coming from one of the first two implementations (the anonymous classes in MultiValueMode), since the request doesn't involve diversified sampler aggregators and there's no field collapsing.

In particular, I'm looking at the implementation on line 609, because that includes a parent bitset and child DocIdSetIterator, which are used to evaluate nested queries. I think a possible advance method should delegate to advanceExact. Since advanceExact always returns true, I think advance can return its input (since somehow this implementation always has the thing we're trying to advance to).

@lizjackson-toast -- which OpenSearch version are you using? Are you able to apply that fix to MultiValueMode and see if it eliminates the problem at your end? Thanks!

@msfroh
Collaborator

msfroh commented Jan 27, 2025

Incidentally, this looks related to #12089, which was released in 2.12.

@lizjackson-toast
Author

Thanks @msfroh for the quick response! I appreciate that. We are using OpenSearch version 2.17.0.

You mention line 609, but here in the latest commit on MultiValueMode, line 609 is just int count = 0. Can you clarify which line(s) need to be updated to allow advance to delegate to advanceExact in the fix you have in mind?

Thanks again!

@lizjackson-toast
Author

Oh, I think you may mean line 690 and not 609 – is that right?

If so, if I interpret correctly, you're suggesting that we change this:

                @Override
                public int advance(int target) throws IOException {
                    return values.advance(target);
                }

To this:

                @Override
                public int advance(int target) throws IOException {
                    return values.advanceExact(target);
                }

Is that right? Do you have any docs about how we can test this? We appreciate your help!

@msfroh
Collaborator

msfroh commented Jan 28, 2025

> Oh, I think you may mean line 690 and not 609 – is that right?

I was looking at the 2.13 branch initially, to try to see if I could connect things to the stack trace in the linked forum post -- though I now notice that the logs point to 2.16.

Anyway, it looks like the AbstractNumericDocValues implementation that I was talking about yesterday has moved down to line 812.

My concern is around the advance logic and how it should interact with parent documents, when there's nesting. I see that @reta handled a similar case around NumericDoubleValues here. Now I'm wondering if that implementation is correct.

Specifically, for each of the select() methods where there's a parentDoc bitset passed in, I feel like the returned doc values should implement advance() like:

            @Override
            public int advance(int target) throws IOException {
                if (advanceExact(target)) {
                    return target;
                }
                throw new IllegalStateException("advanceExact should always return true");
            }

(That will work for all of the select() implementations in that class except the SortedDocValues version at line 1183, but that's okay. We only need advance() for numeric iterators, and SortedDocValues is for strings.)
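
As a self-contained illustration of that idea, the sketch below uses a simplified stand-in (`NestedMaxValues` is a hypothetical class, not the code in MultiValueMode, and it has no Lucene dependency). It assumes that child documents are stored immediately before their parent, that `advanceExact` is only ever called with a parent doc ID, and that the selected value is the maximum over the children, with a missing value used when a parent has no child values:

```java
import java.util.BitSet;
import java.util.Map;

/** Hypothetical stand-in for a nested "max over children" doc-values view. */
final class NestedMaxValues {
    private final BitSet parentDocs;               // one bit set per parent doc ID
    private final Map<Integer, Long> childValues;  // child doc ID -> value
    private final long missingValue;               // used when no child has a value
    private long value;

    NestedMaxValues(BitSet parentDocs, Map<Integer, Long> childValues, long missingValue) {
        this.parentDocs = parentDocs;
        this.childValues = childValues;
        this.missingValue = missingValue;
    }

    /** Assumes parentDoc is a parent; always returns true, falling back to missingValue. */
    boolean advanceExact(int parentDoc) {
        // Children are the docs after the previous parent and before this one.
        int prevParent = parentDocs.previousSetBit(parentDoc - 1);
        boolean found = false;
        long max = Long.MIN_VALUE;
        for (int child = prevParent + 1; child < parentDoc; child++) {
            Long v = childValues.get(child);
            if (v != null) {
                max = Math.max(max, v);
                found = true;
            }
        }
        value = found ? max : missingValue;
        return true;
    }

    /** Delegates to advanceExact, as proposed above. */
    int advance(int target) {
        if (advanceExact(target)) {
            return target;
        }
        throw new IllegalStateException("advanceExact should always return true");
    }

    long longValue() {
        return value;
    }
}
```

Note that this only matches the usual iterator contract for `advance` under the assumption that callers always pass a parent doc ID as the target; whether that holds for every caller in OpenSearch would need to be verified.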

@reta -- do you remember if you looked into the nested docs case when you worked on #12089 ? I skimmed through it and didn't see anything, but I might have missed it.

@reta
Collaborator

reta commented Jan 28, 2025

> @reta -- do you remember if you looked into the nested docs case when you worked on #12089 ? I skimmed through it and didn't see anything, but I might have missed it.

@msfroh I definitely did not look into the nested docs case; that is a miss on my side :( We apparently have no test cases that expose the problem with nested docs, which partially explains the miss.

@lizjackson-toast
Author

Thanks, @msfroh and @reta! In terms of what my own team should do next, is this an issue you'll look into fixing on your end, would you like to collaborate on a fix, something else? Thanks again for looking into this!

@reta
Collaborator

reta commented Jan 29, 2025

> Thanks, @msfroh and @reta! In terms of what my own team should do next, is this an issue you'll look into fixing on your end, would you like to collaborate on a fix, something else? Thanks again for looking into this!

Thanks @lizjackson-toast, if your team could submit a fix, that would be great!

@msfroh
Collaborator

msfroh commented Jan 31, 2025

If you need any help to get started on a fix, please let us know!
