[BUG] Multimatch on fields with distinct synonym analyzers does not work properly #7691
Comments
I've added a draft PR in #10148. It matches the exact request made here, although I am not 100% sure yet whether this issue is with multi_match or with the synonym analyzer. I'm having trouble getting the test case to pass; I keep encountering this:
Looking at other tests that use search analyzers in settings, it looks like I'm not doing anything too different, but I can't seem to get past this issue -- Line 1090 in 0c4e950
I've also verified that this issue does not occur in 2.5 and shows up in 2.6. Looking at the diff, I don't see any changes to the MultiMatchQuery or the SynonymAnalyzer, although I'm not entirely sure I'm looking at the right places -- https://github.com/opensearch-project/OpenSearch/blame/750901a2d1304961070563a66313fa34c03ec779/server/src/main/java/org/opensearch/index/search/MultiMatchQuery.java#L221 There was a Lucene version upgrade that could have caused this issue. Any thoughts @msfroh @rishabhmaurya
For information, this issue is no longer reproducible in version 2.10 of OpenSearch (I did not try version 2.9).
I suspect that this may have been fixed in 2.8 (since I was talking with @mingshl about a very similar bug report on Friday). OpenSearch 2.6 brought in an upgrade from Lucene 9.4.2 to 9.5.0, and OpenSearch 2.8 upgraded to Lucene 9.6.0. I took a look at the "Bug Fixes" section of the Lucene 9.6.0 change log for anything having to do with synonym queries. I suspect that the fix in 9.6.0 was apache/lucene#12260, and the bug in 9.5.0 was apache/lucene#11941. Note that the fix doesn't show up in the published 9.6.0 change log, but it does show up in the 9.6.0 section of later change logs.
We don't upgrade the Lucene version on past releases. In this case, I believe the correct course is just to say "This is a known bug in 2.6 and 2.7, but we know it is fixed in 2.8 and later".
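For anyone who needs to flag affected clusters, the version range stated in this thread can be encoded as a small check. This is only a sketch: the function names are hypothetical, and the boundaries (affected from 2.6, fixed from 2.8) are the ones reported here, not independently verified.

```python
from typing import Tuple

# Range taken from this thread: bug seen in 2.6 and 2.7,
# not seen in 2.5, and believed fixed in 2.8 and later.
FIRST_AFFECTED = (2, 6)
FIRST_FIXED = (2, 8)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '2.7.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def is_affected(version: str) -> bool:
    """True if the version falls in the range reported as affected."""
    return FIRST_AFFECTED <= parse_major_minor(version) < FIRST_FIXED
```

Comparing integer tuples rather than strings matters here: as strings, "2.10" would sort before "2.8" and be wrongly reported as affected.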
Let's get #10148 to pass and close this with it? @harshavamsi
Describe the bug
I stumbled on an edge case when mixing different fields with search-time synonym analyzers in a multi_match query. In the case described below, the profiling information shows that the multi_match query is rewritten into an incorrect DisjunctionMaxQuery that targets the same field multiple times.
This bug is reproducible in 2.6.0 and 2.7.0.
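To make the shape of such a setup concrete, here is a rough Python sketch of the request bodies involved: two text fields, each with its own search-time synonym analyzer, queried through multi_match with profiling enabled. This is not the original reproduction. The analyzer names, filter names, synonym lists and query term are all hypothetical, and it is not claimed that these exact values trigger the bug. Only the field names `title` and `text` come from this report.

```python
import json


def synonym_analyzer(filter_name):
    """Custom analyzer definition that applies one synonym filter."""
    return {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["lowercase", filter_name],
    }


# Index creation body: each field gets a different search-time analyzer.
# All names and synonym rules below are hypothetical.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "title_synonyms": {"type": "synonym", "synonyms": ["tv, television"]},
                "text_synonyms": {"type": "synonym", "synonyms": ["tv, telly"]},
            },
            "analyzer": {
                "title_search": synonym_analyzer("title_synonyms"),
                "text_search": synonym_analyzer("text_synonyms"),
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {"type": "text", "search_analyzer": "title_search"},
            "text": {"type": "text", "search_analyzer": "text_search"},
        }
    },
}

# Search body: multi_match over both fields, with profiling turned on
# so the rewritten Lucene query can be inspected in the response.
search_body = {
    "profile": True,
    "query": {"multi_match": {"query": "tv", "fields": ["title", "text"]}},
}

print(json.dumps(search_body, indent=2))
```

The dictionaries can be sent as JSON with any HTTP client; no client library is assumed here.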
To Reproduce
Here is a minimal scenario to reproduce the bug
This request does not match our document, and in the profiling we can see this:
The text field is targeted twice and the title field is not used in the query.
Note that in version 2.5.0 this works correctly and the query generated is:
Also note that if we have more terms in the query, it works fine:
Expected behavior
The multi_match query should target both fields.
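One way to check this expectation automatically is to scan the query description from the profile output and confirm that every requested field appears in it. The sketch below assumes the description uses the usual `field:term` notation; the helper names are hypothetical, and the sample strings in the usage note are made up for illustration rather than copied from this report.

```python
import re


def targeted_fields(query_description):
    """Collect the field names that appear as 'field:term' in a description."""
    return set(re.findall(r"([A-Za-z_][\w.]*):", query_description))


def missing_fields(query_description, requested_fields):
    """Return the requested fields absent from the description, in request order."""
    seen = targeted_fields(query_description)
    return [field for field in requested_fields if field not in seen]
```

With a made-up description such as `(text:tv | text:telly)`, `missing_fields(..., ["title", "text"])` reports `title` as missing, which is the symptom described above. A description that mentions both fields yields an empty list.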
Plugins
No plugins
Host/Environment (please complete the following information):