Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Multimatch on fields with distincts synonym analyzers does not work properly #7691

Open
pierre-mallet opened this issue May 23, 2023 · 6 comments
Assignees
Labels
bug Something isn't working Search:Relevance Search Search query, autocomplete ...etc

Comments

@pierre-mallet
Copy link

Describe the bug

I stumbled on an edge case when mixing different fields with search-time synonyms analyzers in a multimatch query. In the case described below we can see in the profiling information that the multimatch query is rewrite into an incorrect DisjunctionMaxQuery that targets the same field multiple times.

This bug is reproductible in 2.6.0 and 2.7.0

To Reproduce

Here is a minimal scenario to reproduce the bug

curl -X PUT localhost:9200/synonyms -d '{
  "aliases": {},
  "mappings": {
    "dynamic": false,
    "properties": {
      "text": {
        "search_analyzer": "synonym_1",
        "type": "text"
      },
      "title": {
        "search_analyzer": "synonym_2",
        "type": "text"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "synonym_1": {
          "filter": [
            "synonyms"
          ],
          "tokenizer": "standard"
        },
        "synonym_2": {
          "filter": [
            "synonyms"
          ],
          "tokenizer": "standard"
        }
      },
      "filter": {
        "synonyms": {
          "lenient": "true",
          "synonyms": [
            "term1, term2"
          ],
          "type": "synonym_graph"
        }
      }
    }
  }
}'

curl -X POST localhost:9200/synonyms/_doc/test_doc -d '{
  "title": "term1"
}'

curl -X POST localhost:9200/synonyms/_search -d '{
  "profile": "true",
  "from": 0,
  "query": {
    "multi_match": {
      "boost": 1,
      "fields": [
        "text^1.0",
        "title^1.0"
      ],
      "query": "term1",
      "type": "best_fields"
    }
  }
}'

This request does not match our document and in the profiling we can see this :

"profile": {
    "shards": [
      {
        "id": "[GaBuyLhQSyalJSCmyrTaIA][synonyms][0]",
        "inbound_network_time_in_millis": 0,
        "outbound_network_time_in_millis": 0,
        "searches": [
          {
            "query": [
              {
                "type": "DisjunctionMaxQuery",
                "description": **"(Synonym(text:term1 text:term2) | Synonym(text:term1 text:term2))",**

The text field is targeted twice and the title field is not used in the query.

Note that in version 2.5.0 this works correctly and the query generated is :

 "type" : "DisjunctionMaxQuery",
 "description" : "(Synonym(title:term1 title:term2) | Synonym(text:term1 text:term2))",

Also note that if we have more terms in the query it works fine :

curl -X POST localhost:9200/synonyms/_search -d '{
  "profile": "true",
  "from": 0,
  "query": {
    "multi_match": {
      "boost": 1,
      "fields": [
        "text^1.0",
        "title^1.0"
      ],
      "query": "term1 term3 ",
      "type": "best_fields"
    }
  }
}'

=>

"type": "DisjunctionMaxQuery",
"description": "((Synonym(title:term1 title:term2) title:term3) | (Synonym(text:term1 text:term2) text:term3))",

Expected behavior

The multimatch query should targets both fields

Plugins

No plugins

Host/Environment (please complete the following information):

  • OS: Docker
  • Version : 2.6.0
@pierre-mallet pierre-mallet added bug Something isn't working untriaged labels May 23, 2023
@pierre-mallet pierre-mallet changed the title [BUG] Multimatch on fields with distincts synonym analyzers does "sometime" not work properly [BUG] Multimatch on fields with distincts synonym analyzers does not work properly May 30, 2023
@andrross andrross added the Search Search query, autocomplete ...etc label May 30, 2023
@sejli sejli removed the untriaged label Jun 6, 2023
@harshavamsi
Copy link
Contributor

I've added a draft PR in #10148. It matches the exact request that is being made here, although I am not a 100% sure yet if this issue is with multi_match or with the synonym analyzer.

I'm having issues getting the test case to pass, I keep encountering this:

 2> java.lang.IllegalArgumentException: Unknown filter type [synonym] for [synonyms]
        at __randomizedtesting.SeedInfo.seed([91D695E83F6A921B:84D55249764E76E3]:0)
        at org.opensearch.index.analysis.AnalysisRegistry.getAnalysisProvider(AnalysisRegistry.java:565)
        at org.opensearch.index.analysis.AnalysisRegistry.buildMapping(AnalysisRegistry.java:516)
        at org.opensearch.index.analysis.AnalysisRegistry.buildTokenFilterFactories(AnalysisRegistry.java:336)
        at org.opensearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:239)
        at org.opensearch.index.IndexModule.newIndexService(IndexModule.java:628)
        at org.opensearch.indices.IndicesService.createIndexService(IndicesService.java:839)
        at org.opensearch.indices.IndicesService.withTempIndexService(IndicesService.java:782)
        at org.opensearch.cluster.metadata.MetadataCreateIndexService.applyCreateIndexWithTemporaryService(MetadataCreateIndexService.java:479)
        at org.opensearch.cluster.metadata.MetadataCreateIndexService.applyCreateIndexRequestWithV1Templates(MetadataCreateIndexService.java:584)
        at org.opensearch.cluster.metadata.MetadataCreateIndexService.applyCreateIndexRequest(MetadataCreateIndexService.java:441)
        at org.opensearch.cluster.metadata.MetadataCreateIndexService.applyCreateIndexRequest(MetadataCreateIndexService.java:448)
        at org.opensearch.cluster.metadata.MetadataCreateIndexService$1.execute(MetadataCreateIndexService.java:354)
        at org.opensearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:65)
        at org.opensearch.cluster.service.MasterService.executeTasks(MasterService.java:874)
        at org.opensearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:424)
        at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:295)
        at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:206)
        at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:204)
        at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:242)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849)
        at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282)
        at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.lang.Thread.run(Thread.java:1623)

Looking at other tests that make use of search analyzers in settings, looks like I'm not doing anything too different but I can't seem to get out of this issue --

public void testThatSuggestStopFilterWorks() throws Exception {

I've also verified that this issue does not occur in 2.5 and shows up in 2.6. Looking at the diff, I don't see any changes to the MultiMatchQuery or the SynonymAnalyzer, although I'm not entirely sure I'm looking at the right places -- https://github.com/opensearch-project/OpenSearch/blame/750901a2d1304961070563a66313fa34c03ec779/server/src/main/java/org/opensearch/index/search/MultiMatchQuery.java#L221

There was a lucene version upgrade that could've caused this issue. Any thoughts @msfroh @rishabhmaurya

@pierre-mallet
Copy link
Author

For information, this issue is no longer reproductible in version 2.10 of opensearch ( I did not tried the 2.9 version ).

@msfroh
Copy link
Collaborator

msfroh commented Oct 16, 2023

I suspect that this may have been fixed in 2.8 (since I was taking with @mingshl about a very similar bug report on Friday).

OpenSearch 2.6 brought in an upgrade from Lucene 9.4.2 to 9.5.0, and OpenSearch 2.8 upgraded to Lucene 9.6.0.

I took a look at the "Bug Fixes" section of the Lucene 9.6.0 change log for anything having to do with synonym queries. I suspect that the fix in 9.6.0 was apache/lucene#12260, and the bug in 9.5.0 was apache/lucene#11941.

Note that the fix doesn't show up in the published 9.6.0 change log, but it does show up in the 9.6.0 section of later change logs.

@mingshl
Copy link
Contributor

mingshl commented Oct 16, 2023

Thanks @msfroh for pointing the right place!!! It's fixed by Lucene at 9.6 here, I can upgrade the lucene version at 2.6 and 2.7 to fix the issue for the multi-match query not working property, shall I ?

@msfroh
Copy link
Collaborator

msfroh commented Oct 16, 2023

I can upgrade the lucene version at 2.6 and 2.7 to fix the issue for the multi-match query not working property, shall I ?

We don't upgrade the Lucene version on past releases.

In this case, I believe the correct course is just to say "This is a known bug in 2.6 and 2.7, but we know it is fixed in 2.8 and later".

@dblock
Copy link
Member

dblock commented Oct 16, 2023

Let's get #10148 to pass and close this with it? @harshavamsi

@getsaurabh02 getsaurabh02 moved this from 🆕 New to Later (6 months plus) in Search Project Board Aug 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Search:Relevance Search Search query, autocomplete ...etc
Projects
Status: Later (6 months plus)
Development

No branches or pull requests

8 participants