multi_match lucene query broken since 8.7.0 #95738

Closed
neominik opened this issue May 2, 2023 · 3 comments · Fixed by #95772
Labels
>bug :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team


neominik commented May 2, 2023

Elasticsearch Version

Version: 8.7.1, Build: docker/f229ed3f893a515d590d0f39b05f68913e2d9b53/2023-04-27T04:33:42.127815583Z, JVM: 20.0.1

Installed Plugins

No response

Java Version

bundled

OS Version

Linux f7ea544ab24a 5.15.49-linuxkit #1 SMP Tue Sep 13 07:51:46 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Problem Description

Starting in 8.7.0, the Lucene query built for a multi_match query of type phrase_prefix with multiple explicitly specified fields is wrong: it searches one field multiple times instead of searching each specified field once, which produces incomplete search results. See the next section for detailed steps and a description of where it goes wrong.

Steps to Reproduce

  1. Create index
curl -X PUT "localhost:9200/people?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "custom",
          "filter": [
            "lowercase",
            "asciifolding"
          ],
          "tokenizer": "whitespace"
        }
      }
    }
  },
  "mappings": {
    "dynamic_templates": [
      {
        "strings": {
          "match_mapping_type": "string",
          "mapping": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "keyword"
              }
            },
            "index_prefixes": {
              "min_chars": 1,
              "max_chars": 19
            }
          }
        }
      }
    ]
  }
}
'
  2. Add person to index
curl -X PUT "localhost:9200/people/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "first-name": "Harry",
  "last-name": "Potter",
  "email": "harry.potter@hogwarts.com",
  "identification": "01"
}
'
  3. Search in all fields using multi_match and phrase_prefix; nothing is found 🤔
curl -X GET "localhost:9200/people/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "multi_match": {
      "query": "Har",
      "type": "phrase_prefix",
      "fields": [
        "first-name",
        "last-name",
        "email",
        "identification"
      ]
    }
  }
}
'
  4. Show the Lucene query
curl -X GET "localhost:9200/people/_validate/query?rewrite=true" -H 'Content-Type: application/json' -d'
{
  "query": {
    "multi_match": {
      "query": "Har",
      "type": "phrase_prefix",
      "fields": [
        "first-name",
        "last-name",
        "email",
        "identification"
      ]
    }
  }
}
'

The resulting query searches the identification field four times and never searches first-name, last-name, or email. This is not right.

{
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "valid": true,
  "explanations": [
    {
      "index": "people",
      "valid": true,
      "explanation": "(identification._index_prefix:har | identification._index_prefix:har | identification._index_prefix:har | identification._index_prefix:har)"
    }
  ]
}

In ES 8.6.2 and older, the query searches all fields once.

{
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "valid": true,
  "explanations": [
    {
      "index": "people",
      "valid": true,
      "explanation": "(identification._index_prefix:har | first-name._index_prefix:har | last-name._index_prefix:har | email._index_prefix:har)"
    }
  ]
}
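
The difference between the two explanations above can be checked mechanically. The following is only a rough sketch, not part of the original report: `distinct_prefix_fields` is a hypothetical helper name, and it assumes the explanation string has the `<field>._index_prefix:<term>` shape shown in the two responses. It counts how many distinct fields an explanation targets.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an Elasticsearch tool): print the distinct field
# names targeted by an "explanation" string from _validate/query?rewrite=true.
distinct_prefix_fields() {
  # Extract every "<field>._index_prefix:" token, strip the suffix,
  # then de-duplicate the field names.
  printf '%s\n' "$1" \
    | grep -oE '[A-Za-z0-9_.-]+\._index_prefix:' \
    | sed 's/\._index_prefix:$//' \
    | sort -u
}

# Explanation strings copied from the two responses above.
broken='(identification._index_prefix:har | identification._index_prefix:har | identification._index_prefix:har | identification._index_prefix:har)'
expected='(identification._index_prefix:har | first-name._index_prefix:har | last-name._index_prefix:har | email._index_prefix:har)'

broken_count=$(distinct_prefix_fields "$broken" | wc -l | tr -d ' ')
expected_count=$(distinct_prefix_fields "$expected" | wc -l | tr -d ' ')

echo "8.7.1 explanation targets $broken_count distinct field(s)"
echo "8.6.2 explanation targets $expected_count distinct field(s)"
```

With the strings from this report, the 8.7.1 explanation yields 1 distinct field and the 8.6.2 explanation yields 4, matching the number of fields in the request.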

Logs (if relevant)

No response

@neominik neominik added >bug needs:triage Requires assignment of a team area label labels May 2, 2023
@pxsalehi pxsalehi added :Search/Search Search-related issues that do not fall into other categories and removed needs:triage Requires assignment of a team area label labels May 2, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label May 2, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@javanna javanna self-assigned this May 3, 2023
@javanna
Member

javanna commented May 3, 2023

Thanks a lot for reporting this! It's a subtle bug caused by a change in Lucene 9.5.0 (apache/lucene#11941). I will open a PR to fix this upstream.

@javanna
Member

javanna commented May 3, 2023

I opened apache/lucene#12260 to address this in Lucene.

javanna added a commit to javanna/elasticsearch that referenced this issue May 3, 2023
This adds unit test coverage for a bug that was recently found in
Lucene. We would have caught it earlier if we were testing the
underlying lucene query being generated.

Closes elastic#95738
javanna added a commit that referenced this issue May 10, 2023
…95772)

This adds unit test coverage for a bug that was recently found in
Lucene. We would have caught it earlier if we were testing the
underlying lucene query being generated.

Closes #95738