
Unexpected behaviour when using synonyms with match and multi-match queries #97396

Closed
elliotthumphreys opened this issue Jul 5, 2023 · 2 comments
Labels
>bug :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team


elliotthumphreys commented Jul 5, 2023

Elasticsearch Version

8.7.1

Installed Plugins

No response

Java Version

20.0.1

OS Version

Ubuntu, running in the Azure AKS VM image

Problem Description

Say I have an analyser called 'searchAnalyzer' that contains the synonym rule bar => bar, baz. All of the queries below use this analyser at search time.

Say I have four documents in my index, each with a name and a description:

  • document_1: (name: bar, description: foo)
  • document_2: (name: baz, description: foo)
  • document_3: (name: foo, description: baz)
  • document_4: (name: foo, description: bar)
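To make the expected results concrete, here is a minimal Python sketch of how an explicit mapping rule like bar => bar, baz rewrites tokens, applied to the four documents above. It is a simplified model, not Elasticsearch's analyser: it assumes single-word, already-lowercased tokens and ignores positions and token graphs. The names parse_rule, expand, RULES and DOCUMENTS are illustrative only.

```python
from __future__ import annotations

# Simplified model of an explicit synonym rule such as "bar => bar, baz".
# Assumption: tokens are single, lowercased words; positions and token
# graphs (which a real analyser tracks) are ignored.


def parse_rule(rule: str) -> tuple[list[str], list[str]]:
    """Split an explicit mapping rule into left-hand and right-hand terms."""
    lhs, rhs = rule.split("=>")
    return [t.strip() for t in lhs.split(",")], [t.strip() for t in rhs.split(",")]


def expand(token: str, rules: list[str]) -> list[str]:
    """Return the tokens emitted for `token` after applying the rules."""
    for rule in rules:
        lhs, rhs = parse_rule(rule)
        if token in lhs:
            return rhs  # a left-hand term is replaced by the right-hand list
    return [token]  # tokens not on any left-hand side pass through unchanged


RULES = ["bar => bar, baz"]

DOCUMENTS = {
    "document_1": {"name": "bar", "description": "foo"},
    "document_2": {"name": "baz", "description": "foo"},
    "document_3": {"name": "foo", "description": "baz"},
    "document_4": {"name": "foo", "description": "bar"},
}

# The search-time expansion of the query "bar".
query_terms = set(expand("bar", RULES))

# For each document, list the fields that contain one of the expanded terms.
for doc_id, fields in DOCUMENTS.items():
    hits = [field for field, value in fields.items() if value in query_terms]
    print(doc_id, hits)
```

Because only bar appears on the left-hand side, a query for bar expands to bar and baz, while baz on its own passes through unchanged. Each of the four documents holds one of those two terms in exactly one field.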

Current behaviour

Scenario 1

When performing a multi_match query with the default type, best_fields, the results are missing documents that do not contain the search term in the first field that is checked.

The query performed:

{
	"query": {
		"multi_match": {
			"fields": [
				"name",
				"description"
			],
			"query": "bar",
			"analyzer": "searchAnalyzer"
		}
	}
}

Only document_1 and document_2 are returned; document_3 and document_4 are missing.
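The Elasticsearch documentation describes best_fields as finding documents that match any field while scoring by the best one, executed like a dis_max query over one match query per field. The hypothetical sketch below models that behaviour with a toy score of 1.0 per matching field (real scoring uses BM25) and assumes "bar" has already been expanded to bar and baz. Under this model, every document is a hit.

```python
from __future__ import annotations

# Hypothetical model of multi_match with type best_fields: one match query per
# field, combined dis_max-style, so a document is a hit if ANY field matches.
# Assumption: a toy score of 1.0 per matching field stands in for BM25.

QUERY_TERMS = {"bar", "baz"}  # "bar" after the bar => bar, baz rule
FIELDS = ["name", "description"]

DOCUMENTS = {
    "document_1": {"name": "bar", "description": "foo"},
    "document_2": {"name": "baz", "description": "foo"},
    "document_3": {"name": "foo", "description": "baz"},
    "document_4": {"name": "foo", "description": "bar"},
}


def field_score(value: str, terms: set[str]) -> float:
    """Toy per-field score: 1.0 if the field holds a query term, else 0.0."""
    return 1.0 if value in terms else 0.0


def best_fields_score(
    doc: dict[str, str], fields: list[str], terms: set[str], tie_breaker: float = 0.0
) -> float | None:
    """dis_max combination: best field score plus tie_breaker times the rest."""
    matching = [s for s in (field_score(doc[f], terms) for f in fields) if s > 0]
    if not matching:
        return None  # no field matched, so the document is not returned
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


matched = [
    doc_id
    for doc_id, doc in DOCUMENTS.items()
    if best_fields_score(doc, FIELDS, QUERY_TERMS) is not None
]
print(matched)  # ['document_1', 'document_2', 'document_3', 'document_4']
```

The tie_breaker argument defaults to 0.0, matching the documented multi_match default, so only the best field contributes to the score; it does not affect which documents match.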

Scenario 2

When performing a bool query with a minimum_should_match of 1 and two match_phrase should clauses, the results are missing documents that do not contain the search term in the field targeted by the first match_phrase clause.

The query performed:

{
	"query": {
		"bool": {
			"minimum_should_match": 1,
			"should": [
				{
					"match_phrase": {
						"name": {
							"query": "bar",
							"analyzer": "searchAnalyzer"
						}
					}
				},
				{
					"match_phrase": {
						"description": {
							"query": "bar",
							"analyzer": "searchAnalyzer"
						}
					}
				}
			]
		}
	}
}

Only document_1 and document_2 are returned; document_3 and document_4 are missing.
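With minimum_should_match set to 1, a document should be a hit as soon as one of the two should clauses matches. The hypothetical Python sketch below models that rule. It assumes each field holds a single token, so a match_phrase for the synonym-expanded "bar" reduces to "the field value is bar or baz"; field_clause and bool_should are illustrative names, not Elasticsearch APIs.

```python
from __future__ import annotations

# Hypothetical model of a bool query with per-field should clauses and
# minimum_should_match. Assumption: each field holds a single token, so a
# phrase match on the expanded query reduces to a membership test.

QUERY_TERMS = {"bar", "baz"}  # "bar" after the bar => bar, baz rule

DOCUMENTS = {
    "document_1": {"name": "bar", "description": "foo"},
    "document_2": {"name": "baz", "description": "foo"},
    "document_3": {"name": "foo", "description": "baz"},
    "document_4": {"name": "foo", "description": "bar"},
}


def field_clause(field: str):
    """Build a should clause that matches when `field` holds a query term."""
    return lambda doc: doc[field] in QUERY_TERMS


CLAUSES = [field_clause("name"), field_clause("description")]


def bool_should(doc: dict[str, str], clauses, minimum_should_match: int = 1) -> bool:
    """A document is a hit when at least `minimum_should_match` clauses match."""
    return sum(1 for clause in clauses if clause(doc)) >= minimum_should_match


matched = [doc_id for doc_id, doc in DOCUMENTS.items() if bool_should(doc, CLAUSES)]
print(matched)  # ['document_1', 'document_2', 'document_3', 'document_4']
```

The order of the should clauses plays no part in this rule, which is why a result that depends on which field is checked first points to a bug rather than intended behaviour.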

Expected behaviour

Both the multi_match query and the bool query with match_phrase clauses should have returned all 4 documents, since each document contains bar or baz in either name or description. It looks like, when synonyms are used, a document is not returned if the terms do not appear in the first field that is checked.

Steps to Reproduce

The bodies of each request to Elasticsearch have been added to the following gist: https://gist.github.com/elliotthumphreys/834bde8eef763d4f8b7b5655e0237346

  1. Create an index using the Index.json file. This creates an index with two fields (name, description) and an analyser with the basic synonym rule bar => bar, baz.
  2. Create documents 1 to 4 using the following files:
    2.1. document1.json (name: foo, description: bar)
    2.2. document2.json (name: bar, description: foo)
    2.3. document3.json (name: baz, description: foo)
    2.4. document4.json (name: foo, description: baz)
  3. Run a multi_match search query (using MultiMatchQuery.json) that targets both name and description, uses the default best_fields type, and uses the searchAnalyzer analyser with the synonyms. Only documents 2 and 3 are returned; however, all 4 documents should have been returned, as they all contain either bar or baz.
  4. Run a match search query (using MatchQuery.json). This is a bool query with a minimum_should_match of 1 and two match_phrase should clauses that target name and description respectively, both using searchAnalyzer. This also returns only 2 documents.
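Index.json itself is only available in the gist, so the body below is a hypothetical reconstruction written as a Python dict, meant only to show where the bar => bar, baz rule and the searchAnalyzer analyser sit in a create-index request. The filter name bar_synonyms, the standard tokenizer, the lowercase filter, and the choice of the synonym_graph filter type (the gist may use the plain synonym type) are all assumptions.

```python
import json

# Hypothetical create-index body; the real Index.json lives in the gist.
# Assumed: filter name "bar_synonyms", standard tokenizer, lowercase filter,
# and the synonym_graph filter type.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                "bar_synonyms": {
                    "type": "synonym_graph",
                    "synonyms": ["bar => bar, baz"],  # explicit mapping rule
                }
            },
            "analyzer": {
                "searchAnalyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "bar_synonyms"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "description": {"type": "text"},
        }
    },
}

# Serialise the body so it could be sent as a create-index request.
print(json.dumps(INDEX_BODY, indent=2))
```

Because the queries name the analyser explicitly, this sketch does not set a search_analyzer on the field mappings.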

Logs (if relevant)

No response

@elliotthumphreys elliotthumphreys added >bug needs:triage Requires assignment of a team area label labels Jul 5, 2023
@andreidan andreidan added :Search/Search Search-related issues that do not fall into other categories and removed needs:triage Requires assignment of a team area label labels Jul 6, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Jul 6, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@romseygeek
Contributor

Thanks for reporting @elliotthumphreys! This was a bug in Lucene (previously reported here: #95738) and can be fixed by upgrading to Elasticsearch 8.8.


4 participants