Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update semantics or definition of case_insensitive in term queries #63893

Closed
swallez opened this issue Oct 19, 2020 · 6 comments · Fixed by #63926
Closed

Update semantics or definition of case_insensitive in term queries #63893

swallez opened this issue Oct 19, 2020 · 6 comments · Fixed by #63926
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team team-discuss

Comments

@swallez
Copy link
Member

swallez commented Oct 19, 2020

PR #61596 added a new case_insensitive boolean flag to term queries. This is a boolean flag that can only be set to true, which is rather unusual.

After talking to @markharwood it turns out the semantics of this flag is as follows:

  • not set: uses the field's definition (which may or not include lowercase normalization)
  • set to true: case insensitive
  • set to false: not supported as we can't enforce case sensitivity if the field has been normalized with a lowercase normalizer.

From a Clients perspective, this makes this flag very much a snowflake. We can represent it as an optional boolean, but users will still have to be aware that this boolean, when set, can only be set to true. If the value for this flag comes from an application variable like a checkbox in a search dialog, users will have to add some logic to only set it if true and not if false.

So I suggest that we either change the semantics of this flag (and update the docs accordingly) or change its definition.

Change semantics:

  • true: case insensitive
  • false: use the field's definition

Change definition: rename it to case_sensitivity with values

  • field_default - use the field's setting
  • insensitive - case insensitive
  • and this leaves a door open to a future addition of sensitive if that make sense.

Changing the semantics can be enough if we're sure that sensitive doesn't (and will never) make sense, and will just be about updating the doc and removing some validation checks.

The enumeration will still require some custom logic in applications, but won't be surprising as it will be a type conversion from a boolean (checkbox) to an enum.

/cc @markharwood @jimczi

@swallez swallez added >enhancement :Search/Search Search-related issues that do not fall into other categories labels Oct 19, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

@markharwood
Copy link
Contributor

Following up on our discussion - we agreed to keep the case_insensitive boolean setting but remove the error when set to false. Proposed documentation for the setting in various queries is:

(Optional, boolean) allows case insensitive matching of the regular expression
value with the indexed field values when set to true. Default is false which means
the case sensitivity of matching depends on the underlying field's behaviour.

How does this sound?

@swallez
Copy link
Member Author

swallez commented Oct 20, 2020

Sounds good. Should we be a bit more specific about field's behaviour so that users understand where that behaviour is defined? Like field's behaviour (analyzers and normalizers) (not sure these words are formally correct though).

@markharwood
Copy link
Contributor

so that users understand where that behaviour is defined?

It becomes an "it depends" answer which goes beyond choice of analyzers and unfortunately includes choice of fields.
Given an identically normalised text and keyword field (eg lowercased) then:

  • a term query for fOo on the text field will NOT match original document value Foo
  • a term query for fOo on the keyword field WILL match original document value Foo

That's probably more detail than we want to get into in this short description.
How about changing it to matching depends on the underlying field's mapping? That covers both analyzers/mappers AND field type.

@jimczi
Copy link
Contributor

jimczi commented Oct 20, 2020

It becomes an "it depends" answer which goes beyond choice of analyzers and unfortunately includes choice of fields.

The problem you outlined is more a discrepancy that we have in the term-based queries since we don't apply normalization on text field whereas we do on keyword fields. So IMO we should apply normalization all the time to make the behavior consistent and change the documentation of the case_insensitive option to matching depends on the underlying field's mapping ?

@markharwood
Copy link
Contributor

we should apply normalization all the time to make the behavior consistent

Obviously for another issue. We closed this keyword issue with that comment about addressing it in future.

markharwood added a commit to markharwood/elasticsearch that referenced this issue Oct 21, 2020
markharwood added a commit that referenced this issue Oct 21, 2020
…63926)

* Remove errors when case_insensitive flag set to false

Closes #63893
markharwood added a commit to markharwood/elasticsearch that referenced this issue Oct 21, 2020
markharwood added a commit to markharwood/elasticsearch that referenced this issue Oct 21, 2020
markharwood added a commit that referenced this issue Oct 21, 2020
…63926) (#63981)

* Remove errors when case_insensitive flag set to false

Closes #63893
markharwood added a commit that referenced this issue Oct 21, 2020
…63926)

Remove errors when case_insensitive flag set to false

Closes #63893
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team team-discuss
Projects
None yet
4 participants