Add a pre-filter query phase to field capabilities API #56195

Closed
jimczi opened this issue May 5, 2020 · 8 comments · Fixed by #57276
Labels
>feature :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@jimczi
Contributor

jimczi commented May 5, 2020

Context

The _field_caps API retrieves the capabilities of fields across multiple indices.
These capabilities can then be used to build search requests on these indices.
Today this API checks the index mappings directly to extract the list of fields that exist in each index.

With the removal of types, we emphasized the need to separate indices based on their mappings. Beats, for instance, now creates one index per module to ensure that the fields defined in the mapping are dense (i.e. populated in every document of the index).
However, this strategy can be counter-productive for index patterns that mix multiple modules and rely on filters to select or exclude specific modules.

For this specific use case, we added the constant_keyword field type. The idea is to add a property that is shared by all documents in the index (e.g. the Metricbeat module)
so that indices can be eliminated without checking their content when a query targets a specific list of modules.

For instance, the following query:

GET metrics-*/_search
{
  "query": {
    "term": {
      "module": "apache"
    }
  },
  ...
}

would target indices that define a constant_keyword field like:

"module": {
  "type": "constant_keyword",
  "value": "apache"
}

and would efficiently eliminate indices with other or no value for the module field.
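To make the elimination concrete, here is a minimal Python sketch (not Elasticsearch's implementation) of how a `term` query on a `constant_keyword` field can be decided from the mapping alone. The `MAPPINGS` dictionary and the `term_can_match` helper are hypothetical and only model the `module` example above.

```python
from typing import Any, Dict

# Hypothetical per-index mappings; only the "module" field is modelled here.
MAPPINGS: Dict[str, Dict[str, Any]] = {
    "metrics-apache": {"module": {"type": "constant_keyword", "value": "apache"}},
    "metrics-nginx": {"module": {"type": "constant_keyword", "value": "nginx"}},
    "metrics-other": {},  # no "module" field mapped at all
}


def term_can_match(mapping: Dict[str, Any], field: str, value: str) -> bool:
    """Return False when a term query provably matches no document in the index."""
    field_mapping = mapping.get(field)
    if field_mapping is None:
        # Unmapped field: the term query rewrites to match_none.
        return False
    if field_mapping.get("type") == "constant_keyword":
        # Every document shares the same value, so the mapping alone decides.
        return field_mapping.get("value") == value
    # Any other field type would have to be searched; assume it can match.
    return True


matching = [name for name, m in MAPPINGS.items() if term_can_match(m, "module", "apache")]
print(matching)  # ['metrics-apache']
```

No document is read here: the decision only inspects the mapping, which is why this kind of check is cheap.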

This strategy should make it easier to use generic index patterns in search requests, since the filtering can be done inside queries, but it also emphasizes the need to handle field extraction on a per-request basis.

Proposal

When using generic index patterns, the _field_caps API returns the full list of fields that appear across all matching indices. Now that constant_keyword can be used to filter indices based on their mapping, it would be useful to be able to apply such filters in the _field_caps API.
So, instead of resolving _field_caps against an index pattern only, the field capabilities could be resolved on a per-request basis in order to eliminate fields that cannot appear in search results.

The API would look like this:

GET metrics-*/_field_caps?fields=*
{
  "index_filter": {
    "term": {
      "module": "apache"
    }
  }
}

This call would eliminate indices that define a constant_keyword field named module with a value other than apache. Search requests already have a special phase, called the pre-filter phase, that checks whether a shard can match a query without executing the query.
The idea here is to apply this phase to the _field_caps API when a filter is provided. The query won't be executed; we only check whether mandatory clauses can be rewritten to match no documents, based on mapping information and index statistics.
Indices that cannot match the filter would not add their fields to the response.
The filtering would work exactly the same as the pre-filter search phase which checks that:

  • Disjoint range queries on numeric fields can rewrite to match none.
  • Queries on constant_keyword fields with a non-matching value can rewrite to match none.
  • Queries on unmapped fields can rewrite to match none.
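The three rules can be sketched in Python as a single `can_match` check over hypothetical per-shard field statistics. This is a simplified model, not the actual pre-filter code: the `FieldStats`, `Term`, `Range` and `BoolMust` types are invented for illustration, only inclusive `gte`/`lte` bounds are modelled, and a `bool` query is reduced to its mandatory (`must`) clauses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class FieldStats:
    """Hypothetical per-shard view of a field: its type plus what the shard knows about it."""
    field_type: str
    constant_value: Optional[str] = None  # only set for constant_keyword fields
    min_value: Optional[float] = None     # numeric fields: smallest indexed value
    max_value: Optional[float] = None     # numeric fields: largest indexed value


@dataclass
class Term:
    field_name: str
    value: str


@dataclass
class Range:
    field_name: str
    gte: Optional[float] = None
    lte: Optional[float] = None


@dataclass
class BoolMust:
    must: List["Query"] = field(default_factory=list)


Query = Union[Term, Range, BoolMust]


def can_match(query: Query, shard: Dict[str, FieldStats]) -> bool:
    """Return False only when the query provably rewrites to match_none on this shard."""
    if isinstance(query, BoolMust):
        # A conjunction matches nothing as soon as one mandatory clause matches nothing.
        return all(can_match(clause, shard) for clause in query.must)
    stats = shard.get(query.field_name)
    if stats is None:
        return False  # unmapped field -> match_none
    if isinstance(query, Term) and stats.field_type == "constant_keyword":
        return stats.constant_value == query.value
    if isinstance(query, Range) and stats.min_value is not None and stats.max_value is not None:
        # Disjoint range: the requested interval lies entirely outside [min, max].
        if query.gte is not None and query.gte > stats.max_value:
            return False
        if query.lte is not None and query.lte < stats.min_value:
            return False
    return True  # cannot rule the shard out without executing the query


shard = {
    "module": FieldStats("constant_keyword", constant_value="apache"),
    "latency": FieldStats("long", min_value=10, max_value=500),
}
print(can_match(BoolMust([Term("module", "apache"), Range("latency", gte=1000)]), shard))  # False
```

Returning `True` is the conservative default: the check only ever proves that a shard cannot match, never that it does.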

This phase is cheap to run (no I/O), so it shouldn't be an issue to perform multiple _field_caps calls on a single index pattern, one per search request.

@jimczi jimczi added >feature :Search Foundations/Mapping Index mappings, including merging and defining field types labels May 5, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Mapping)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label May 5, 2020
@ruflin
Contributor

ruflin commented May 6, 2020

This is great and will work really well with the new indexing strategy. It would be nice if @timestamp could also be used as a filter. This would answer the question: give me the _field_caps only for the data I'm currently looking at. I would not expect it to provide partial results for an index, but to exclude all indices that do not match the timestamp at all.

@jimczi
Contributor Author

jimczi commented May 6, 2020

It would be nice if also @timestamp could be used as a filter.

That will be handled natively with the pre-filter phase.

I would not expect it to provide partial results of an index but exclude all indices that do not match the timestamp at all.

+1. It's not clear in the description, but the idea is to eliminate an index only if none of its shards can match the filter.

@jtibshirani
Contributor

jtibshirani commented May 6, 2020

A small suggestion: perhaps the section could be named something like index_filter instead of filter? That would help clarify that the query is used to filter out entire indices. Otherwise users might assume that they could provide any search query, and we'd only return fields that appear in some matching document (I think this is the idea of 'partial results' that @ruflin mentioned).

jimczi added a commit to jimczi/elasticsearch that referenced this issue May 28, 2020
This change adds support for an `index_filter` in the
field capabilities API. Indices are filtered from
the response if the provided query rewrites to `match_none`
on every shard:

````
GET metrics-*/_field_caps?fields=*
{
  "index_filter": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gt": "2019"
            }
          }
        }
      ]
    }
  }
}
````

The filtering is done on a best-effort basis: it uses the can-match phase
to rewrite queries to `match_none` instead of fully executing the request.
The first shard that can match the filter is used to create the field
capabilities response for the entire index.

Closes elastic#56195
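A rough Python model of the merge described in this commit message: an index is kept only if at least one of its shards can match, and the first such shard supplies the fields for the whole index. Everything here is hypothetical: `Shard`, `field_caps_with_index_filter`, the year-based `max_year` statistic, and the output shape, which is a simplified stand-in for the `fields` section of the real response rather than its actual format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Shard:
    """Hypothetical shard model: the index mapping plus a simplified @timestamp statistic."""
    fields: Dict[str, str]  # field name -> field type (identical for all shards of an index)
    max_year: int           # largest @timestamp on this shard, expressed as a year


def field_caps_with_index_filter(
    indices: Dict[str, List[Shard]],
    can_match: Callable[[Shard], bool],
) -> Dict[str, Dict[str, List[str]]]:
    """Merge field capabilities, skipping indices where no shard can match the filter."""
    merged: Dict[str, Dict[str, Set[str]]] = {}
    for index_name, shards in indices.items():
        # The first shard that can match supplies the fields for the whole index;
        # if no shard can match, the index contributes nothing to the response.
        source: Optional[Shard] = next((s for s in shards if can_match(s)), None)
        if source is None:
            continue
        for field_name, field_type in source.fields.items():
            merged.setdefault(field_name, {}).setdefault(field_type, set()).add(index_name)
    # Sort index names so the output is deterministic.
    return {
        name: {ftype: sorted(names) for ftype, names in types.items()}
        for name, types in merged.items()
    }


indices = {
    "metrics-2018": [Shard({"@timestamp": "date", "cpu": "float"}, 2018)],
    "metrics-2020": [
        Shard({"@timestamp": "date", "cpu": "float", "mem": "long"}, 2019),
        Shard({"@timestamp": "date", "cpu": "float", "mem": "long"}, 2020),
    ],
}

# Loosely mirrors the range on @timestamp with "gt": "2019" shown above.
caps = field_caps_with_index_filter(indices, lambda shard: shard.max_year > 2019)
print(caps)
```

Because every shard of an index shares the same mapping, stopping at the first matching shard loses no fields; only the per-shard statistics differ, and those are what decide whether the index is kept.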
@astefan
Contributor

astefan commented May 29, 2020

Wondering if this feature would be useful for SQL/EQL, where users have the option of providing query DSL in the form of a filter section in the request. We would use the query in the filter section for the _field_caps call... @costin wdyt?

@costin
Member

costin commented May 29, 2020

Indeed, adding the filter to the query should speed things up when dealing with large mappings (e.g. beats).

@wylieconlon

This seems like an unintuitive API choice: like other commenters, I was expecting that this would filter individual fields as well. I previously described the expected API in #52730, with the semantics that others were expecting.

@jimczi
Copy link
Contributor Author

jimczi commented Jun 3, 2020

We discussed this offline with @wylieconlon and agreed that this proposal has the benefit of keeping things fast in all circumstances. We also clarified that this solution will be effective once Beats moves to the new indexing strategy, so the user experience should be the same as the one proposed in #52730.

jimczi added a commit that referenced this issue Jun 17, 2020
* Add index filtering in field capabilities API

jimczi added a commit that referenced this issue Jun 18, 2020
@javanna javanna added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 16, 2024