Add a pre-filter query phase to field capabilities API #56195
Comments
Pinging @elastic/es-search (:Search/Mapping) |
This is great and will work really well with the new indexing strategy. It would be nice if also |
That will be handled natively with the pre-filter phase.
+1. It's not clear in the description, but the idea is to eliminate an index only if the filter cannot match on any of its shards.
A small suggestion: perhaps the section could be named something like |
This change allows using an `index_filter` in the field capabilities API. Indices are filtered from the response if the provided query rewrites to `match_none` on every shard:

````
GET metrics-*/_field_caps?fields=*
{
  "index_filter": {
    "bool": {
      "must": [
        { "range": { "@timestamp": { "gt": "2019" } } }
      ]
    }
  }
}
````

The filtering is done on a best-effort basis: it uses the can-match phase to rewrite queries to `match_none` instead of fully executing the request. The first shard that can match the filter is used to create the field capabilities response for the entire index.

Closes elastic#56195
Wondering if this feature would be useful for sql/eql where users have the option of providing a query DSL in the form of a |
Indeed, adding the filter to the query should speed things up when dealing with large mappings (e.g. beats). |
This seems like an unintuitive API choice. Like other commenters, I was expecting that this would filter individual fields as well. I previously described the expected API in #52730, with the semantics that others were expecting.
We discussed offline with @wylieconlon and agreed that this proposal has the benefit of keeping things fast under any circumstances. We also clarified that this solution will be effective once we move to the new indexing strategy in Beats, so the user experience should be the same as the one proposed in #52730.
Context
The `_field_caps` API retrieves the capabilities of fields across multiple indices. These capabilities can then be used to build search requests on these indices.
Today this API checks index mappings directly to extract the list of fields that exist in each index.
With the removal of types, we emphasized the need to separate indices based on their mappings. Beats, for instance, now creates one index per module in order to ensure that fields defined in the mapping are dense (they are needed for every document in the index).
However, this strategy can be counter-productive for index patterns that mix multiple modules and rely on filters to select/eliminate specific modules.
For this specific use case, we added the `constant_keyword` field. The idea is to add a property that is shared by all documents in the index (e.g. the Metricbeat module) in order to eliminate indices without checking their content when queries contain a specific list of modules to look at.
For instance, the following query:
would query indices that define a `constant_keyword` like:
and would efficiently eliminate indices with other or no value for the `module` field.
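The query and mapping examples did not survive in this copy of the issue. As a rough sketch of the idea only, assuming a `terms` filter on `module` and a `constant_keyword` mapping with the value `apache` (the function name `can_skip_index` and the value `nginx` are hypothetical, and this is a toy model, not Elasticsearch's implementation), the elimination decision can be expressed from the mapping alone:

```python
# Hypothetical reconstruction: the field name `module` and the value
# "apache" come from the issue text; everything else is assumed.
apache_mapping = {
    "properties": {
        "module": {"type": "constant_keyword", "value": "apache"}
    }
}

# Assumed shape of a query that selects a specific list of modules.
query = {"bool": {"filter": [{"terms": {"module": ["apache", "nginx"]}}]}}


def can_skip_index(mapping, field, wanted_values):
    """Decide from the mapping alone whether an index can be eliminated."""
    props = mapping.get("properties", {}).get(field)
    if props is None:
        # Field is not mapped: no document in the index can match.
        return True
    if props.get("type") != "constant_keyword":
        # Ordinary field: the mapping cannot decide, so keep the index.
        return False
    # Constant field: skip when its single value is not requested
    # (this also covers a constant_keyword with no value set yet).
    return props.get("value") not in wanted_values


wanted = query["bool"]["filter"][0]["terms"]["module"]
print(can_skip_index(apache_mapping, "module", wanted))
```

No document is read in this check, which is the property the issue relies on.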
This strategy should ease the usage of generic index patterns for search requests, since the filtering can be done inside queries, but it also emphasizes the need to handle field extraction on a per-request basis.
Proposal
When using generic index patterns, the `_field_caps` API returns the full list of fields that appear in all indices. Now that `constant_keyword` can be used to filter indices based on their mapping, it would be useful to provide the ability to apply such filters in the `_field_caps` API.
So, instead of applying `_field_caps` to an index pattern only, the field capabilities could be resolved on a per-request basis in order to eliminate fields that cannot appear in search results.
The API would look like this:
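The request itself is missing from this copy of the issue. As a hedged sketch, assuming the `index_filter` parameter name used by the change that closed this issue (the proposal text calls it `filter`), a hypothetical `metricbeat-*` index pattern, and a hypothetical helper `field_caps_request`, the call could be assembled like this:

```python
import json


def field_caps_request(index_pattern, fields, index_filter=None):
    """Return (method, path, body) for a _field_caps call.

    `index_filter` is the parameter name from the merged change; the
    helper itself is illustrative and sends nothing over the network.
    """
    path = f"/{index_pattern}/_field_caps?fields={fields}"
    body = {"index_filter": index_filter} if index_filter is not None else None
    return "POST", path, body


# Assumed filter: keep only indices whose `module` can be "apache".
method, path, body = field_caps_request(
    "metricbeat-*", "*", {"term": {"module": "apache"}}
)
print(method, path)
print(json.dumps(body, indent=2))
```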
This call would eliminate indices that define a `constant_keyword` named `module` with a value different than `apache`. We already have a special phase in search requests called the pre-filter phase that allows checking whether a shard can match a query without executing the query.
The idea here is to apply this phase to the `_field_caps` API if a `filter` is provided. The query won't be executed; we only check whether mandatory clauses can rewrite to match no docs based on mapping information and index statistics.
Indices that cannot match the filter would not add their fields to the response.
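The index-level behaviour can be illustrated with a toy model. The sketch below assumes each shard exposes minimum and maximum timestamp statistics and that the filter is a single `gt` range; the names `Shard`, `shard_can_match`, and `filter_indices` are hypothetical and this is not how Elasticsearch implements the phase. An index is dropped only when no shard can match:

```python
from dataclasses import dataclass


@dataclass
class Shard:
    """Toy per-shard statistics for a timestamp field (assumed layout)."""
    min_ts: int
    max_ts: int


def shard_can_match(shard, gt):
    """A `gt` range can match only if the shard's maximum exceeds the bound."""
    return shard.max_ts > gt


def filter_indices(indices, gt):
    """Keep the indices that have at least one shard able to match."""
    return [
        name
        for name, shards in indices.items()
        if any(shard_can_match(s, gt) for s in shards)
    ]


indices = {
    "metrics-old": [Shard(100, 200), Shard(150, 250)],
    "metrics-new": [Shard(240, 400), Shard(100, 120)],
}
print(filter_indices(indices, 300))
```

Only statistics are compared here and no query runs, so the result is best-effort: an index that is kept may still contain no matching document.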
The filtering would work exactly the same as the pre-filter search phase which checks that:
This phase is cheap to run (no I/O), so it shouldn't be an issue to perform multiple `_field_caps` calls on a single index pattern, one per search request.