API: Add response filtering with filter_path parameter #10980
We should return an empty JSON object rather than an empty string, otherwise the client needs to check whether JSON has been returned.
@clintongormley thanks, I updated the code to reflect your last comment.
@spinscale Can you please have a look? Thanks
@Override
public XContentGenerator createGenerator(OutputStream os, String[] filters) throws IOException {
    if ((filters == null) || (filters.length == 0)) {
use CollectionUtils.isEmpty()?
Ok
A couple of things. First, I would really like to see some simple benchmarking, so we have some ballpark numbers on how this affects performance. Especially because Minor thing: One other thing: I know that Jackson has some filtering capabilities built in. I have no idea if we can make use of them though, see here, here and here.
That's the first thing we checked before starting work on this pull request. The current filtering capabilities of Jackson (version <2.5.1) can be used only with the jackson-databind lib and apply only to POJOs, not when you use Jackson to generate JSON in streaming mode. Jackson 2.6 will integrate an interesting filtering feature that will work with JSON streaming (see release note here and test class here). It currently uses JSONPointer to filter properties, but I think we will be able to implement our own TokenFilter later, once version 2.6 is released.
I added the There are no real surprises in the benchmark:
There is room for improvement, like this one (I'm also thinking of not creating a FilterContext for sub fields when a parent field matches the end of a filter), but it might add complexity to the code and I'm not sure it would really be more efficient. I'm currently trying some of these improvements to see whether they are pertinent.
Thinking of improvements again, we should be able to reuse the objects instead of creating new ones for every sub field. I'll try to improve that.
I assume that the code path (and performance) remains the same as today if no |
@clintongormley yes. Branching is done here where we fall back to the normal XContent generator. |
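As a rough illustration of the fallback discussed above, the branching can be sketched as a factory that only builds a filtering generator when at least one filter is present. This is a minimal sketch, not the code from the pull request: `Generator`, `PlainGenerator`, `FilteringGenerator` and `GeneratorFallbackSketch` are hypothetical stand-ins, and only the null-or-empty check mirrors the snippet under review.

```java
public class GeneratorFallbackSketch {

    /** Hypothetical stand-in for a content generator; not the real XContentGenerator. */
    public interface Generator {
        boolean isFiltering();
    }

    /** Unfiltered path: what a request without any filter would get. */
    public static final class PlainGenerator implements Generator {
        @Override
        public boolean isFiltering() {
            return false;
        }
    }

    /** Filtered path: only constructed when at least one filter is given. */
    public static final class FilteringGenerator implements Generator {
        private final String[] filters;

        public FilteringGenerator(String[] filters) {
            this.filters = filters.clone();
        }

        @Override
        public boolean isFiltering() {
            return true;
        }

        public int filterCount() {
            return filters.length;
        }
    }

    /** Falls back to the plain generator when no filters are supplied. */
    public static Generator createGenerator(String[] filters) {
        if (filters == null || filters.length == 0) {
            return new PlainGenerator();
        }
        return new FilteringGenerator(filters);
    }

    public static void main(String[] args) {
        System.out.println(createGenerator(null).isFiltering());
        System.out.println(createGenerator(new String[] {"_shards.failed"}).isFiltering());
    }
}
```

Keeping the check at the single place where the generator is created means requests without filters never touch the filtering code.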
private List<XContentFilter> matchings;

/**
 * Flag to indicate if the field/property must be write
... written
added some absolutely minor comments... First, this doesn't compile using mvn, due to the benchmark class and some locale, but not really an issue. Few last things:
Maybe we should also note somewhere to switch to the Jackson filter features with the 2.6 release, I'd hate to forget that... LGTM apart from that
I'm late to the party here, but I have a simple question. Why is the parameter |
@rjernst only because it works well with the existing |
Elsewhere we use |
actually, since this is a "rest" level parameter, and for those, we have |
+1 on |
here is another suggestion, calling it The reason I am raising it is that today, we support source level include/exclude when fetching source documents. With this infrastructure, if we eventually expand it to support includes/excludes, we can also support zero copy (as in, not create a map of maps/lists representation of source) when someone asks for source includes/excludes (we can just create a bytes based filtering generator, and copy structure from the parser right into it). I helped a fellow on IRC today where the loading and parsing of source include was the perf bottleneck (from 1ms it got up to 170ms for ~1000 docs). I think this will help a lot there eventually.
@spinscale I rebased and updated the code according to your last comments. Can you have a look and do some manual testing please? Thanks a lot :) Note: latest benchmark numbers are here. |
We may not want to rely on manual testing alone, but also add that parameter (plus tests) to our REST tests.
@spinscale I just added the REST tests. The parameter is still named |
@@ -56,6 +56,10 @@
    "options" : ["node", "indices", "shards"],
    "default" : "node"
},
"_path": {
I think the _path needs to be added to all the methods supporting it, to make sure the clients are aware
No, like pretty and _source, we can just support this parameter everywhere, so don't add it to the rest spec.
left one comment, apart from that it feels like we are close to getting this in, when we get the naming stuff resolved... /cc @clintongormley |
The |
@clintongormley sounds good to me, thanks |
@spinscale Updated with your last comment :) |
@spinscale @kimchy does this need another review, or are we good to go? |
LGTM on my side |
This change adds a new "filter_path" parameter that can be used to filter and reduce the responses returned by the REST API of elasticsearch.

For example, returning only the shards that failed to be optimized:

```
curl -XPOST 'localhost:9200/beer/_optimize?filter_path=_shards.failed'
{"_shards":{"failed":0}}%
```

It supports multiple filters (separated by a comma):

```
curl -XGET 'localhost:9200/_mapping?pretty&filter_path=*.mappings.*.properties.name,*.mappings.*.properties.title'
```

It also supports the YAML response format. Here it returns only the `_id` field of a newly indexed document:

```
curl -XPOST 'localhost:9200/library/book?filter_path=_id' -d '---hello:\n world: 1\n'
---
_id: "AU0j64-b-stVfkvus5-A"
```

It also supports wildcards. Here it returns only the host name of every node in the cluster:

```
curl -XGET 'http://localhost:9200/_nodes/stats?filter_path=nodes.*.host*'
{"nodes":{"lvJHed8uQQu4brS-SXKsNA":{"host":"portable"}}}
```

And "**" can be used to include sub fields without knowing the exact path. Here it returns only the Lucene version of every segment:

```
curl 'http://localhost:9200/_segments?pretty&filter_path=indices.**.version'
{
  "indices" : {
    "beer" : {
      "shards" : {
        "0" : [ {
          "segments" : {
            "_0" : {
              "version" : "5.2.0"
            },
            "_1" : {
              "version" : "5.2.0"
            }
          }
        } ]
      }
    }
  }
}
```

Note that elasticsearch sometimes directly returns the raw value of a field, like the _source field. If you want to filter _source fields, you should consider combining the already existing _source parameter (see Get API for more details) with the filter_path parameter like this:

```
curl -XGET 'localhost:9200/_search?pretty&filter_path=hits.hits._source&_source=title'
{
  "hits" : {
    "hits" : [ {
      "_source":{"title":"Book elastic#2"}
    }, {
      "_source":{"title":"Book elastic#1"}
    }, {
      "_source":{"title":"Book elastic#3"}
    } ]
  }
}
```
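To make the matching rules in the commit message concrete, here is a minimal sketch of how comma-separated filters with `*` and `**` could be applied to an already-parsed response held as nested maps and lists. It is an assumption-laden illustration, not the streaming implementation from this pull request: the class `FilterPathSketch` and its methods are hypothetical, lists are treated as transparent containers, and a response with no matching field yields an empty map (which would serialize as an empty JSON object).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class FilterPathSketch {

    /** Returns a copy of source keeping only fields selected by the comma-separated filters. */
    public static Map<String, Object> filter(Map<String, Object> source, String filterPath) {
        List<List<String>> filters = new ArrayList<>();
        for (String f : filterPath.split(",")) {
            filters.add(Arrays.asList(f.trim().split("\\.")));
        }
        return filterMap(source, filters);
    }

    private static Map<String, Object> filterMap(Map<String, Object> map, List<List<String>> filters) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : map.entrySet()) {
            String key = e.getKey();
            boolean full = false;                          // true once some filter is fully consumed
            List<List<String>> next = new ArrayList<>();   // filters still active below this key
            for (List<String> f : filters) {
                List<String> tail = f.subList(1, f.size());
                if (f.get(0).equals("**")) {
                    if (tail.isEmpty()) {
                        full = true;                       // trailing "**" keeps everything below
                        continue;
                    }
                    next.add(f);                           // "**" absorbs this key and stays active
                    if (matches(tail.get(0), key)) {       // or "**" matches zero levels
                        List<String> rest = tail.subList(1, tail.size());
                        if (rest.isEmpty()) full = true; else next.add(rest);
                    }
                } else if (matches(f.get(0), key)) {
                    if (tail.isEmpty()) full = true; else next.add(tail);
                }
            }
            if (full) {
                out.put(key, e.getValue());
            } else if (!next.isEmpty()) {
                Object kept = filterValue(e.getValue(), next);
                if (kept != null) out.put(key, kept);
            }
        }
        return out;
    }

    @SuppressWarnings("unchecked")
    private static Object filterValue(Object value, List<List<String>> filters) {
        if (value instanceof Map) {
            Map<String, Object> m = filterMap((Map<String, Object>) value, filters);
            return m.isEmpty() ? null : m;
        }
        if (value instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) value) {
                Object kept = filterValue(item, filters);
                if (kept != null) out.add(kept);
            }
            return out.isEmpty() ? null : out;
        }
        return null; // scalar reached before the filter was fully consumed
    }

    /** Matches one path segment against a key; "*" stands for any run of characters. */
    private static boolean matches(String pattern, String key) {
        String[] parts = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) regex.append(".*");
            regex.append(Pattern.quote(parts[i]));
        }
        return key.matches(regex.toString());
    }

    public static void main(String[] args) {
        // Illustrative data only; the "name" field and its value are made up for this sketch.
        Map<String, Object> node = new LinkedHashMap<>();
        node.put("host", "portable");
        node.put("name", "node-1");
        Map<String, Object> nodes = new LinkedHashMap<>();
        nodes.put("lvJHed8uQQu4brS-SXKsNA", node);
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("nodes", nodes);
        System.out.println(filter(response, "nodes.*.host*"));
    }
}
```

Working on a parsed map keeps the sketch short, but it is the opposite of the approach discussed in this thread, where filtering happens inside the generator while the response is being written.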
234c3b6 to ce63590
w00t |
WIN Thanks so much everyone for making this happen. This is huge for Kibana! |
Is this already supported via the Java API? |
@dkroehan The If you need to render |
are excludes supported yet, or just includes? |
@nmors Using the REST API and the |
Thanks, I'm actually using the javascript client. I'm using the filterPath parameter like so (I guess it's a wrapper for filter_path), all is working fine! It would be good to be able to use excludes though, can be very helpful for me as it is web traffic that I would like to keep small. I've managed to cut the response size in half by using the following:
This change adds a new "_path" parameter that can be used to filter and reduce the responses returned by the REST API of elasticsearch.
For example, returning only the shards that failed to be optimized:
It supports multiple filters (separated by a comma):
It also supports the YAML response format. Here it returns only the _id field of a newly indexed document:
It also supports wildcards. Here it returns only the host name of every node in the cluster:
And "**" can be used to include sub fields without knowing the exact path. Here it returns only the Lucene version of every segment:
Note that elasticsearch sometimes directly returns the raw value of a field, like the "_source" field. If you want to filter responses that include _source fields, as in Search or Get responses, you should consider using the already existing "fields" parameter:
The "_path" parameter can be used to further reduce the result:
Closes #7401