composite and terms aggregations behave differently on ip fields
#50600
Comments
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
The bug is that the rendering is full of bizarre characters, right? Just making sure. |
@nik9000 I think there are probably two bugs:
|
This doesn't reproduce for me in master and 7.x. I'm checking 7.5 now. |
And I can't reproduce it on 7.5.1 either. Here is what I ran. I can reproduce the composite aggs showing raw bytes though. I'll dig into that one. |
The composite agg reduces itself by picking the formatter from the "first" index. That seems wrong. I'm checking on what the terms agg does. |
The terms aggregation doesn't have this problem because it converts IPs to string on the bucket side. Composite aggregations aren't designed like that so they'll want a different solution to the problem. |
So the composite agg is designed around all of the fields in the composite being the same type. When they aren't, it trips assertions or, if those aren't enabled, behaves very oddly. I think our options are:
Option 2 seems like a bigger lift. I think option 1 is kinder in the short term, with option 2 being something we can talk about later. I'd like to mention that the terms aggregation doesn't have this problem quite as badly because the conversion from ip to string is done on the data nodes instead of the coordinating nodes. I expect we don't want to do this for composite aggs because strings won't have the same ordering as IPs, and composite aggs need reductions to order "at least" as well as we order on the data node. |
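To illustrate the ordering concern, here is a minimal sketch using Python's standard `ipaddress` module (not Elasticsearch code; the sample addresses are made up). It shows that sorting the rendered strings gives a different order than sorting the addresses themselves, which is why a reduction over strings could disagree with the order produced on the data nodes.

```python
import ipaddress

# Hypothetical sample keys, as they might be rendered for display.
ips = ["10.0.0.2", "10.0.0.10", "9.0.0.1"]

# Lexicographic order of the rendered strings.
by_string = sorted(ips)

# Order of the underlying addresses.
by_address = sorted(ips, key=ipaddress.ip_address)

print(by_string)   # string order
print(by_address)  # address order
```

The two lists differ, so a coordinating node that merged sorted pages of strings would not reproduce the address order.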
We can certainly implement either of the solutions I described above so if the field isn't mapped in some of the indices then everything "just works" which should fix the issues @benwtrent described above. |
I wonder how hard it'd be to add the formatter to the bucket instead of leaving it global to the aggregation. That'd be a chunk of how we'd add support for different field types and it'd fix the errors above at the cost of a couple more bytes across the wire. Which sounds ok with me. |
Another interesting thing - if we wanted to support polymorphic types for the terms in the composite we could use the type information from the formatter and annotate the buckets with that type information. We could also add that type information to the sorting. That means you could get two buckets with the same key but the types would at least be different. Those two buckets likely wouldn't be on the same "page" anyway because we'd do the coarsest sort on type. |
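As a rough sketch of the "coarsest sort on type" idea, assuming each bucket were annotated with a type name (the `(type, key)` tuples below are hypothetical and are not how Elasticsearch represents buckets), sorting on the type first keeps same-looking keys of different types apart and never compares keys across types:

```python
import ipaddress

# Hypothetical buckets annotated with a type name: (type, key).
buckets = [
    ("keyword", "10.0.0.1"),
    ("ip", ipaddress.ip_address("10.0.0.1")),
    ("keyword", "10.0.0.0"),
    ("ip", ipaddress.ip_address("9.0.0.1")),
]

# Coarsest sort on type, then on the key within each type.
ordered = sorted(buckets, key=lambda bucket: (bucket[0], bucket[1]))

for type_name, key in ordered:
    print(type_name, key)
```

Both a keyword bucket and an ip bucket render as 10.0.0.1 here, but they land in separate runs of the sorted output.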
Just as a note, I think Terms does the wrong thing here. Converting from IP to String creates a bunch of edge cases, since a given IP may map to multiple strings (e.g. IPv6 optionally omitting multiple zeros). This can lead to subtle errors where two strings which should be the same IP address end up as different buckets in the aggregation. IMHO, the right thing to do is to enforce that fields must be the same type for all indexes in the aggregation. |
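A small sketch of the general point that one IP address has several valid spellings, using Python's standard `ipaddress` module (the addresses are examples from the IPv6 documentation range; this says nothing about which spelling Elasticsearch itself emits):

```python
import ipaddress

# Three spellings of the same IPv6 address.
spellings = [
    "2001:db8::1",
    "2001:0db8:0000:0000:0000:0000:0000:0001",
    "2001:DB8:0:0:0:0:0:1",
]

# Bucketing on the raw strings keeps them apart.
string_buckets = set(spellings)

# Bucketing on the parsed address collapses them into one.
address_buckets = {ipaddress.ip_address(s) for s in spellings}

print(len(string_buckets), len(address_buckets))
```

Any grouping that keys on unnormalized strings would therefore need a canonical form to avoid splitting one address across buckets.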
I have a prototype locally that produces the right answer when aggregating across two field IPs and unmapped fields which is quite simple. I think that is probably worth getting in because we do want to support that. I've played a little with trying to support polymorphic types. It'd be tricky to get it right because of "after_key", I think. |
+1 on fixing it for unmapped fields, that should definitely be something we support. If you want to PR that, I'll be happy to review it. |
That is my plan! I'm trying to reproduce a semi-related bug at the moment but should open a PR today if all goes well. |
When a composite aggregation is reduced using the results from an index that has one of the fields unmapped we were throwing away the formatter. This is mildly annoying, except in the case of IP addresses which were coming out as non-utf-8-characters. And tripping assertions. This carefully preserves the formatter from the working bucket. Closes elastic#50600
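The failure mode described here can be sketched roughly as follows. This is an illustrative Python model, not the actual Elasticsearch change: `format_ip`, `first_formatter`, `first_usable_formatter`, and `render` are hypothetical names, and an unmapped index is modeled as contributing `None` instead of a formatter.

```python
import ipaddress
from typing import Callable, Optional, Sequence

Formatter = Callable[[bytes], str]


def format_ip(raw: bytes) -> str:
    """Render packed address bytes as a readable IP string."""
    return str(ipaddress.ip_address(raw))


def first_formatter(formatters: Sequence[Optional[Formatter]]) -> Optional[Formatter]:
    """Take whatever the first result carries, even if it has no formatter."""
    return formatters[0]


def first_usable_formatter(formatters: Sequence[Optional[Formatter]]) -> Optional[Formatter]:
    """Keep the formatter from the first result that actually has one."""
    return next((f for f in formatters if f is not None), None)


def render(raw: bytes, formatter: Optional[Formatter]) -> str:
    """Without a formatter the raw bytes leak through as odd characters."""
    if formatter is None:
        return raw.decode("latin-1")
    return formatter(raw)


raw_key = ipaddress.ip_address("192.168.0.1").packed
# The first index has the field unmapped, the second maps it as ip.
shard_formatters = [None, format_ip]

broken = render(raw_key, first_formatter(shard_formatters))
fixed = render(raw_key, first_usable_formatter(shard_formatters))
print(repr(broken), fixed)
```

Under these assumptions, choosing the formatter by position alone yields unreadable output whenever the unmapped index happens to come first, while preserving the working formatter yields the expected address.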
TL;DR
composite aggs behave differently than the terms agg when there is an index in the pattern that matches the prefix of the other indices and that does NOT have a mapped field matching the aggregated field. They behave the same if the "prefix index" has the field, but it is indexed as the wrong type.
Assume we have the following indices
Let's add a new index that does not have an ip field
Composite aggs behave as follows (as of 7.5).
But if the index that is added is instead:
The following is returned from the same composite agg (image for readability)
Note the difference with a plain terms agg
NOTE: if the test_index has the following mapping, composite aggs and terms behave the SAME (showing raw bytes instead of human readable strings) (image for readability)
Related issues/PRs: