Automatically flatten objects when subobjects:false #97972
Pinging @elastic/es-search (Team:Search)
Hi @piergm, I've created a changelog YAML for you.
server/src/main/java/org/elasticsearch/index/mapper/ObjectMapper.java (outdated, resolved)
server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java (outdated, resolved)
server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java (outdated, resolved)
server/src/test/java/org/elasticsearch/index/mapper/DocumentParserTests.java (resolved)
server/src/test/java/org/elasticsearch/index/mapper/DocumentParserTests.java (resolved)
I'm really excited that this is happening. Thanks for working on it ❤️
Did another, more in-depth round and left some comments.
server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java (outdated, resolved)
server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java (outdated, resolved)
server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java (outdated, resolved)
server/src/test/java/org/elasticsearch/index/mapper/DocumentParserTests.java (outdated, resolved)
@elasticsearchmachine run elasticsearch-ci/packaging-tests-windows-sample
I wonder if it's possible to make these changes without touching ContentPath, which is getting considerably more complicated here. I think we can keep all our changes within … Have a look at XContentSubParser to get an idea of how this might be implemented: keep track of the nesting level, and if we're at level 1 when …
This looks great, thanks @piergm - I left a couple of nits and have one question.
libs/x-content/src/main/java/org/elasticsearch/xcontent/FlatteringXContentParser.java (outdated, resolved)
libs/x-content/src/test/java/org/elasticsearch/xcontent/XContentParserTests.java (outdated, resolved)
server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java (outdated, resolved)
LGTM, thanks for all the iterations!
When ingesting documents that contain nested objects while the mapping property subobjects is set to false, instead of throwing a mapping exception and dropping the document(s), we now map only the leaf field(s), using their full path, with segments separated by dots, as their name.
assertEquals("parent.child2", secondChildName);
assertEquals(XContentParser.Token.START_OBJECT, subParser.nextToken());
assertEquals(XContentParser.Token.FIELD_NAME, subParser.nextToken());
assertEquals("grandChild", subParser.currentName());
Can you help me understand why we need to prepend the path only if the level is 1? Shouldn't this also be parent.child2.grandChild?
Every time we have a dynamically created object mapper and subobjects:false, we create a new context with a FlatteningXContentParser. This is explicit when we create subParser for the field parent, and it is not done (for test purposes) for the field child2. In normal code execution, since child2 is itself a dynamically created object mapper, we would have created a sub-subParser, passing subParser.currentName() (that is, parent.child2) as the parent name, as was done in this test, and therefore the current name here would have been parent.child2.grandChild.
It is for this reason that we only "care" about the immediate child: grandchildren are handled recursively. Otherwise we would have created something of a clone of ContentPath.
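To make the one-level-at-a-time recursion concrete, here is a minimal, self-contained Java sketch. It deliberately does not use the Elasticsearch XContentParser or XContentSubParser API: SimpleParser, ListParser, FlatteningParser and collectLeaves are hypothetical stand-ins. The wrapper delegates nextToken, counts nesting depth the way a sub-parser does, and prefixes only immediate children; each nested object is wrapped again with its already-dotted name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlatteningSketch {

    enum Token { START_OBJECT, END_OBJECT, FIELD_NAME, VALUE }

    /** Hypothetical minimal parser interface; NOT the Elasticsearch XContentParser API. */
    interface SimpleParser {
        Token nextToken(); // null once the stream (or the wrapped object) is exhausted
        String currentName();
    }

    /** Replays a fixed token list; names.get(i) is the field name of a FIELD_NAME token. */
    static final class ListParser implements SimpleParser {
        private final List<Token> tokens;
        private final List<String> names;
        private int pos = -1;
        private String currentName;

        ListParser(List<Token> tokens, List<String> names) {
            this.tokens = tokens;
            this.names = names;
        }

        @Override
        public Token nextToken() {
            if (++pos >= tokens.size()) {
                return null;
            }
            if (tokens.get(pos) == Token.FIELD_NAME) {
                currentName = names.get(pos);
            }
            return tokens.get(pos);
        }

        @Override
        public String currentName() {
            return currentName;
        }
    }

    /** Hypothetical wrapper: delegates every token and prefixes names only at depth 1. */
    static final class FlatteningParser implements SimpleParser {
        private final SimpleParser delegate;
        private final String parentName;
        private int level = 1; // assumes the delegate currently sits on the wrapped START_OBJECT

        FlatteningParser(SimpleParser delegate, String parentName) {
            this.delegate = delegate;
            this.parentName = parentName;
        }

        @Override
        public Token nextToken() {
            if (level == 0) {
                return null; // the wrapped object has been fully consumed
            }
            Token token = delegate.nextToken();
            if (token == Token.START_OBJECT) {
                level++;
            } else if (token == Token.END_OBJECT) {
                level--;
            }
            return token;
        }

        @Override
        public String currentName() {
            String name = delegate.currentName();
            return level == 1 ? parentName + "." + name : name;
        }
    }

    /** Wraps every nested object again, so deeper levels are flattened by recursion. */
    static void collectLeaves(SimpleParser parser, List<String> out) {
        String field = null;
        Token token;
        while ((token = parser.nextToken()) != null) {
            if (token == Token.FIELD_NAME) {
                field = parser.currentName();
            } else if (token == Token.VALUE) {
                out.add(field);
            } else if (token == Token.START_OBJECT) {
                collectLeaves(new FlatteningParser(parser, field), out);
            }
        }
    }

    /** Tokens for the value of "parent": {"child1": 1, "child2": {"grandChild": 2}}. */
    static ListParser parentValue() {
        ListParser base = new ListParser(
            List.of(Token.START_OBJECT, Token.FIELD_NAME, Token.VALUE, Token.FIELD_NAME, Token.START_OBJECT,
                Token.FIELD_NAME, Token.VALUE, Token.END_OBJECT, Token.END_OBJECT),
            Arrays.asList(null, "child1", null, "child2", null, "grandChild", null, null, null));
        base.nextToken(); // position on the START_OBJECT that the wrapper will own
        return base;
    }
}
```

Reading field names from a single wrapper yields parent.child1, parent.child2 and grandChild, which mirrors the test assertions quoted earlier; collectLeaves, which wraps again at child2, yields parent.child1 and parent.child2.grandChild. The depth counter is what lets the wrapper leave nextToken untouched while still knowing which names to prefix.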
Got it, thanks for your patient explanation ;)
* @throws IOException If an I/O error occurs during parsing.
*/
@Override
public String currentName() throws IOException {
I find it a bit surprising that the only method that needs to be overridden here is currentName. If we are flattening docs, shouldn't nextToken be adjusted in some way too? I find this a little hard to follow.
Because we need to distinguish between deeply nested leaf fields, and dynamically created mappers that accept an object, we only flatten a single level at a time. We don't need to adjust anything apart from the lowest-level name, because once we move into a new object we will either pass it to a mapper to parse, or wrap it again.
Here we are relying on the logic of XContentSubParser, which this class extends; it keeps track of the current nesting level.
Got it. I think we should make this parser private and enforce the expected behaviour. If we expect nextToken not to be called on it, should it throw an UnsupportedOperationException? My intention is to reduce the blast radius and the unexpected scenarios, and to make the parser less generic if possible. What do you both think?
We still expect nextToken to be called, but it delegates to the XContentSubParser implementation (which keeps track of levels, etc). But yes, +1 to making the parser private.
I agree on the private parser, but as @romseygeek said, we must not throw when nextToken is called; we expect it to call the parent class's nextToken.
Sounds good, thanks. Let's also see if there are other assumptions we are making that could be enforced directly in the parser to prevent it from being misused. Maybe there is nothing to do; I'm not exactly sure.
* A subclass of XContentSubParser that provides the functionality to flatten
* the field names by prefixing them with the provided parent name.
*/
public class FlatteningXContentParser extends XContentSubParser {
Sorry for chiming in late. I'd like to better understand how the additional complexity in ContentPath around tracking the full path is replaced with this custom parser. In a previous experience with DotExpandingXContentParser, we saw that a custom parser can have quite a few side effects that are difficult to predict (do we forward all the right methods? Is the parser only used the way we think it is?), because the parser exposes a pretty generic interface that is used in many different ways, and we are customizing it for a very specific scenario.
While additional methods in ContentPath increase its surface, I find that they make the contract easier to reason about and easier to track. I agree that ContentPath is not a fantastic abstraction and needs some love, but I am not sure that adding a new parser addresses that fairly, in that it introduces all kinds of unknowns, at least in my opinion.
> because the parser exposes a pretty generic interface that is used in many different ways and we are customizing it for a very specific scenario.
That's a fair point - maybe we should make this a private class within DocumentParserContext?
My reason for preferring it this way is that it makes the DocumentParser code, which is already fairly gnarly, much simpler to read. It also opens the way to further simplifications: I played around a bit with rewriting the Dynamic.RUNTIME logic to use this and it mostly just works, although we'd need to revisit how we handle fields that contain spaces.
I agree that DocumentParser could be greatly simplified. But I think this type of change just ends up moving the complexity elsewhere and possibly causes collateral issues that are difficult to predict. I know I am being conservative; that's because I've been bitten by DotExpandingXContentParser before.
@@ -486,15 +499,16 @@ private static void parseObjectDynamic(DocumentParserContext context, String cur
);
}
if (dynamicObjectMapper instanceof ObjectMapper) {
Question: here we don't know how the object mapper was created dynamically. In most cases this comes from the usual dynamic mappings (intermediate objects created automatically), but are there cases in which an object may be the result of applying a dynamic template?
The main purpose of this conditional is that we do want to apply dynamic templates that e.g. create ip fields out of objects from docs, and we know implicitly that these can only come from dynamic template because we have no default dynamic mapping for ip fields (or any field that supports parsing objects natively for that matter). I think though that we are also making the assumption that objects are only the result of the default dynamic mappings, while they could be the result of applying a dynamic template. I am not entirely sure what this would cause: such objects could have a different dynamic or enabled property which we are ignoring. I am thinking we may want to consider throwing exception if we get an object mapper as a result of applying a dynamic template. Ideally, we would apply only the dynamic template here so we don't have to guess what happened.
Does this make sense?
I tested this out and you are correct: an object mapper can be the result of applying a dynamic template.
I do agree on the solution of throwing an exception in this case.
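The agreed guard could look roughly like the following self-contained Java sketch. Everything in it is hypothetical: Mapper, ObjectMapper, FieldMapper and checkTemplateResult are stand-ins rather than the Elasticsearch classes, and the exception type and message are assumptions. It only shows the shape of the check: under subobjects:false, a template that yields an object mapper is rejected instead of being silently flattened.

```java
import java.util.Objects;

public class DynamicTemplateCheckSketch {

    // Hypothetical stand-ins for mapper types; NOT the Elasticsearch classes.
    interface Mapper {
        String name();
    }

    record ObjectMapper(String name) implements Mapper {}

    record FieldMapper(String name, String type) implements Mapper {}

    /**
     * Hypothetical guard for the dynamic-template path: when subobjects is false, an object
     * mapper produced by a template is rejected rather than flattened, because its own
     * settings (such as dynamic or enabled) would otherwise be silently ignored.
     */
    static Mapper checkTemplateResult(String fieldName, Mapper fromTemplate, boolean subobjects) {
        Objects.requireNonNull(fromTemplate, "fromTemplate");
        if (subobjects == false && fromTemplate instanceof ObjectMapper) {
            throw new IllegalArgumentException(
                "dynamic template for field [" + fieldName + "] produced an object mapper,"
                    + " which is not supported when subobjects is false");
        }
        // Leaf mappers (for example an ip field built from an object) pass through unchanged.
        return fromTemplate;
    }
}
```

Failing loudly here matches the reasoning in the thread: the parser cannot tell whether an object mapper came from the default dynamic mappings or from a template, so rejecting the template case avoids guessing.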
I'll create a PR with all the suggested improvements.
Thanks for the review!
When ingesting documents that contain nested objects while the mapping property subobjects is set to false, instead of throwing a mapping exception and dropping the document(s), we now map only the leaf field(s), using their full path as their name.
This has been done by avoiding dynamic mapping of the intermediate objects and retaining dottedFieldName while parsing. After this change, with the following root-level mapping:
{ "subobjects" : false, "properties" : { } }
ingesting either of the following documents will result in the same mapped fields (time.min and time.max), instead of the second one being rejected:
{ "time.min": 1, "time.max": 2 }
{ "time": { "min": 1, "max": 2 } }
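The following self-contained Java sketch models only the end result described above, not the parser-based implementation in this PR: it walks a document held as nested maps and keys every leaf by its full dotted path, which is the field name it ends up with under subobjects:false. SubobjectsFalseSketch and leafFields are hypothetical names, and arrays are treated as plain leaves for simplicity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SubobjectsFalseSketch {

    /**
     * Hypothetical sketch: returns every leaf value keyed by its full dotted path.
     * Non-object values (including arrays) are treated as leaves.
     */
    static Map<String, Object> leafFields(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>();
        collect("", doc, out);
        return out;
    }

    private static void collect(String prefix, Map<String, Object> obj, Map<String, Object> out) {
        for (Map.Entry<String, Object> entry : obj.entrySet()) {
            // Keys that already contain dots, such as "time.min", are kept verbatim.
            String name = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            if (entry.getValue() instanceof Map) {
                // Intermediate object: recurse with the dotted name instead of creating an object mapper.
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) entry.getValue();
                collect(name, child, out);
            } else {
                out.put(name, entry.getValue());
            }
        }
    }
}
```

For both example documents, leafFields returns the same two entries, time.min and time.max, so neither document needs an intermediate time object mapper.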
Subtasks:
object is encountered
be dynamically mapped exclusively through dynamic templates.
closes #88934