-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatically flatten objects when subobjects:false #97972
Changes from all commits
08ae2f4
1f03367
663dfa9
fdefcf2
5773464
99bb79e
b6bd7be
8bf797d
dbc8872
7cbaa8e
50698c1
6cd5af2
21c2be5
ca624b1
c4f6e3c
c8b5cc5
c65b593
2f47686
f4d725f
418e67c
ee633ec
31fc136
f81b356
359cefa
73c2d9c
f350af4
f34d7f6
2ddfd45
063a46c
11b9324
d140c24
eb24547
a76a5d4
4bee7f5
dba4a9b
f59f2c7
b5ebfca
2ff0de4
b69502e
0edae22
e571db6
3c623db
2e483f3
0560da8
6a5b9e2
060dd7f
a400ea4
6c7a445
4e62a65
3ed03b2
765b6f1
cfab9e2
df992df
a6c8c6b
c145e7f
86736cf
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
pr: 97972 | ||
summary: Automatically flatten objects when subobjects:false | ||
area: Mapping | ||
type: enhancement | ||
issues: | ||
- 88934 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
/* | ||
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
* or more contributor license agreements. Licensed under the Elastic License | ||
* 2.0 and the Server Side Public License, v 1; you may not use this file except | ||
* in compliance with, at your election, the Elastic License 2.0 or the Server | ||
* Side Public License, v 1. | ||
*/ | ||
|
||
package org.elasticsearch.xcontent; | ||
|
||
import java.io.IOException; | ||
|
||
/** | ||
* A subclass of XContentSubParser that provides the functionality to flatten | ||
* the field names by prefixing them with the provided parent name. | ||
*/ | ||
public class FlatteningXContentParser extends XContentSubParser { | ||
private final String parentName; | ||
private static final char DELIMITER = '.'; | ||
|
||
/** | ||
* Constructs a FlatteningXContentParser with the given parent name and wraps an existing XContentParser. | ||
* | ||
* @param parser The XContentParser to be wrapped and extended with flattening functionality. | ||
* @param parentName The parent name to be used as a prefix for immediate children. | ||
*/ | ||
public FlatteningXContentParser(XContentParser parser, String parentName) { | ||
super(parser); | ||
this.parentName = parentName; | ||
} | ||
|
||
/** | ||
* Retrieves the name of the current field being parsed. If the current parsing level is 1, | ||
* the returned field name will be constructed by prepending the parent name to the | ||
* delegate's currentFieldName, otherwise just delegate. | ||
* | ||
* @return The current field name, potentially modified by prepending the parent name as a prefix. | ||
* @throws IOException If an I/O error occurs during parsing. | ||
*/ | ||
@Override | ||
public String currentName() throws IOException { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I find it a bit surprising that the only method that needs to be overridden here is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Because we need to distinguish between deeply nested leaf fields, and dynamically created mappers that accept an object, we only flatten a single level at a time. We don't need to adjust anything apart from the lowest-level name, because once we move into a new object we will either pass it to a mapper to parse, or wrap it again. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here we are using the logic of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Got it, I think that we should make this parser private and enforce the expected behaviour. If we expect nextToken not to be called on it, have it throw unsupported operation exception? My intention is to reduce the blast radius / unexpected scenarios and making the parser less generic if possible. What do you both think? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We still expect nextToken to be called, but it delegates to the XContentSubParser implementation (which keeps track of levels, etc). But yes, +1 to making the parser private. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree on the private parser, but as @romseygeek said we must not throw when nextToken is called, we expect to call the parent's class nextToken. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sounds good, thanks.Let's also see if there's other assumptions we are making that could be enforced directly in the parser to prevent it from being misused. Maybe nothing to do, not exactly sure. |
||
if (level() == 1) { | ||
return new StringBuilder(parentName).append(DELIMITER).append(delegate().currentName()).toString(); | ||
} | ||
return delegate().currentName(); | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -77,4 +77,8 @@ public void close() throws IOException { | |
} | ||
} | ||
} | ||
|
||
int level() { | ||
return level; | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -475,6 +475,41 @@ public void testSubParserObject() throws IOException { | |
} | ||
} | ||
|
||
public void testFlatteningParserObject() throws IOException { | ||
String content = """ | ||
{ | ||
"parent": { | ||
"child1" : 1, | ||
"child2": { | ||
"grandChild" : 1 | ||
}, | ||
"child3" : 1 | ||
} | ||
} | ||
"""; | ||
XContentParser parser = createParser(JsonXContent.jsonXContent, content); | ||
assertEquals(XContentParser.Token.START_OBJECT, parser.nextToken()); | ||
assertEquals(XContentParser.Token.FIELD_NAME, parser.nextToken()); | ||
assertEquals("parent", parser.currentName()); | ||
assertEquals(XContentParser.Token.START_OBJECT, parser.nextToken()); | ||
XContentParser subParser = new FlatteningXContentParser(parser, parser.currentName()); | ||
assertEquals(XContentParser.Token.FIELD_NAME, subParser.nextToken()); | ||
assertEquals("parent.child1", subParser.currentName()); | ||
assertEquals(XContentParser.Token.VALUE_NUMBER, subParser.nextToken()); | ||
assertEquals(XContentParser.Token.FIELD_NAME, subParser.nextToken()); | ||
String secondChildName = subParser.currentName(); | ||
assertEquals("parent.child2", secondChildName); | ||
assertEquals(XContentParser.Token.START_OBJECT, subParser.nextToken()); | ||
assertEquals(XContentParser.Token.FIELD_NAME, subParser.nextToken()); | ||
assertEquals("grandChild", subParser.currentName()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you help me understand why we need to prepend the path only if level is 1? shouldn't this be also There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Every time that we have a dynamically created object mapper and subobjects:false we create a new context with a FlatteningXContentParser. This is explicit when we create There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Got it, thanks for your patient explanation ;) |
||
assertEquals(XContentParser.Token.VALUE_NUMBER, subParser.nextToken()); | ||
assertEquals(XContentParser.Token.END_OBJECT, subParser.nextToken()); | ||
assertEquals(XContentParser.Token.FIELD_NAME, subParser.nextToken()); | ||
assertEquals("parent.child3", subParser.currentName()); | ||
assertEquals(XContentParser.Token.VALUE_NUMBER, subParser.nextToken()); | ||
|
||
} | ||
|
||
public void testSubParserArray() throws IOException { | ||
XContentBuilder builder = XContentFactory.jsonBuilder(); | ||
int numberOfArrayElements = randomInt(10); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -396,7 +396,15 @@ static void parseObjectOrField(DocumentParserContext context, Mapper mapper) thr | |
context = context.createChildContext(objectMapper); | ||
parseObjectOrNested(context); | ||
} else if (mapper instanceof FieldMapper fieldMapper) { | ||
fieldMapper.parse(context); | ||
if (shouldFlattenObject(context, fieldMapper)) { | ||
// we pass the mapper's simpleName as parentName to the new DocumentParserContext | ||
String currentFieldName = fieldMapper.simpleName(); | ||
context.path().remove(); | ||
parseObjectOrNested(context.createFlattenContext(currentFieldName)); | ||
context.path().add(currentFieldName); | ||
} else { | ||
fieldMapper.parse(context); | ||
} | ||
if (context.isWithinCopyTo() == false) { | ||
List<String> copyToFields = fieldMapper.copyTo().copyToFields(); | ||
if (copyToFields.isEmpty() == false) { | ||
|
@@ -415,6 +423,12 @@ static void parseObjectOrField(DocumentParserContext context, Mapper mapper) thr | |
} | ||
} | ||
|
||
private static boolean shouldFlattenObject(DocumentParserContext context, FieldMapper fieldMapper) { | ||
return context.parser().currentToken() == XContentParser.Token.START_OBJECT | ||
&& context.parent().subobjects() == false | ||
&& fieldMapper.supportsParsingObject() == false; | ||
} | ||
|
||
private static void throwOnUnrecognizedMapperType(Mapper mapper) { | ||
throw new IllegalStateException( | ||
"The provided mapper [" + mapper.name() + "] has an unrecognized type [" + mapper.getClass().getSimpleName() + "]." | ||
|
@@ -472,7 +486,6 @@ private static void parseObjectDynamic(DocumentParserContext context, String cur | |
dynamicObjectMapper = new NoOpObjectMapper(currentFieldName, context.path().pathAsText(currentFieldName)); | ||
} else { | ||
dynamicObjectMapper = DynamicFieldsBuilder.createDynamicObjectMapper(context, currentFieldName); | ||
context.addDynamicMapper(dynamicObjectMapper); | ||
} | ||
if (context.parent().subobjects() == false) { | ||
if (dynamicObjectMapper instanceof NestedObjectMapper) { | ||
|
@@ -486,15 +499,16 @@ private static void parseObjectDynamic(DocumentParserContext context, String cur | |
); | ||
} | ||
if (dynamicObjectMapper instanceof ObjectMapper) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Question: here we don't know how the object mapper was created dynamically. In most cases this comes from the usual dynamic mappings (intermediate objects created automatically), but are there cases in which an object may be the result of applying a dynamic template? The main purpose of this conditional is that we do want to apply dynamic templates that e.g. create ip fields out of objects from docs, and we know implicitly that these can only come from dynamic template because we have no default dynamic mapping for ip fields (or any field that supports parsing objects natively for that matter). I think though that we are also making the assumption that objects are only the result of the default dynamic mappings, while they could be the result of applying a dynamic template. I am not entirely sure what this would cause: such objects could have a different dynamic or enabled property which we are ignoring. I am thinking we may want to consider throwing exception if we get an object mapper as a result of applying a dynamic template. Ideally, we would apply only the dynamic template here so we don't have to guess what happened. Does this make sense? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tested this out and you are correct there could be the case in which an object may be result of applying dynamic templates. |
||
throw new DocumentParsingException( | ||
context.parser().getTokenLocation(), | ||
"Tried to add subobject [" | ||
+ dynamicObjectMapper.simpleName() | ||
+ "] to object [" | ||
+ context.parent().name() | ||
+ "] which does not support subobjects" | ||
); | ||
// We have an ObjectMapper but subobjects are disallowed | ||
// therefore we create a new DocumentParserContext that | ||
// prepends currentFieldName to any immediate children. | ||
parseObjectOrNested(context.createFlattenContext(currentFieldName)); | ||
return; | ||
} | ||
|
||
} | ||
if (context.dynamic() != ObjectMapper.Dynamic.RUNTIME) { | ||
javanna marked this conversation as resolved.
Show resolved
Hide resolved
|
||
context.addDynamicMapper(dynamicObjectMapper); | ||
} | ||
javanna marked this conversation as resolved.
Show resolved
Hide resolved
|
||
if (dynamicObjectMapper instanceof NestedObjectMapper && context.isWithinCopyTo()) { | ||
throwOnCreateDynamicNestedViaCopyTo(dynamicObjectMapper, context); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for chiming in late, I'd like to better understand how the additional complexity in
ContentPath
around tracking the full path is replaced with this custom parser. In a previous experience withDotExpandingXContentParser
, we have seen that having a custom parser can have quite a few side effects that are difficult to predict (do we forward all the right methods? is the parser only used how we think it is?), because the parser exposes a pretty generic interface that is used in many different ways and we are customizing it for a very specific scenario.While additional methods to
ContentPath
increase its surface, I find that they make the contract easier to reason about and easier to track as well. While I agree thatContentPath
is not a fantastic abstraction and needs some love, I am not sure that adding a new parser addresses that fairly, in that it introduces all kinds of unknowns, at least in my opinion.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair point - maybe we should make this a private class within DocumentParserContext?
My reason for preferring it this way is that is makes it much simpler to read the DocumentParser code, which is already fairly gnarly. It also opens the way to further simplifications - I played around a bit with rewriting the Dynamic.RUNTIME logic to use this and it mostly just works, although we'd need to revisit how we handle fields that contain spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that DocumentParser could be simplified, greatly. But I think that this type of changes just end up moving the complexity elsewhere and possibly cause collateral issues that are difficult to predict. I am being conservative, I know, that's because I've been bitten with DotExpandingXContentParser before.