Automatically flatten objects when subobjects:false #97972

Merged
56 commits
08ae2f4
test plus first implementation
piergm Jul 14, 2023
1f03367
improved implementation, plus added more tests
piergm Jul 19, 2023
663dfa9
logic for dotted field names when subobject: false
piergm Jul 20, 2023
fdefcf2
aligned existing test to new logic
piergm Jul 24, 2023
5773464
working dynamic templates
piergm Jul 25, 2023
99bb79e
code lint
piergm Jul 25, 2023
b6bd7be
Merge branch 'elastic:main' into automatically-flatten-objects-when-s…
piergm Jul 25, 2023
8bf797d
added comprehensive tests for the functionality
piergm Jul 26, 2023
dbc8872
added support for array with mixed content
piergm Jul 26, 2023
7cbaa8e
code lint and more tests checks
piergm Jul 26, 2023
50698c1
Update docs/changelog/97972.yaml
piergm Jul 26, 2023
6cd5af2
Merge branch 'elastic:main' into automatically-flatten-objects-when-s…
piergm Jul 27, 2023
21c2be5
re-introduced assertion as per PR comment
piergm Jul 27, 2023
ca624b1
added test that indexes multiple docs with different way to specify t…
piergm Jul 27, 2023
c4f6e3c
re-introduced removed checks for nestedObjects and ObjectMapper if su…
piergm Jul 27, 2023
c8b5cc5
re-introduced removed flag for WithinLeafObject
piergm Jul 28, 2023
c65b593
added test for mapped field that can parse objects nativelly
piergm Jul 28, 2023
2f47686
corrected test that ingest multiple docs with different way to specif…
piergm Jul 28, 2023
f4d725f
Merge branch 'elastic:main' into automatically-flatten-objects-when-s…
piergm Jul 28, 2023
418e67c
added method to identify if a FieldMapper can parse object
piergm Aug 8, 2023
ee633ec
wrong behaviour on removing path and now we return the removed path
piergm Aug 8, 2023
31fc136
added further tests
piergm Aug 8, 2023
f81b356
clean code
piergm Aug 8, 2023
359cefa
added test for corner-case
piergm Aug 8, 2023
73c2d9c
implementation to manage corner-cases
piergm Aug 8, 2023
f350af4
Merge branch 'elastic:main' into automatically-flatten-objects-when-s…
piergm Aug 8, 2023
f34d7f6
added more fields that can parse objects
piergm Aug 9, 2023
2ddfd45
type checks on fields
piergm Aug 9, 2023
063a46c
added comment to clarify code
piergm Aug 9, 2023
11b9324
added comment to clarify code
piergm Aug 9, 2023
d140c24
check lookup dynamic template only if dynamic:true
piergm Aug 9, 2023
eb24547
renamed variable
piergm Aug 9, 2023
a76a5d4
Merge branch 'elastic:main' into automatically-flatten-objects-when-s…
piergm Aug 9, 2023
4bee7f5
small changes due to PR comments
piergm Aug 21, 2023
dba4a9b
renamed method
piergm Aug 21, 2023
f59f2c7
removed method from deprecated class
piergm Aug 21, 2023
b5ebfca
merged main, resolved conflicts
piergm Aug 22, 2023
2ff0de4
added dottedFieldName tests
piergm Aug 22, 2023
b69502e
merged main, resolved conflicts
piergm Aug 22, 2023
0edae22
added JavaDocs for ContentPath
piergm Aug 22, 2023
e571db6
initial implementation after PR comments
piergm Aug 24, 2023
3c623db
all tests passing
piergm Aug 24, 2023
2e483f3
cleaned code
piergm Aug 24, 2023
0560da8
removed comments
piergm Aug 24, 2023
6a5b9e2
reverted changes
piergm Aug 24, 2023
060dd7f
avoid using deprecated methods
piergm Aug 24, 2023
a400ea4
added comments
piergm Aug 24, 2023
6c7a445
simplified ContentPath
piergm Aug 24, 2023
4e62a65
further semplifications
piergm Aug 24, 2023
3ed03b2
JavaDocs + Tests
piergm Aug 24, 2023
765b6f1
Merge branch 'elastic:main' into automatically-flatten-objects-when-s…
piergm Aug 24, 2023
cfab9e2
typos + small change in test
piergm Aug 24, 2023
df992df
renamed class
piergm Aug 24, 2023
a6c8c6b
enhanced test
piergm Aug 24, 2023
c145e7f
small refactor in test
piergm Aug 24, 2023
86736cf
renamed method
piergm Aug 24, 2023
6 changes: 6 additions & 0 deletions docs/changelog/97972.yaml
@@ -0,0 +1,6 @@
pr: 97972
summary: Automatically flatten objects when subobjects:false
area: Mapping
type: enhancement
issues:
- 88934
@@ -0,0 +1,47 @@
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0 and the Server Side Public License, v 1; you may not use this file except
 * in compliance with, at your election, the Elastic License 2.0 or the Server
 * Side Public License, v 1.
 */

package org.elasticsearch.xcontent;

import java.io.IOException;

/**
 * A subclass of XContentSubParser that provides the functionality to flatten
 * the field names by prefixing them with the provided parent name.
 */
public class FlatteningXContentParser extends XContentSubParser {
Member

Sorry for chiming in late, I'd like to better understand how the additional complexity in ContentPath around tracking the full path is replaced with this custom parser. In a previous experience with DotExpandingXContentParser, we have seen that having a custom parser can have quite a few side effects that are difficult to predict (do we forward all the right methods? is the parser only used how we think it is?), because the parser exposes a pretty generic interface that is used in many different ways and we are customizing it for a very specific scenario.

While additional methods to ContentPath increase its surface, I find that they make the contract easier to reason about and easier to track as well. While I agree that ContentPath is not a fantastic abstraction and needs some love, I am not sure that adding a new parser addresses that fairly, in that it introduces all kinds of unknowns, at least in my opinion.

Contributor

because the parser exposes a pretty generic interface that is used in many different ways and we are customizing it for a very specific scenario.

That's a fair point - maybe we should make this a private class within DocumentParserContext?

My reason for preferring it this way is that it makes it much simpler to read the DocumentParser code, which is already fairly gnarly. It also opens the way to further simplifications - I played around a bit with rewriting the Dynamic.RUNTIME logic to use this and it mostly just works, although we'd need to revisit how we handle fields that contain spaces.

Member

I agree that DocumentParser could be simplified, greatly. But I think that this type of change just ends up moving the complexity elsewhere and possibly causes collateral issues that are difficult to predict. I am being conservative, I know; that's because I've been bitten by DotExpandingXContentParser before.

    private final String parentName;
    private static final char DELIMITER = '.';

    /**
     * Constructs a FlatteningXContentParser with the given parent name and wraps an existing XContentParser.
     *
     * @param parser The XContentParser to be wrapped and extended with flattening functionality.
     * @param parentName The parent name to be used as a prefix for immediate children.
     */
    public FlatteningXContentParser(XContentParser parser, String parentName) {
        super(parser);
        this.parentName = parentName;
    }

    /**
     * Retrieves the name of the current field being parsed. If the current parsing level is 1,
     * the returned field name will be constructed by prepending the parent name to the
     * delegate's currentFieldName, otherwise just delegate.
     *
     * @return The current field name, potentially modified by prepending the parent name as a prefix.
     * @throws IOException If an I/O error occurs during parsing.
     */
    @Override
    public String currentName() throws IOException {
Member

I find it a bit surprising that the only method that needs to be overridden here is currentName. If we are flattening docs, shouldn't nextToken get adjusted too in some ways? I find this a little hard to follow.

Contributor

Because we need to distinguish between deeply nested leaf fields, and dynamically created mappers that accept an object, we only flatten a single level at a time. We don't need to adjust anything apart from the lowest-level name, because once we move into a new object we will either pass it to a mapper to parse, or wrap it again.

Member Author

Here we rely on the logic of XContentSubParser, which I extend: it keeps track of the current nesting level.

Member

Got it. I think that we should make this parser private and enforce the expected behaviour. If we expect nextToken not to be called on it, could we have it throw an unsupported operation exception? My intention is to reduce the blast radius / unexpected scenarios and make the parser less generic if possible. What do you both think?

Contributor

We still expect nextToken to be called, but it delegates to the XContentSubParser implementation (which keeps track of levels, etc). But yes, +1 to making the parser private.

Member Author

I agree on the private parser, but as @romseygeek said, we must not throw when nextToken is called: we expect the parent class's nextToken to be called.

Member

Sounds good, thanks. Let's also see if there are other assumptions we are making that could be enforced directly in the parser to prevent it from being misused. Maybe there is nothing to do, I'm not exactly sure.
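
The behaviour discussed in this thread (nextToken keeps delegating and tracking the nesting level, while only currentName is adjusted at level 1) can be illustrated with a small standalone sketch. Everything below is hypothetical and stands in for the real classes: `ToyToken`, `ToyParser`, `ReplayParser` and `PrefixingParser` are illustrative names, not part of the x-content API, and the level bookkeeping is an assumption modelled on the description of XContentSubParser given above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy token kinds; a stand-in for XContentParser.Token, not the real enum. */
enum ToyToken { START_OBJECT, END_OBJECT, FIELD_NAME, VALUE }

/** Hypothetical two-method parser interface, far smaller than XContentParser. */
interface ToyParser {
    ToyToken nextToken(); // returns null once the input is exhausted

    String currentName();
}

/** Replays a fixed token stream; names[i] is the field name in effect at tokens[i]. */
final class ReplayParser implements ToyParser {
    private final ToyToken[] tokens;
    private final String[] names;
    private int pos = -1;

    ReplayParser(ToyToken[] tokens, String[] names) {
        this.tokens = tokens;
        this.names = names;
    }

    @Override
    public ToyToken nextToken() {
        pos++;
        return pos < tokens.length ? tokens[pos] : null;
    }

    @Override
    public String currentName() {
        return names[pos];
    }
}

/** Tracks depth like a sub-parser and prefixes names only for immediate children. */
final class PrefixingParser implements ToyParser {
    private final ToyParser delegate;
    private final String parentName;
    private int level = 1; // assumes the parent's START_OBJECT was already consumed

    PrefixingParser(ToyParser delegate, String parentName) {
        this.delegate = delegate;
        this.parentName = parentName;
    }

    @Override
    public ToyToken nextToken() {
        if (level == 0) {
            return null; // the wrapped object has been fully consumed
        }
        ToyToken token = delegate.nextToken();
        if (token == ToyToken.START_OBJECT) {
            level++;
        } else if (token == ToyToken.END_OBJECT) {
            level--;
        }
        return token;
    }

    @Override
    public String currentName() {
        String name = delegate.currentName();
        return level == 1 ? parentName + "." + name : name;
    }
}

final class PrefixingParserDemo {
    /** Walks the body of an object named "parent" and collects every field name seen. */
    static List<String> fieldNames() {
        ToyToken[] tokens = {
            ToyToken.FIELD_NAME, ToyToken.VALUE,          // "child1": 1
            ToyToken.FIELD_NAME, ToyToken.START_OBJECT,   // "child2": {
            ToyToken.FIELD_NAME, ToyToken.VALUE,          //   "grandChild": 1
            ToyToken.END_OBJECT,                          // }
            ToyToken.FIELD_NAME, ToyToken.VALUE,          // "child3": 1
            ToyToken.END_OBJECT };                        // end of "parent"
        String[] names = {
            "child1", "child1", "child2", "child2", "grandChild", "grandChild", "child2", "child3", "child3", null };
        ToyParser parser = new PrefixingParser(new ReplayParser(tokens, names), "parent");
        List<String> seen = new ArrayList<>();
        for (ToyToken t = parser.nextToken(); t != null; t = parser.nextToken()) {
            if (t == ToyToken.FIELD_NAME) {
                seen.add(parser.currentName());
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(fieldNames());
    }
}
```

In this sketch `grandChild` is left unprefixed because the wrapper is at level 2 when it is read, which is the same shape as the assertions in testFlatteningParserObject.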

        if (level() == 1) {
            return new StringBuilder(parentName).append(DELIMITER).append(delegate().currentName()).toString();
        }
        return delegate().currentName();
    }
}
@@ -77,4 +77,8 @@ public void close() throws IOException {
            }
        }
    }

    int level() {
        return level;
    }
}
@@ -475,6 +475,41 @@ public void testSubParserObject() throws IOException {
        }
    }

    public void testFlatteningParserObject() throws IOException {
        String content = """
            {
              "parent": {
                "child1" : 1,
                "child2": {
                  "grandChild" : 1
                },
                "child3" : 1
              }
            }
            """;
        XContentParser parser = createParser(JsonXContent.jsonXContent, content);
        assertEquals(XContentParser.Token.START_OBJECT, parser.nextToken());
        assertEquals(XContentParser.Token.FIELD_NAME, parser.nextToken());
        assertEquals("parent", parser.currentName());
        assertEquals(XContentParser.Token.START_OBJECT, parser.nextToken());
        XContentParser subParser = new FlatteningXContentParser(parser, parser.currentName());
        assertEquals(XContentParser.Token.FIELD_NAME, subParser.nextToken());
        assertEquals("parent.child1", subParser.currentName());
        assertEquals(XContentParser.Token.VALUE_NUMBER, subParser.nextToken());
        assertEquals(XContentParser.Token.FIELD_NAME, subParser.nextToken());
        String secondChildName = subParser.currentName();
        assertEquals("parent.child2", secondChildName);
        assertEquals(XContentParser.Token.START_OBJECT, subParser.nextToken());
        assertEquals(XContentParser.Token.FIELD_NAME, subParser.nextToken());
        assertEquals("grandChild", subParser.currentName());
Member

Can you help me understand why we need to prepend the path only if level is 1? Shouldn't this also be parent.child2.grandChild?

Member Author

Every time we have a dynamically created object mapper and subobjects:false, we create a new context with a FlatteningXContentParser. This is done explicitly when we create subParser for the field parent, and is deliberately not done (for test purposes) for the field child2. In normal code execution, since child2 is itself a dynamically created object mapper, we would have created a sub-subParser, passing subParser.currentName() (that is, parent.child2) as the parent name, just as this test does for parent. The current name here would therefore have been parent.child2.grandChild.
This is why we only "care" about immediate children: grandchildren are handled recursively. Otherwise we would have created something of a clone of ContentPath.

Member

Got it, thanks for your patient explanation ;)
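
The recursion described in this thread, where each level prefixes only its immediate children and nested objects are wrapped again with the child's full name, can be sketched on plain maps. This is a simplified illustration under stated assumptions, not the DocumentParser logic: `RecursiveFlattenSketch`, `flatten` and `sample` are hypothetical names, and a `Map` stands in for a parsed object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: one prefixing step per nesting level, applied recursively. */
final class RecursiveFlattenSketch {

    /** Flattens nested maps into dotted leaf names. */
    static Map<String, Object> flatten(Map<String, Object> object) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto(null, object, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void flattenInto(String parentName, Map<String, Object> object, Map<String, Object> out) {
        for (Map.Entry<String, Object> entry : object.entrySet()) {
            // Only the immediate child is prefixed at this level.
            String name = parentName == null ? entry.getKey() : parentName + "." + entry.getKey();
            if (entry.getValue() instanceof Map) {
                // "Wrap again": the child's full name becomes the parent name one level down.
                flattenInto(name, (Map<String, Object>) entry.getValue(), out);
            } else {
                out.put(name, entry.getValue());
            }
        }
    }

    /** Same shape as the document used in testFlatteningParserObject. */
    static Map<String, Object> sample() {
        Map<String, Object> child2 = new LinkedHashMap<>();
        child2.put("grandChild", 1);
        Map<String, Object> parent = new LinkedHashMap<>();
        parent.put("child1", 1);
        parent.put("child2", child2);
        parent.put("child3", 1);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("parent", parent);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(flatten(sample()));
    }
}
```

No level needs to know the full path: the prefix accumulates because each recursive call receives the already prefixed name, which is the reason given above for not duplicating ContentPath.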

        assertEquals(XContentParser.Token.VALUE_NUMBER, subParser.nextToken());
        assertEquals(XContentParser.Token.END_OBJECT, subParser.nextToken());
        assertEquals(XContentParser.Token.FIELD_NAME, subParser.nextToken());
        assertEquals("parent.child3", subParser.currentName());
        assertEquals(XContentParser.Token.VALUE_NUMBER, subParser.nextToken());

    }

    public void testSubParserArray() throws IOException {
        XContentBuilder builder = XContentFactory.jsonBuilder();
        int numberOfArrayElements = randomInt(10);
@@ -396,7 +396,15 @@ static void parseObjectOrField(DocumentParserContext context, Mapper mapper) thr
             context = context.createChildContext(objectMapper);
             parseObjectOrNested(context);
         } else if (mapper instanceof FieldMapper fieldMapper) {
-            fieldMapper.parse(context);
+            if (shouldFlattenObject(context, fieldMapper)) {
+                // we pass the mapper's simpleName as parentName to the new DocumentParserContext
+                String currentFieldName = fieldMapper.simpleName();
+                context.path().remove();
+                parseObjectOrNested(context.createFlattenContext(currentFieldName));
+                context.path().add(currentFieldName);
+            } else {
+                fieldMapper.parse(context);
+            }
             if (context.isWithinCopyTo() == false) {
                 List<String> copyToFields = fieldMapper.copyTo().copyToFields();
                 if (copyToFields.isEmpty() == false) {
@@ -415,6 +423,12 @@ static void parseObjectOrField(DocumentParserContext context, Mapper mapper) thr
         }
     }

+    private static boolean shouldFlattenObject(DocumentParserContext context, FieldMapper fieldMapper) {
+        return context.parser().currentToken() == XContentParser.Token.START_OBJECT
+            && context.parent().subobjects() == false
+            && fieldMapper.supportsParsingObject() == false;
+    }
+
     private static void throwOnUnrecognizedMapperType(Mapper mapper) {
         throw new IllegalStateException(
             "The provided mapper [" + mapper.name() + "] has an unrecognized type [" + mapper.getClass().getSimpleName() + "]."
@@ -472,7 +486,6 @@ private static void parseObjectDynamic(DocumentParserContext context, String cur
                 dynamicObjectMapper = new NoOpObjectMapper(currentFieldName, context.path().pathAsText(currentFieldName));
             } else {
                 dynamicObjectMapper = DynamicFieldsBuilder.createDynamicObjectMapper(context, currentFieldName);
-                context.addDynamicMapper(dynamicObjectMapper);
             }
             if (context.parent().subobjects() == false) {
                 if (dynamicObjectMapper instanceof NestedObjectMapper) {
@@ -486,15 +499,16 @@ private static void parseObjectDynamic(DocumentParserContext context, String cur
                     );
                 }
                 if (dynamicObjectMapper instanceof ObjectMapper) {
Member

Question: here we don't know how the object mapper was created dynamically. In most cases this comes from the usual dynamic mappings (intermediate objects created automatically), but are there cases in which an object may be the result of applying a dynamic template?

The main purpose of this conditional is that we do want to apply dynamic templates that e.g. create ip fields out of objects from docs, and we know implicitly that these can only come from a dynamic template because we have no default dynamic mapping for ip fields (or any field that supports parsing objects natively, for that matter). I think, though, that we are also making the assumption that objects are only the result of the default dynamic mappings, while they could be the result of applying a dynamic template. I am not entirely sure what this would cause: such objects could have a different dynamic or enabled property, which we are ignoring. I am thinking we may want to consider throwing an exception if we get an object mapper as a result of applying a dynamic template. Ideally, we would apply only the dynamic template here so we don't have to guess what happened.

Does this make sense?

Member Author

I tested this out and you are correct: there can be cases in which an object is the result of applying dynamic templates.
I agree with the solution of throwing an exception in this case.
I'll create a PR with all the suggested improvements.
Thanks for the review!
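
The rule proposed in this thread can be written down as a small decision table. The sketch below is hypothetical and is not code from this PR: `MapperKind`, `Origin`, `Action` and `decide` are illustrative names. It assumes that nested mappers are rejected under subobjects:false (as the `NestedObjectMapper` branch in this hunk suggests), and it models the suggested follow-up of rejecting object mappers that come from a dynamic template.

```java
/** Hypothetical model of the subobjects:false decision for a dynamically resolved mapper. */
final class SubobjectsFalseDecision {
    enum MapperKind { OBJECT, NESTED, FIELD }

    enum Origin { DEFAULT_DYNAMIC_MAPPING, DYNAMIC_TEMPLATE }

    enum Action { FLATTEN, PARSE_WITH_MAPPER, REJECT }

    static Action decide(MapperKind kind, Origin origin) {
        if (kind == MapperKind.NESTED) {
            // Assumption: nested objects are never allowed when subobjects are disabled.
            return Action.REJECT;
        }
        if (kind == MapperKind.OBJECT) {
            // Proposed in the review: an object created by a dynamic template may carry its own
            // dynamic/enabled settings, so it is rejected rather than silently flattened.
            return origin == Origin.DYNAMIC_TEMPLATE ? Action.REJECT : Action.FLATTEN;
        }
        // A field mapper (for example one that parses objects natively) handles the value itself.
        return Action.PARSE_WITH_MAPPER;
    }

    public static void main(String[] args) {
        for (MapperKind kind : MapperKind.values()) {
            for (Origin origin : Origin.values()) {
                System.out.println(kind + " from " + origin + " -> " + decide(kind, origin));
            }
        }
    }
}
```

Carrying the origin explicitly is one way to avoid guessing how a mapper was created, which is the concern raised above; the actual follow-up change may solve it differently.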

-                    throw new DocumentParsingException(
-                        context.parser().getTokenLocation(),
-                        "Tried to add subobject ["
-                            + dynamicObjectMapper.simpleName()
-                            + "] to object ["
-                            + context.parent().name()
-                            + "] which does not support subobjects"
-                    );
+                    // We have an ObjectMapper but subobjects are disallowed
+                    // therefore we create a new DocumentParserContext that
+                    // prepends currentFieldName to any immediate children.
+                    parseObjectOrNested(context.createFlattenContext(currentFieldName));
+                    return;
                 }
+
             }
+            if (context.dynamic() != ObjectMapper.Dynamic.RUNTIME) {
+                context.addDynamicMapper(dynamicObjectMapper);
+            }
             if (dynamicObjectMapper instanceof NestedObjectMapper && context.isWithinCopyTo()) {
                 throwOnCreateDynamicNestedViaCopyTo(dynamicObjectMapper, context);
@@ -15,6 +15,7 @@
import org.elasticsearch.index.IndexSettings;
import org.elasticsearch.index.analysis.IndexAnalyzers;
import org.elasticsearch.xcontent.FilterXContentParserWrapper;
import org.elasticsearch.xcontent.FlatteningXContentParser;
import org.elasticsearch.xcontent.XContentParser;

import java.io.IOException;
@@ -446,6 +447,20 @@ public LuceneDocument doc() {
        };
    }

    /**
     * Return a context for flattening subobjects
     * @param fieldName the name of the field to be flattened
     */
    public final DocumentParserContext createFlattenContext(String fieldName) {
        XContentParser flatteningParser = new FlatteningXContentParser(parser(), fieldName);
        return new Wrapper(this.parent(), this) {
            @Override
            public XContentParser parser() {
                return flatteningParser;
            }
        };
    }

    /**
     * @deprecated we are actively deprecating and removing the ability to pass
     * complex objects to multifields, so try and avoid using this method