Automatically flatten objects when subobjects:false #97972

piergm · 2023-07-26T13:28:14Z

While ingesting documents that contain nested objects when the
mapping property subobjects is set to false, instead of throwing
a mapping exception and dropping the document(s), we now map only
leaf field(s), using their full path as their name.

This is done by skipping the dynamic mapping of intermediate
objects and retaining the dotted field name while parsing.

After this change, with the following root-level mapping:
{ "subobjects" : false, "properties" : { } }
ingesting either of the following documents results in the same
mapped fields (time.min and time.max), instead of the second
being rejected:

{ "time.min": 1, "time.max": 2 }
{ "time": { "min": 1, "max": 2 } }
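To make the expected outcome concrete, here is a minimal Java sketch that computes the leaf field names a document yields when every intermediate object is flattened into a dotted path, modelling the second document as { "time": { "min": 1, "max": 2 } }. LeafFlattener is a hypothetical helper, not Elasticsearch's DocumentParser: it only models the resulting names and ignores arrays, dynamic templates and field types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: models the leaf field names produced when intermediate
// objects are flattened into dotted paths. This is not Elasticsearch's DocumentParser.
final class LeafFlattener {

    private LeafFlattener() {}

    /** Returns leaf values keyed by their full dotted path, in document order. */
    static Map<String, Object> flatten(Map<String, ?> document) {
        Map<String, Object> leaves = new LinkedHashMap<>();
        collect("", document, leaves);
        return leaves;
    }

    private static void collect(String prefix, Map<String, ?> object, Map<String, Object> leaves) {
        for (Map.Entry<String, ?> entry : object.entrySet()) {
            // Keys that already contain dots ("time.min") are kept verbatim.
            String path = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, ?> child = (Map<String, ?>) value;
                // Intermediate object: recurse instead of creating an object mapping for it.
                collect(path, child, leaves);
            } else {
                // Leaf value: recorded under its full path.
                leaves.put(path, value);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> dotted = new LinkedHashMap<>();
        dotted.put("time.min", 1);
        dotted.put("time.max", 2);

        Map<String, Object> time = new LinkedHashMap<>();
        time.put("min", 1);
        time.put("max", 2);
        Map<String, Object> nested = new LinkedHashMap<>();
        nested.put("time", time);

        System.out.println(flatten(dotted)); // {time.min=1, time.max=2}
        System.out.println(flatten(nested)); // {time.min=1, time.max=2}
    }
}
```

Because dotted keys pass through unchanged and nested keys are joined with their parent's path, both spellings converge on the same two leaf names.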

Subtasks:

- Skip dynamically mapping objects when parsing a document and an
  object is encountered
- Fields capable of parsing objects natively, such as geo_point, will now
  be dynamically mapped exclusively through dynamic templates.
- Provide parsed objects to field mappers only when they are supported
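The first and third subtasks amount to a small decision whenever an object value is met under subobjects:false. The sketch below is a hedged illustration under assumed names (SubobjectsFalseDispatch, onObject, OBJECT_CAPABLE_TYPES), not the code in this PR. It assumes the dynamic template lookup has already been reduced to an optional field type name, and that flattening is the fallback whenever no object-capable template applies.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical model of the subtasks above; all names are illustrative assumptions.
final class SubobjectsFalseDispatch {

    enum Action { FLATTEN, PASS_OBJECT_TO_FIELD }

    // Assumption: an illustrative, non-exhaustive set of field types that parse objects natively.
    static final Set<String> OBJECT_CAPABLE_TYPES = Set.of("geo_point");

    /**
     * Decides what to do with an object value found under a subobjects:false parent.
     * templateFieldType is the field type a matching dynamic template would create, if any.
     */
    static Action onObject(Optional<String> templateFieldType) {
        // No object mapper is created dynamically here: only a dynamic template can turn
        // the whole object into one field, and only if that field type accepts objects.
        if (templateFieldType.filter(OBJECT_CAPABLE_TYPES::contains).isPresent()) {
            return Action.PASS_OBJECT_TO_FIELD;
        }
        // Otherwise keep parsing the object's children under dotted names.
        return Action.FLATTEN;
    }

    public static void main(String[] args) {
        System.out.println(onObject(Optional.empty()));         // FLATTEN
        System.out.println(onObject(Optional.of("geo_point"))); // PASS_OBJECT_TO_FIELD
    }
}
```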

closes #88934


elasticsearchmachine · 2023-07-26T13:28:39Z

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine · 2023-07-26T13:28:39Z

Hi @piergm, I've created a changelog YAML for you.


server/src/main/java/org/elasticsearch/index/mapper/ObjectMapper.java

server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java

server/src/test/java/org/elasticsearch/index/mapper/DocumentParserTests.java

felixbarny · 2023-07-28T06:41:19Z

I'm really excited that this is happening. Thanks for working on it ❤️


javanna

Did another, more in-depth round and left some comments.

server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java

server/src/test/java/org/elasticsearch/index/mapper/DocumentParserTests.java

piergm · 2023-08-22T10:36:32Z

@elasticsearchmachine run elasticsearch-ci/packaging-tests-windows-sample

romseygeek · 2023-08-23T14:16:56Z

I wonder if it's possible to make these changes without touching ContentPath, which is getting considerably more complicated here. I think we can keep all our changes within DocumentParser#parseObjectDynamic and do something like this: if we have a dynamically created object mapper here but subobjects are disallowed, throw that object mapper away and instead create a new DocumentParserContext with a wrapped XContentParser that prepends the current field name to any of its immediate children, and pass this to parseObjectOrField. I think this would also simplify the handling of the dynamic runtime case: where we currently create a fake object mapper for the context to use, we would instead modify the XContentParser to include the extra field level when it reports field names, but stick with the same ObjectMapper parent.

Have a look at XContentSubParser to get an idea of how this might be implemented - keep track of the nesting level, and if we're at level 1 when currentName() is called, return the delegate name with the extra field name prepended, otherwise just delegate.
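A minimal sketch of that suggestion, written against a hypothetical, stripped-down token interface (MiniParser, Tok, ReplayParser and FlatteningSketchParser are illustrative stand-ins, not org.elasticsearch.xcontent types). The wrapper starts inside the object being flattened, counts nesting depth on every token it forwards, and prefixes currentName() only at depth 1.

```java
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT the org.elasticsearch.xcontent types.
enum Tok { START_OBJECT, END_OBJECT, FIELD_NAME, VALUE }

interface MiniParser {
    Tok nextToken();      // returns null once the (sub-)stream is exhausted
    String currentName(); // name of the most recently read field
}

// Replays a fixed event list: each FIELD_NAME token is followed by its String name.
final class ReplayParser implements MiniParser {
    private final List<Object> events;
    private int pos = 0;
    private String name;

    ReplayParser(List<Object> events) { this.events = events; }

    @Override public Tok nextToken() {
        if (pos >= events.size()) return null;
        Tok tok = (Tok) events.get(pos++);
        if (tok == Tok.FIELD_NAME) name = (String) events.get(pos++);
        return tok;
    }

    @Override public String currentName() { return name; }
}

// Hypothetical sketch: must be created while the delegate sits on the object's START_OBJECT.
final class FlatteningSketchParser implements MiniParser {
    private final MiniParser delegate;
    private final String parentName;
    private int level = 1; // depth inside the object being flattened

    FlatteningSketchParser(MiniParser delegate, String parentName) {
        this.delegate = delegate;
        this.parentName = parentName;
    }

    @Override public Tok nextToken() {
        if (level == 0) return null; // the wrapped object has been fully consumed
        Tok tok = delegate.nextToken();
        if (tok == Tok.START_OBJECT) level++;
        else if (tok == Tok.END_OBJECT) level--;
        return tok;
    }

    @Override public String currentName() {
        String name = delegate.currentName();
        // Only immediate children get the prefix; deeper objects are wrapped again instead.
        return level == 1 ? parentName + "." + name : name;
    }
}

final class FlatteningSketchDemo {
    // Tokens for {"parent": {"child1": 1, "child2": {"grandChild": 2}}}, wrapped at "parent".
    static MiniParser wrappedParent() {
        MiniParser root = new ReplayParser(List.<Object>of(
                Tok.START_OBJECT,
                Tok.FIELD_NAME, "parent", Tok.START_OBJECT,
                Tok.FIELD_NAME, "child1", Tok.VALUE,
                Tok.FIELD_NAME, "child2", Tok.START_OBJECT,
                Tok.FIELD_NAME, "grandChild", Tok.VALUE,
                Tok.END_OBJECT, Tok.END_OBJECT, Tok.END_OBJECT));
        root.nextToken();                       // START_OBJECT of the document
        root.nextToken();                       // FIELD_NAME "parent"
        String parentName = root.currentName(); // capture the name before entering the object
        root.nextToken();                       // START_OBJECT of "parent"
        return new FlatteningSketchParser(root, parentName);
    }

    public static void main(String[] args) {
        MiniParser sub = wrappedParent();
        sub.nextToken(); sub.nextToken(); sub.nextToken(); // child1, its value, child2
        String child2Name = sub.currentName();             // captured at depth 1
        sub.nextToken();                                   // START_OBJECT of child2 (depth 2)
        MiniParser subSub = new FlatteningSketchParser(sub, child2Name); // wrap again
        subSub.nextToken();                                // FIELD_NAME grandChild
        System.out.println(child2Name);            // parent.child2
        System.out.println(subSub.currentName());  // parent.child2.grandChild
    }
}
```

Without the second wrapper, grandChild comes back unprefixed; it gains the full path only when the nested object is wrapped again with the already-prefixed name, so grandchildren are handled recursively rather than by tracking a complete path.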


romseygeek

This looks great, thanks @piergm - I left a couple of nits and have one question.

libs/x-content/src/main/java/org/elasticsearch/xcontent/FlatteringXContentParser.java

libs/x-content/src/test/java/org/elasticsearch/xcontent/XContentParserTests.java

server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java

romseygeek

LGTM, thanks for all the iterations!

While ingesting documents that contain nested objects when the mapping property subobjects is set to false, instead of throwing a mapping exception and dropping the document(s), we map only leaf field(s), using their full path, with segments separated by dots, as their name.

javanna · 2023-09-06T12:13:27Z

libs/x-content/src/test/java/org/elasticsearch/xcontent/XContentParserTests.java

+        assertEquals("parent.child2", secondChildName);
+        assertEquals(XContentParser.Token.START_OBJECT, subParser.nextToken());
+        assertEquals(XContentParser.Token.FIELD_NAME, subParser.nextToken());
+        assertEquals("grandChild", subParser.currentName());


Can you help me understand why we need to prepend the path only if the level is 1? Shouldn't this also be parent.child2.grandChild?

Every time we have a dynamically created object mapper and subobjects:false, we create a new context with a FlatteningXContentParser. In this test that is done explicitly when we create subParser for the field parent, but (for test purposes) not for the field child2. In normal code execution, since child2 is itself a dynamically created object mapper, we would create a sub-subParser, passing subParser.currentName() (that is, parent.child2) as the parent name, just as this test does for parent, and the current name here would therefore be parent.child2.grandChild.
It is for this reason that we only "care" about immediate children: grandchildren are handled recursively. Otherwise we would have created something of a clone of ContentPath.

Got it, thanks for your patient explanation ;)

javanna · 2023-09-06T12:20:35Z

libs/x-content/src/main/java/org/elasticsearch/xcontent/FlatteningXContentParser.java

+     * @throws IOException If an I/O error occurs during parsing.
+     */
+    @Override
+    public String currentName() throws IOException {


I find it a bit surprising that the only method that needs to be overridden here is currentName. If we are flattening docs, shouldn't nextToken get adjusted too in some ways? I find this a little hard to follow.

Because we need to distinguish between deeply nested leaf fields, and dynamically created mappers that accept an object, we only flatten a single level at a time. We don't need to adjust anything apart from the lowest-level name, because once we move into a new object we will either pass it to a mapper to parse, or wrap it again.

Here we rely on the logic of XContentSubParser, which this class extends: it keeps track of the current nesting level.

Got it. I think that we should make this parser private and enforce the expected behaviour. If we expect nextToken not to be called on it, should it throw an UnsupportedOperationException? My intention is to reduce the blast radius and the number of unexpected scenarios, and to make the parser less generic if possible. What do you both think?

We still expect nextToken to be called, but it delegates to the XContentSubParser implementation (which keeps track of levels, etc). But yes, +1 to making the parser private.

I agree on the private parser, but as @romseygeek said, we must not throw when nextToken is called; we expect it to call the parent class's nextToken.

Sounds good, thanks. Let's also see if there are other assumptions we are making that could be enforced directly in the parser to prevent it from being misused. Maybe there is nothing to do; I'm not exactly sure.

javanna · 2023-09-06T12:26:13Z

libs/x-content/src/main/java/org/elasticsearch/xcontent/FlatteningXContentParser.java

+ * A subclass of XContentSubParser that provides the functionality to flatten
+ * the field names by prefixing them with the provided parent name.
+ */
+public class FlatteningXContentParser extends XContentSubParser {


Sorry for chiming in late, I'd like to better understand how the additional complexity in ContentPath around tracking the full path is replaced with this custom parser. In a previous experience with DotExpandingXContentParser, we have seen that having a custom parser can have quite a few side effects that are difficult to predict (do we forward all the right methods? is the parser only used how we think it is?), because the parser exposes a pretty generic interface that is used in many different ways and we are customizing it for a very specific scenario.

While additional methods to ContentPath increase its surface, I find that they make the contract easier to reason about and easier to track as well. While I agree that ContentPath is not a fantastic abstraction and needs some love, I am not sure that adding a new parser addresses that fairly, in that it introduces all kinds of unknowns, at least in my opinion.

because the parser exposes a pretty generic interface that is used in many different ways and we are customizing it for a very specific scenario.

That's a fair point - maybe we should make this a private class within DocumentParserContext?

My reason for preferring it this way is that it makes the DocumentParser code, which is already fairly gnarly, much simpler to read. It also opens the way to further simplifications: I played around a bit with rewriting the Dynamic.RUNTIME logic to use this and it mostly just works, although we'd need to revisit how we handle fields that contain spaces.

I agree that DocumentParser could be simplified, greatly. But I think this type of change just ends up moving the complexity elsewhere and can cause collateral issues that are difficult to predict. I am being conservative, I know; that's because I've been bitten by DotExpandingXContentParser before.

server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java

javanna · 2023-09-07T14:20:15Z

server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java

@@ -486,15 +499,16 @@ private static void parseObjectDynamic(DocumentParserContext context, String cur
                    );
                }
                if (dynamicObjectMapper instanceof ObjectMapper) {


Question: here we don't know how the object mapper was created dynamically. In most cases this comes from the usual dynamic mappings (intermediate objects created automatically), but are there cases in which an object may be the result of applying a dynamic template?

The main purpose of this conditional is that we do want to apply dynamic templates that e.g. create ip fields out of objects from docs, and we know implicitly that these can only come from dynamic template because we have no default dynamic mapping for ip fields (or any field that supports parsing objects natively for that matter). I think though that we are also making the assumption that objects are only the result of the default dynamic mappings, while they could be the result of applying a dynamic template. I am not entirely sure what this would cause: such objects could have a different dynamic or enabled property which we are ignoring. I am thinking we may want to consider throwing exception if we get an object mapper as a result of applying a dynamic template. Ideally, we would apply only the dynamic template here so we don't have to guess what happened.

Does this make sense?

I tested this out, and you are correct: an object can be the result of applying a dynamic template.
I do agree on the solution of throwing an exception in this case.
I'll create a PR with all the suggested improvements.
Thanks for the review!
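The agreed-upon guard could look roughly like the following. This is a hypothetical sketch: TemplateGuardSketch, TemplateResult and the exception message are assumptions for illustration, and the follow-up change may detect and report the case differently. The rationale in the comment comes from the concern raised in this thread, that an object produced by a template may carry its own dynamic or enabled settings which flattening would silently ignore.

```java
// Hypothetical sketch; names, types and the message are illustrative assumptions.
final class TemplateGuardSketch {

    /** What applying a dynamic template produced for an object value. */
    enum TemplateResult { NO_MATCH, LEAF_FIELD, OBJECT_MAPPER }

    /**
     * Under subobjects:false, reject an object mapper produced by a dynamic template
     * instead of ignoring its settings (e.g. dynamic, enabled) while flattening.
     */
    static void checkTemplateResult(String fieldName, TemplateResult result) {
        if (result == TemplateResult.OBJECT_MAPPER) {
            throw new IllegalArgumentException("dynamic template for [" + fieldName
                    + "] produced an object mapper, which is not supported when subobjects is false");
        }
    }

    public static void main(String[] args) {
        checkTemplateResult("time", TemplateResult.LEAF_FIELD); // accepted
        try {
            checkTemplateResult("time", TemplateResult.OBJECT_MAPPER);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```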

piergm added 10 commits July 14, 2023 10:35

test plus first implementation

08ae2f4

improved implementation, plus added more tests

1f03367

logic for dotted field names when subobject: false

663dfa9

aligned existing test to new logic

fdefcf2

working dynamic templates

5773464

code lint

99bb79e

Merge branch 'elastic:main' into automatically-flatten-objects-when-s…

b6bd7be

…ubobjects-false

added comprehensive tests for the functionality

8bf797d

added support for array with mixed content

dbc8872

code lint and more tests checks

7cbaa8e

piergm added the >enhancement, :Search Foundations/Mapping (Index mappings, including merging and defining field types), Team:Search (Meta label for search team), and v8.10.0 labels Jul 26, 2023

piergm self-assigned this Jul 26, 2023

Update docs/changelog/97972.yaml

50698c1

piergm requested a review from romseygeek July 26, 2023 13:29

Merge branch 'elastic:main' into automatically-flatten-objects-when-s…

6cd5af2

…ubobjects-false

javanna reviewed Jul 27, 2023


piergm added 7 commits July 28, 2023 16:49

re-introduced assertion as per PR comment

21c2be5

added test that indexes multiple docs with different way to specify t…

ca624b1

…he same field

re-introduced removed checks for nestedObjects and ObjectMapper if su…

c4f6e3c

…bobject:false

re-introduced removed flag for WithinLeafObject

c8b5cc5

added test for mapped field that can parse objects nativelly

c65b593

corrected test that ingest multiple docs with different way to specif…

2f47686

…y the same field

Merge branch 'elastic:main' into automatically-flatten-objects-when-s…

f4d725f

…ubobjects-false

javanna requested changes Aug 3, 2023


piergm added 2 commits August 22, 2023 10:32

merged main, resolved conflicts

b69502e

added JavaDocs for ContentPath

0edae22

piergm added 12 commits August 24, 2023 09:18

initial implementation after PR comments

e571db6

all tests passing

3c623db

cleaned code

2e483f3

removed comments

0560da8

reverted changes

6a5b9e2

avoid using deprecated methods

060dd7f

added comments

a400ea4

simplified ContentPath

6c7a445

further semplifications

4e62a65

JavaDocs + Tests

3ed03b2

Merge branch 'elastic:main' into automatically-flatten-objects-when-s…

765b6f1

…ubobjects-false

typos + small change in test

cfab9e2

romseygeek reviewed Aug 24, 2023


piergm added 4 commits August 24, 2023 14:44

renamed class

df992df

enhanced test

a6c8c6b

small refactor in test

c145e7f

renamed method

86736cf

romseygeek approved these changes Aug 24, 2023


piergm merged commit 392c497 into elastic:main Aug 24, 2023

javanna reviewed Sep 6, 2023

server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java

javanna reviewed Sep 7, 2023


piergm commented Jul 26, 2023

elasticsearchmachine commented Jul 26, 2023

elasticsearchmachine commented Jul 26, 2023

felixbarny commented Jul 28, 2023

javanna left a comment

piergm commented Aug 22, 2023

romseygeek commented Aug 23, 2023

romseygeek left a comment

romseygeek left a comment

javanna Sep 6, 2023

piergm Sep 6, 2023

javanna Sep 7, 2023

javanna Sep 6, 2023

romseygeek Sep 6, 2023

piergm Sep 6, 2023

javanna Sep 7, 2023

romseygeek Sep 7, 2023

piergm Sep 7, 2023

javanna Sep 7, 2023

javanna Sep 6, 2023

romseygeek Sep 6, 2023

javanna Sep 7, 2023

javanna Sep 7, 2023

piergm Sep 26, 2023
