Avoid attempting to load the same empty field twice in fetch phase #107551

javanna · 2024-04-16T19:21:28Z

During the fetch phase, a number of stored fields are requested explicitly or loaded by default. That information is included in the StoredFieldsSpec that each fetch sub phase exposes.

We attempt to provide stored fields that are already loaded to the fields lookup that scripts and value fetchers use to load field values (via SearchLookup). This is done in PreloadedFieldLookupProvider. The current logic makes values available for fields that have been found, so that scripts or value fetchers that request them don't load them again ad hoc. Stored fields that don't have a value for a specific doc, however, are treated like any other field that was not requested: they are loaded again, although they will not be found, which causes overhead.

This change makes available to PreloadedFieldLookupProvider the list of required stored fields, so that it can better distinguish between fields that we already attempted to load (although we may not have found a value for them) and those that need to be loaded ad-hoc (for instance because a script is requesting them for the first time).
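To illustrate the distinction described above, here is a minimal sketch. It is not the actual PreloadedFieldLookupProvider: the class name, the `lookup` method, the `backupLoads` counter and the placeholder ad-hoc loader are all assumptions made for illustration. It only shows the idea that a field in the "attempted" set never triggers a second read, even when no value was found for the current doc.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for the provider; names are illustrative only.
final class PreloadedLookupSketch {
    // Fields the fetch phase attempts to load for every doc (from the stored fields spec).
    private final Set<String> preloadedNames;
    // Fields that were actually found in the current doc.
    private Map<String, List<Object>> preloadedValues = Map.of();
    // Counts how many times we fall back to an ad-hoc stored fields read.
    int backupLoads = 0;

    PreloadedLookupSketch(Set<String> preloadedNames) {
        this.preloadedNames = preloadedNames;
    }

    void setPreloadedValues(Map<String, List<Object>> values) {
        this.preloadedValues = values;
    }

    List<Object> lookup(String field) {
        if (preloadedNames.contains(field)) {
            // Already attempted: return what was found (possibly nothing), never re-read.
            return preloadedValues.getOrDefault(field, List.of());
        }
        // Not part of the spec: this is the only case that needs an ad-hoc load.
        backupLoads++;
        return loadAdHoc(field);
    }

    private List<Object> loadAdHoc(String field) {
        return List.of(); // placeholder for a per-field stored fields read
    }
}
```

With `_ignored` and `_routing` in the attempted set and only `_routing` present in the doc, looking up `_ignored` returns an empty list without incrementing `backupLoads`; only a field outside the set does.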

This is an existing issue that has become evident as we moved fetching of metadata fields to FetchFieldsPhase, which relies on value fetchers, and hence on SearchLookup. We end up attempting to load default metadata fields (_ignored and _routing) twice when they are not present in a document, which makes us call LeafReader#storedFields additional times for the same document, providing a SingleFieldVisitor that will never find a value.

Another existing issue that this PR fixes: FetchFieldsPhase now extends the StoredFieldsSpec that it exposes to include the metadata fields that the phase is responsible for loading. That results in _ignored being included in the output of the debug stored fields section when profiling is enabled. The fact that it was previously missing is an existing bug (it was missing in StoredFieldLoader#fieldsToLoad).
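As a rough illustration of why the profile output changes, the sketch below unions the stored fields declared by each sub phase. The class and method names are hypothetical and this is not the real StoredFieldsSpec or StoredFieldLoader API; it only assumes that the debug section reports the merged, sorted set of field names.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; not part of Elasticsearch.
final class StoredFieldsSpecSketch {
    // Union of the stored fields each sub phase declares, sorted for stable output.
    static List<String> fieldsToLoad(List<Set<String>> subPhaseSpecs) {
        Set<String> merged = new TreeSet<>();
        for (Set<String> spec : subPhaseSpecs) {
            merged.addAll(spec);
        }
        return List.copyOf(merged);
    }
}
```

Once the fetch fields phase declares `_ignored` and `_routing` alongside the fields that were already loaded by default, the merged list contains `_ignored` as well, matching the updated profile docs.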

Yet another existing issue that this PR fixes is that _id has until now always been loaded on demand when requested via fetch fields or script. That is because it is not part of the preloaded stored fields that the fetch phase passes over to the PreloadedFieldLookupProvider. That causes overhead, as the field has already been loaded and should not be loaded again when explicitly requested.


javanna · 2024-04-16T22:25:12Z

docs/reference/search/profile.asciidoc

@@ -1051,7 +1051,7 @@ And here is the fetch profile:
            "load_source_count": 5
          },
          "debug": {
-            "stored_fields": ["_id", "_routing", "_source"]
+            "stored_fields": ["_id", "_ignored", "_routing", "_source"]


This is a consequence of exposing the correct stored fields spec in FetchFieldsPhase, which takes stored metadata fields into account. _ignored will be removed again once it's no longer stored. The problem is that the field should have been there since its introduction, but it was never added to StoredFieldLoader#fieldsToLoad, which is where the other three fields come from. Note that StoredFieldsPhase did not include in its stored fields spec the default metadata fields that it always requested.

For posterity, why is _type not there if fetch fields phase requests it by default? Because it is not mapped in recent clusters, and it is only part of the stored fields spec, hence loaded, when it is mapped, which is the case only in very old archive indices.

javanna · 2024-04-16T22:26:21Z

server/src/main/java/org/elasticsearch/search/fetch/PreloadedFieldLookupProvider.java

+    private Set<String> preloadedStoredFields;
+    private Map<String, List<Object>> storedFields;
+    private LeafFieldLookupProvider backUpLoader;
+    private Supplier<LeafFieldLookupProvider> loaderSupplier;


I took the chance to make these private and add package private setter/getter methods when needed. I find that it clarifies who does what and when.

Just a question...do I understand it correctly that preloadedStoredFieldValues.keySet() is the same as preloadedStoredFieldNames? Or is one a subset of the other?

Because later on I see we do

if (preloadedStoredFieldNames.get().contains(field)) { fieldLookup.setValues(preloadedStoredFieldValues.get(field));

which looks like "if the name is there and the field was preloaded then just get the preloaded values"...

I see that one comes from hit.loadedFields() and the other from StoredFieldsSpec but I was wondering if they always include the same fields.

If they were the same, we would not need a separate set. preloadedStoredFieldNames includes all the fields that we know we will attempt to load for all documents. Those include fields that don't have a value, while preloadedStoredFieldValues contains only those fields that were found in the current doc.

The overhead was caused by trying to load _ignored and _routing ad-hoc when requested for all docs that did not have a value for them.

Ok thank you.

javanna · 2024-04-17T08:01:50Z

server/src/main/java/org/elasticsearch/search/fetch/PreloadedFieldLookupProvider.java


    @Override
    public void populateFieldLookup(FieldLookup fieldLookup, int doc) throws IOException {
        String field = fieldLookup.fieldType().name();
-        if (storedFields.containsKey(field)) {
+
+        if (field.equals(IdFieldMapper.NAME)) {


_id was previously always loaded via the backup loader when requested via script or via fetch fields. That causes overhead as the field is already available at all times, no need to go and fetch it from stored fields!

This is because it is not part of the ordinary loaded stored fields, hence it needs to be provided and handled separately.
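A minimal sketch of the `_id` special case follows. The class, the `lookup` method and the `backupLoads` counter are hypothetical; only the idea of passing the id alongside the preloaded stored field values (as in the `setStoredFields(String id, ...)` setter in the diff) and the `"_id"` field name come from the change itself.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified illustration; not the real provider.
final class IdAwareLookupSketch {
    static final String ID_FIELD = "_id"; // same value as IdFieldMapper.NAME

    private String id;
    private Map<String, List<Object>> values = Map.of();
    int backupLoads = 0; // counts ad-hoc stored fields reads

    // The id is handed over separately because it is not part of the ordinary loaded stored fields.
    void setPreloadedStoredFieldValues(String id, Map<String, List<Object>> values) {
        this.id = id;
        this.values = values;
    }

    List<Object> lookup(String field) {
        if (field.equals(ID_FIELD)) {
            // The id is always available for the current hit: no stored fields read needed.
            return List.of(id);
        }
        if (values.containsKey(field)) {
            return values.get(field);
        }
        backupLoads++; // anything else would go through the backup loader
        return List.of();
    }
}
```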

elasticsearchmachine · 2024-04-17T08:09:09Z

Hi @javanna, I've created a changelog YAML for you.

elasticsearchmachine · 2024-04-17T08:09:09Z

Pinging @elastic/es-search (Team:Search)


nik9000 · 2024-04-17T14:12:15Z

server/src/main/java/org/elasticsearch/search/fetch/PreloadedFieldLookupProvider.java

+        this.preloadedStoredFields = preloadedStoredFields;
+    }
+
+    void setStoredFields(String id, Map<String, List<Object>> storedFields) {


Maybe setPreloadedStoredFieldValues, and the one above setPreloadedStoredFieldsNames. That way it's clear these mirror each other.

good idea, done.

salvatore-campagna · 2024-04-17T14:41:20Z

Thanks Luca, I just left a question to better understand how this works. Everything else LGTM.


elasticsearchmachine · 2024-04-17T17:38:25Z

💚 Backport successful

Status	Branch	Result
✅	8.14


elasticsearchmachine added the v8.14.0 label Apr 16, 2024

javanna added 4 commits April 16, 2024 22:11

iter

4d479fc

iter

7031f16

iter

8292988

iter

a4f6c07

javanna commented Apr 16, 2024


javanna added 2 commits April 17, 2024 09:24

iter

0a39458

iter

888d88f

javanna commented Apr 17, 2024


javanna added :Search/Search Search-related issues that do not fall into other categories >bug labels Apr 17, 2024

javanna marked this pull request as ready for review April 17, 2024 08:08

elasticsearchmachine added the Team:Search Meta label for search team label Apr 17, 2024

javanna and others added 2 commits April 17, 2024 10:09

Update docs/changelog/107551.yaml

65be419

iter

411fbd5

nik9000 approved these changes Apr 17, 2024


iter

5f4cc87

salvatore-campagna approved these changes Apr 17, 2024


Merge branch 'main' into fix/fields_lookup_stored_spec

20131ac

elasticsearchmachine added v8.15.0 and removed v8.14.0 labels Apr 17, 2024

javanna added v8.14.0 auto-backport Automatically create backport pull requests when merged labels Apr 17, 2024

javanna merged commit 223e7f8 into elastic:main Apr 17, 2024
14 checks passed

javanna deleted the fix/fields_lookup_stored_spec branch April 17, 2024 17:37

javanna mentioned this pull request Apr 17, 2024

[8.14] Avoid attempting to load the same empty field twice in fetch phase (#107551) #107580

Merged

javanna added a commit that referenced this pull request Apr 17, 2024

Update skip for profile yaml tests following #107551

19db490

javanna added v8.14.1 and removed v8.14.0 labels Apr 29, 2024
