-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid attempting to load the same empty field twice in fetch phase #107551
Conversation
During the fetch phase, there's a number of stored fields that are requested explicitly or loaded by default. That information is included in `StoredFieldsSpec` that each fetch sub phase exposes. We attempt to provide stored fields that are already loaded to the fields lookup that scripts as well as value fetchers use to load field values (via SearchLookup). This is done in PreloadedFieldLookupProvider. The current logic makes available values for fields that have been found, so that scripts or value fetchers that request them don't load them again ad-hoc. What happens though for stored fields that don't have a value for a specific doc, is that they are treated like any other field that was not requested, and loaded again, which causes overhead. This change makes available to PreloadedFieldLookupProvider the list of required stored fields, so that it can better distinguish between fields that we already attempted to load (although we may not have found a value for them) and those that need to be loaded ad-hoc (for instance because a script is requesting them for the first time). This is an existing issue, that has become evident as we moved fetching of metadata fields to FetchFieldsPhase, that relies on value fetchers, and hence on SearchLookup. We end up attempting to load default metadata fields (`_ignored` and `_routing`) twice when they are not present in a document, which makes us call `LeafReader#storedFields` additional times for the same document providing a `SingleFieldVisitor` that will never find a value.
@@ -1051,7 +1051,7 @@ And here is the fetch profile: | |||
"load_source_count": 5 | |||
}, | |||
"debug": { | |||
"stored_fields": ["_id", "_routing", "_source"] | |||
"stored_fields": ["_id", "_ignored", "_routing", "_source"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a consequence of exposing the correct stored fields spec in FetchFieldsPhase, that takes stored metadata fields into account. _ignored
will be removed again once it's no longer stored. The problem is that the field should have been there since its introduction, but it was never added to StoredFieldLoader#fieldsToLoad
which is where the other three fields come from. Note that StoredFieldsPhase
did not include in its stored fields spec the default metadata fields that it always requested.
For posterity, why is _type
not there if fetch fields phase requests it by default? Because it is not mapped in recent clusters, and it is only part of the stored fields spec, hence loaded, when it is mapped, which is the case only in very old archive indices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
private Set<String> preloadedStoredFields; | ||
private Map<String, List<Object>> storedFields; | ||
private LeafFieldLookupProvider backUpLoader; | ||
private Supplier<LeafFieldLookupProvider> loaderSupplier; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took the chance to make these private and add package private setter/getter methods when needed. I find that it clarifies who does what and when.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question...do I understand it correctly that preloadedStoredFieldValues.keySet()
is the same as preloadedStoredFieldNames
? Or is one a subset of the other?
Because later on I see we do
if (preloadedStoredFieldNames.get().contains(field)) {
fieldLookup.setValues(preloadedStoredFieldValues.get(field));
which looks like as "if the name is there and the field was preloaded then just get the preloaded values"...
I see that one comes from hit.loadedFields()
and the other from StoredFieldsSpec
but I was wondering if they always include the same fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if they were the same we would not need a separate set. preloadedStoredFieldNames includes all the fields that we know we will attempt to load for all documents. Those include fields that don't have a value, while preloadedStoredFieldValues contains only those fields that were found in the current doc.
The overhead was caused by trying to load _ignored
and _routing
ad-hoc when requested for all docs that did not have a value for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok thank you.
|
||
@Override | ||
public void populateFieldLookup(FieldLookup fieldLookup, int doc) throws IOException { | ||
String field = fieldLookup.fieldType().name(); | ||
if (storedFields.containsKey(field)) { | ||
|
||
if (field.equals(IdFieldMapper.NAME)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_id
was previously always loaded via the backup loader when requested via script or via fetch fields. That causes overhead as the field is already available at all times, no need to go and fetch it from stored fields!
This is because it is not part of the ordinary loaded stored fields, hence it needs to be provided and handled separately.
Hi @javanna, I've created a changelog YAML for you. |
Pinging @elastic/es-search (Team:Search) |
@@ -1051,7 +1051,7 @@ And here is the fetch profile: | |||
"load_source_count": 5 | |||
}, | |||
"debug": { | |||
"stored_fields": ["_id", "_routing", "_source"] | |||
"stored_fields": ["_id", "_ignored", "_routing", "_source"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
private Set<String> preloadedStoredFields; | ||
private Map<String, List<Object>> storedFields; | ||
private LeafFieldLookupProvider backUpLoader; | ||
private Supplier<LeafFieldLookupProvider> loaderSupplier; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
this.preloadedStoredFields = preloadedStoredFields; | ||
} | ||
|
||
void setStoredFields(String id, Map<String, List<Object>> storedFields) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe setPreloadedStoredFieldValues
and the one above is setPreloadedStoredFieldsNames
. That way it's clear these are mirror of eachother.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea, done.
Thanks Luca, I just left a question to better understand how this works. Everything else LGTM. |
…lastic#107551) During the fetch phase, there's a number of stored fields that are requested explicitly or loaded by default. That information is included in `StoredFieldsSpec` that each fetch sub phase exposes. We attempt to provide stored fields that are already loaded to the fields lookup that scripts as well as value fetchers use to load field values (via `SearchLookup`). This is done in `PreloadedFieldLookupProvider.` The current logic makes available values for fields that have been found, so that scripts or value fetchers that request them don't load them again ad-hoc. What happens though for stored fields that don't have a value for a specific doc, is that they are treated like any other field that was not requested, and loaded again, although they will not be found, which causes overhead. This change makes available to `PreloadedFieldLookupProvider` the list of required stored fields, so that it can better distinguish between fields that we already attempted to load (although we may not have found a value for them) and those that need to be loaded ad-hoc (for instance because a script is requesting them for the first time). This is an existing issue, that has become evident as we moved fetching of metadata fields to `FetchFieldsPhase`, that relies on value fetchers, and hence on `SearchLookup`. We end up attempting to load default metadata fields (`_ignored` and `_routing`) twice when they are not present in a document, which makes us call `LeafReader#storedFields` additional times for the same document providing a `SingleFieldVisitor` that will never find a value. Another existing issue that this PR fixes is for the `FetchFieldsPhase` to extend the `StoredFieldsSpec` that it exposes to include the metadata fields that the phase is now responsible for loading. That results in `_ignored` being included in the output of the debug stored fields section when profiling is enabled. The fact that it was previously missing is an existing bug (it was missing in `StoredFieldLoader#fieldsToLoad`). Yet another existing issues that this PR fixes is that `_id` has been until now always loaded on demand when requested via fetch fields or script. That is because it is not part of the preloaded stored fields that the fetch phase passes over to the `PreloadedFieldLookupProvider`. That causes overhead as the field has already been loaded, and should not be loaded once again when explicitly requested.
💚 Backport successful
|
…107551) During the fetch phase, there's a number of stored fields that are requested explicitly or loaded by default. That information is included in `StoredFieldsSpec` that each fetch sub phase exposes. We attempt to provide stored fields that are already loaded to the fields lookup that scripts as well as value fetchers use to load field values (via `SearchLookup`). This is done in `PreloadedFieldLookupProvider.` The current logic makes available values for fields that have been found, so that scripts or value fetchers that request them don't load them again ad-hoc. What happens though for stored fields that don't have a value for a specific doc, is that they are treated like any other field that was not requested, and loaded again, although they will not be found, which causes overhead. This change makes available to `PreloadedFieldLookupProvider` the list of required stored fields, so that it can better distinguish between fields that we already attempted to load (although we may not have found a value for them) and those that need to be loaded ad-hoc (for instance because a script is requesting them for the first time). This is an existing issue, that has become evident as we moved fetching of metadata fields to `FetchFieldsPhase`, that relies on value fetchers, and hence on `SearchLookup`. We end up attempting to load default metadata fields (`_ignored` and `_routing`) twice when they are not present in a document, which makes us call `LeafReader#storedFields` additional times for the same document providing a `SingleFieldVisitor` that will never find a value. Another existing issue that this PR fixes is for the `FetchFieldsPhase` to extend the `StoredFieldsSpec` that it exposes to include the metadata fields that the phase is now responsible for loading. That results in `_ignored` being included in the output of the debug stored fields section when profiling is enabled. The fact that it was previously missing is an existing bug (it was missing in `StoredFieldLoader#fieldsToLoad`). Yet another existing issues that this PR fixes is that `_id` has been until now always loaded on demand when requested via fetch fields or script. That is because it is not part of the preloaded stored fields that the fetch phase passes over to the `PreloadedFieldLookupProvider`. That causes overhead as the field has already been loaded, and should not be loaded once again when explicitly requested.
During the fetch phase, there's a number of stored fields that are requested explicitly or loaded by default. That information is included in
StoredFieldsSpec
that each fetch sub phase exposes.We attempt to provide stored fields that are already loaded to the fields lookup that scripts as well as value fetchers use to load field values (via
SearchLookup
). This is done inPreloadedFieldLookupProvider.
The current logic makes available values for fields that have been found, so that scripts or value fetchers that request them don't load them again ad-hoc. What happens though for stored fields that don't have a value for a specific doc, is that they are treated like any other field that was not requested, and loaded again, although they will not be found, which causes overhead.This change makes available to
PreloadedFieldLookupProvider
the list of required stored fields, so that it can better distinguish between fields that we already attempted to load (although we may not have found a value for them) and those that need to be loaded ad-hoc (for instance because a script is requesting them for the first time).This is an existing issue, that has become evident as we moved fetching of metadata fields to
FetchFieldsPhase
, that relies on value fetchers, and hence onSearchLookup
. We end up attempting to load default metadata fields (_ignored
and_routing
) twice when they are not present in a document, which makes us callLeafReader#storedFields
additional times for the same document providing aSingleFieldVisitor
that will never find a value.Another existing issue that this PR fixes is for the
FetchFieldsPhase
to extend theStoredFieldsSpec
that it exposes to include the metadata fields that the phase is now responsible for loading. That results in_ignored
being included in the output of the debug stored fields section when profiling is enabled. The fact that it was previously missing is an existing bug (it was missing inStoredFieldLoader#fieldsToLoad
).Yet another existing issues that this PR fixes is that
_id
has been until now always loaded on demand when requested via fetch fields or script. That is because it is not part of the preloaded stored fields that the fetch phase passes over to thePreloadedFieldLookupProvider
. That causes overhead as the field has already been loaded, and should not be loaded once again when explicitly requested.