Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid attempting to load the same empty field twice in fetch phase #107551

Merged
merged 11 commits into from
Apr 17, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions docs/changelog/107551.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
pr: 107551
summary: Avoid attempting to load the same empty field twice in fetch phase
area: Search
type: bug
issues: []
4 changes: 2 additions & 2 deletions docs/reference/search/profile.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -194,7 +194,7 @@ The API returns the following result:
"load_source_count": 5
},
"debug": {
"stored_fields": ["_id", "_routing", "_source"]
"stored_fields": ["_id", "_ignored", "_routing", "_source"]
},
"children": [
{
Expand Down Expand Up @@ -1051,7 +1051,7 @@ And here is the fetch profile:
"load_source_count": 5
},
"debug": {
"stored_fields": ["_id", "_routing", "_source"]
"stored_fields": ["_id", "_ignored", "_routing", "_source"]
Copy link
Member Author

@javanna javanna Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a consequence of exposing the correct stored fields spec in FetchFieldsPhase, that takes stored metadata fields into account. _ignored will be removed again once it's no longer stored. The problem is that the field should have been there since its introduction, but it was never added to StoredFieldLoader#fieldsToLoad which is where the other three fields come from. Note that StoredFieldsPhase did not include in its stored fields spec the default metadata fields that it always requested.

For posterity, why is _type not there if fetch fields phase requests it by default? Because it is not mapped in recent clusters, and it is only part of the stored fields spec, hence loaded, when it is mapped, which is the case only in very old archive indices.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

},
"children": [
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -140,7 +140,7 @@ profile fetch:
- gt: { profile.shards.0.fetch.breakdown.next_reader: 0 }
- gt: { profile.shards.0.fetch.breakdown.load_stored_fields_count: 0 }
- gt: { profile.shards.0.fetch.breakdown.load_stored_fields: 0 }
- match: { profile.shards.0.fetch.debug.stored_fields: [_id, _routing, _source] }
- match: { profile.shards.0.fetch.debug.stored_fields: [_id, _ignored, _routing, _source] }
- length: { profile.shards.0.fetch.children: 4 }
- match: { profile.shards.0.fetch.children.0.type: FetchFieldsPhase }
- gt: { profile.shards.0.fetch.children.0.breakdown.next_reader_count: 0 }
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ fetch fields:
- gt: { profile.shards.0.fetch.breakdown.next_reader: 0 }
- gt: { profile.shards.0.fetch.breakdown.load_stored_fields_count: 0 }
- gt: { profile.shards.0.fetch.breakdown.load_stored_fields: 0 }
- match: { profile.shards.0.fetch.debug.stored_fields: [_id, _routing, _source] }
- match: { profile.shards.0.fetch.debug.stored_fields: [_id, _ignored, _routing, _source] }
- length: { profile.shards.0.fetch.children: 2 }
- match: { profile.shards.0.fetch.children.0.type: FetchFieldsPhase }
- gt: { profile.shards.0.fetch.children.0.breakdown.next_reader_count: 0 }
Expand Down Expand Up @@ -74,7 +74,7 @@ fetch source:
- gt: { profile.shards.0.fetch.breakdown.next_reader: 0 }
- gt: { profile.shards.0.fetch.breakdown.load_stored_fields_count: 0 }
- gt: { profile.shards.0.fetch.breakdown.load_stored_fields: 0 }
- match: { profile.shards.0.fetch.debug.stored_fields: [_id, _routing, _source] }
- match: { profile.shards.0.fetch.debug.stored_fields: [_id, _ignored, _routing, _source] }
- length: { profile.shards.0.fetch.children: 3 }
- match: { profile.shards.0.fetch.children.0.type: FetchFieldsPhase }
- match: { profile.shards.0.fetch.children.1.type: FetchSourcePhase }
Expand Down Expand Up @@ -139,7 +139,7 @@ fetch nested source:
- gt: { profile.shards.0.fetch.breakdown.next_reader: 0 }
- gt: { profile.shards.0.fetch.breakdown.load_stored_fields_count: 0 }
- gt: { profile.shards.0.fetch.breakdown.load_stored_fields: 0 }
- match: { profile.shards.0.fetch.debug.stored_fields: [_id, _routing, _source] }
- match: { profile.shards.0.fetch.debug.stored_fields: [_id, _ignored, _routing, _source] }
- length: { profile.shards.0.fetch.children: 4 }
- match: { profile.shards.0.fetch.children.0.type: FetchFieldsPhase }
- match: { profile.shards.0.fetch.children.1.type: FetchSourcePhase }
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -168,4 +168,18 @@ public void testInvalid() {
assertThat(exc.getMessage(), equalTo("cannot combine _none_ with other fields"));
}
}

public void testFetchId() {
assertAcked(prepareCreate("test"));
ensureGreen();

prepareIndex("test").setId("1").setSource("field", "value").get();
refresh();

assertResponse(prepareSearch("test").addFetchField("_id"), response -> {
assertEquals(1, response.getHits().getHits().length);
assertEquals("1", response.getHits().getAt(0).getId());
assertEquals("1", response.getHits().getAt(0).field("_id").getValue());
});
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -112,9 +112,13 @@ private SearchHits buildSearchHits(SearchContext context, int[] docIdsToLoad, Pr
context.getSearchExecutionContext().setLookupProviders(sourceProvider, ctx -> fieldLookupProvider);

List<FetchSubPhaseProcessor> processors = getProcessors(context.shardTarget(), fetchContext, profiler);

StoredFieldsSpec storedFieldsSpec = StoredFieldsSpec.build(processors, FetchSubPhaseProcessor::storedFieldsSpec);
storedFieldsSpec = storedFieldsSpec.merge(new StoredFieldsSpec(false, false, sourceLoader.requiredStoredFields()));
// Ideally the required stored fields would be provided as constructor argument a few lines above, but that requires moving
// the getProcessors call to before the setLookupProviders call, which causes weird issues in InnerHitsPhase.
// setLookupProviders resets the SearchLookup used throughout the rest of the fetch phase, which StoredValueFetchers rely on
// to retrieve stored fields, and InnerHitsPhase is the last sub-fetch phase and re-runs the entire fetch phase.
fieldLookupProvider.setPreloadedStoredFieldNames(storedFieldsSpec.requiredStoredFields());

StoredFieldLoader storedFieldLoader = profiler.storedFields(StoredFieldLoader.fromSpec(storedFieldsSpec));
IdLoader idLoader = context.newIdLoader();
Expand Down Expand Up @@ -164,7 +168,7 @@ protected SearchHit nextDoc(int doc) throws IOException {
leafIdLoader
);
sourceProvider.source = hit.source();
fieldLookupProvider.storedFields = hit.loadedFields();
fieldLookupProvider.setPreloadedStoredFieldValues(hit.hit().getId(), hit.loadedFields());
for (FetchSubPhaseProcessor processor : processors) {
processor.process(hit);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,12 +9,16 @@
package org.elasticsearch.search.fetch;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.util.SetOnce;
import org.elasticsearch.index.mapper.IdFieldMapper;
import org.elasticsearch.search.lookup.FieldLookup;
import org.elasticsearch.search.lookup.LeafFieldLookupProvider;

import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

/**
Expand All @@ -26,15 +30,22 @@
*/
class PreloadedFieldLookupProvider implements LeafFieldLookupProvider {

Map<String, List<Object>> storedFields;
LeafFieldLookupProvider backUpLoader;
Supplier<LeafFieldLookupProvider> loaderSupplier;
private final SetOnce<Set<String>> preloadedStoredFieldNames = new SetOnce<>();
private Map<String, List<Object>> preloadedStoredFieldValues;
private String id;
private LeafFieldLookupProvider backUpLoader;
private Supplier<LeafFieldLookupProvider> loaderSupplier;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the chance to make these private and add package private setter/getter methods when needed. I find that it clarifies who does what and when.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Copy link
Contributor

@salvatore-campagna salvatore-campagna Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question...do I understand it correctly that preloadedStoredFieldValues.keySet() is the same as preloadedStoredFieldNames? Or is one a subset of the other?

Because later on I see we do

if (preloadedStoredFieldNames.get().contains(field)) {
            fieldLookup.setValues(preloadedStoredFieldValues.get(field));

which looks like as "if the name is there and the field was preloaded then just get the preloaded values"...

I see that one comes from hit.loadedFields() and the other from StoredFieldsSpec but I was wondering if they always include the same fields.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if they were the same we would not need a separate set. preloadedStoredFieldNames includes all the fields that we know we will attempt to load for all documents. Those include fields that don't have a value, while preloadedStoredFieldValues contains only those fields that were found in the current doc.

The overhead was caused by trying to load _ignored and _routing ad-hoc when requested for all docs that did not have a value for them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thank you.


@Override
public void populateFieldLookup(FieldLookup fieldLookup, int doc) throws IOException {
String field = fieldLookup.fieldType().name();
if (storedFields.containsKey(field)) {
fieldLookup.setValues(storedFields.get(field));

if (field.equals(IdFieldMapper.NAME)) {
Copy link
Member Author

@javanna javanna Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_id was previously always loaded via the backup loader when requested via script or via fetch fields. That causes overhead as the field is already available at all times, no need to go and fetch it from stored fields!

This is because it is not part of the ordinary loaded stored fields, hence it needs to be provided and handled separately.

fieldLookup.setValues(Collections.singletonList(id));
return;
}
if (preloadedStoredFieldNames.get().contains(field)) {
fieldLookup.setValues(preloadedStoredFieldValues.get(field));
return;
}
// stored field not preloaded, go and get it directly
Expand All @@ -44,8 +55,26 @@ public void populateFieldLookup(FieldLookup fieldLookup, int doc) throws IOExcep
backUpLoader.populateFieldLookup(fieldLookup, doc);
}

void setPreloadedStoredFieldNames(Set<String> preloadedStoredFieldNames) {
this.preloadedStoredFieldNames.set(preloadedStoredFieldNames);
}

void setPreloadedStoredFieldValues(String id, Map<String, List<Object>> preloadedStoredFieldValues) {
assert preloadedStoredFieldNames.get().containsAll(preloadedStoredFieldValues.keySet())
: "Provided stored field that was not expected to be preloaded? "
+ preloadedStoredFieldValues.keySet()
+ " - "
+ preloadedStoredFieldNames;
this.preloadedStoredFieldValues = preloadedStoredFieldValues;
this.id = id;
}

void setNextReader(LeafReaderContext ctx) {
backUpLoader = null;
loaderSupplier = () -> LeafFieldLookupProvider.fromStoredFields().apply(ctx);
}

LeafFieldLookupProvider getBackUpLoader() {
return backUpLoader;
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@ public final class FetchFieldsPhase implements FetchSubPhase {
private static final List<FieldAndFormat> DEFAULT_METADATA_FIELDS = List.of(
new FieldAndFormat(IgnoredFieldMapper.NAME, null),
new FieldAndFormat(RoutingFieldMapper.NAME, null),
// will only be fetched when mapped (older archived indices)
new FieldAndFormat(LegacyTypeFieldMapper.NAME, null)
);

Expand Down Expand Up @@ -95,9 +96,9 @@ public void setNextReader(LeafReaderContext readerContext) {
@Override
public StoredFieldsSpec storedFieldsSpec() {
if (fieldFetcher != null) {
return fieldFetcher.storedFieldsSpec();
return metadataFieldFetcher.storedFieldsSpec().merge(fieldFetcher.storedFieldsSpec());
}
return StoredFieldsSpec.NO_REQUIREMENTS;
return metadataFieldFetcher.storedFieldsSpec();
}

@Override
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,9 @@ public MappedFieldType fieldType() {
*/
public void setValues(List<Object> values) {
assert valuesLoaded == false : "Call clear() before calling setValues()";
values.stream().map(fieldType::valueForDisplay).forEach(this.values::add);
if (values != null) {
values.stream().map(fieldType::valueForDisplay).forEach(this.values::add);
}
this.valuesLoaded = true;
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -13,11 +13,13 @@
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.util.BytesRef;
import org.elasticsearch.index.mapper.IdFieldMapper;
import org.elasticsearch.index.mapper.MappedFieldType;
import org.elasticsearch.search.lookup.FieldLookup;
import org.elasticsearch.test.ESTestCase;

import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
Expand All @@ -30,7 +32,16 @@ public class PreloadedFieldLookupProviderTests extends ESTestCase {

public void testFallback() throws IOException {
PreloadedFieldLookupProvider lookup = new PreloadedFieldLookupProvider();
lookup.storedFields = Map.of("foo", List.of("bar"));
lookup.setPreloadedStoredFieldNames(Collections.singleton("foo"));
lookup.setPreloadedStoredFieldValues("id", Map.of("foo", List.of("bar")));

MappedFieldType idFieldType = mock(MappedFieldType.class);
when(idFieldType.name()).thenReturn(IdFieldMapper.NAME);
when(idFieldType.valueForDisplay(any())).then(invocation -> (invocation.getArguments()[0]));
FieldLookup idFieldLookup = new FieldLookup(idFieldType);
lookup.populateFieldLookup(idFieldLookup, 0);
assertEquals("id", idFieldLookup.getValue());
assertNull(lookup.getBackUpLoader()); // fallback didn't get used because 'foo' is in the list

MappedFieldType fieldType = mock(MappedFieldType.class);
when(fieldType.name()).thenReturn("foo");
Expand All @@ -39,7 +50,7 @@ public void testFallback() throws IOException {

lookup.populateFieldLookup(fieldLookup, 0);
assertEquals("BAR", fieldLookup.getValue());
assertNull(lookup.backUpLoader); // fallback didn't get used because 'foo' is in the list
assertNull(lookup.getBackUpLoader()); // fallback didn't get used because 'foo' is in the list

MappedFieldType unloadedFieldType = mock(MappedFieldType.class);
when(unloadedFieldType.name()).thenReturn("unloaded");
Expand All @@ -56,5 +67,4 @@ public void testFallback() throws IOException {
lookup.populateFieldLookup(unloadedFieldLookup, 0);
assertEquals("VALUE", unloadedFieldLookup.getValue());
}

}
Original file line number Diff line number Diff line change
Expand Up @@ -19,13 +19,17 @@
import org.elasticsearch.index.mapper.DocValueFetcher;
import org.elasticsearch.index.mapper.MappedFieldType;
import org.elasticsearch.index.mapper.NestedLookup;
import org.elasticsearch.index.mapper.StoredValueFetcher;
import org.elasticsearch.index.query.SearchExecutionContext;
import org.elasticsearch.search.DocValueFormat;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.aggregations.support.CoreValuesSourceType;
import org.elasticsearch.search.fetch.FetchContext;
import org.elasticsearch.search.fetch.FetchSubPhase;
import org.elasticsearch.search.fetch.FetchSubPhaseProcessor;
import org.elasticsearch.search.fetch.StoredFieldsContext;
import org.elasticsearch.search.fetch.StoredFieldsSpec;
import org.elasticsearch.search.lookup.SearchLookup;
import org.elasticsearch.search.lookup.Source;
import org.elasticsearch.test.ESTestCase;

Expand All @@ -35,6 +39,7 @@
import java.util.Set;

import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.eq;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

Expand Down Expand Up @@ -90,6 +95,58 @@ public void testDocValueFetcher() throws IOException {

reader.close();
dir.close();
}

public void testStoredFieldsSpec() {
StoredFieldsContext storedFieldsContext = StoredFieldsContext.fromList(List.of("stored", "_metadata"));
FetchFieldsContext ffc = new FetchFieldsContext(List.of(new FieldAndFormat("field", null)));

SearchLookup searchLookup = mock(SearchLookup.class);

SearchExecutionContext sec = mock(SearchExecutionContext.class);
when(sec.isMetadataField(any())).then(invocation -> invocation.getArguments()[0].toString().startsWith("_"));

MappedFieldType routingFt = mock(MappedFieldType.class);
when(routingFt.valueFetcher(any(), any())).thenReturn(new StoredValueFetcher(searchLookup, "_routing"));
when(sec.getFieldType(eq("_routing"))).thenReturn(routingFt);

// this would normally not be mapped -> getMatchingFieldsNames would not resolve it (unless for older archive indices)
MappedFieldType typeFt = mock(MappedFieldType.class);
when(typeFt.valueFetcher(any(), any())).thenReturn(new StoredValueFetcher(searchLookup, "_type"));
when(sec.getFieldType(eq("_type"))).thenReturn(typeFt);

MappedFieldType ignoredFt = mock(MappedFieldType.class);
when(ignoredFt.valueFetcher(any(), any())).thenReturn(new StoredValueFetcher(searchLookup, "_ignored"));
when(sec.getFieldType(eq("_ignored"))).thenReturn(ignoredFt);

// Ideally we would test that explicitly requested stored fields are included in stored fields spec, but isStored is final hence it
// can't be mocked. In reality, _metadata would be included but stored would not.
MappedFieldType storedFt = mock(MappedFieldType.class);
when(sec.getFieldType(eq("stored"))).thenReturn(storedFt);
MappedFieldType metadataFt = mock(MappedFieldType.class);
when(sec.getFieldType(eq("_metadata"))).thenReturn(metadataFt);

MappedFieldType fieldType = mock(MappedFieldType.class);
when(fieldType.valueFetcher(any(), any())).thenReturn(
new DocValueFetcher(
DocValueFormat.RAW,
new SortedNumericIndexFieldData("field", IndexNumericFieldData.NumericType.LONG, CoreValuesSourceType.NUMERIC, null, false)
)
);
when(sec.getFieldType(eq("field"))).thenReturn(fieldType);

when(sec.getMatchingFieldNames(any())).then(invocation -> Set.of(invocation.getArguments()[0]));
when(sec.nestedLookup()).thenReturn(NestedLookup.EMPTY);
FetchContext fetchContext = mock(FetchContext.class);
when(fetchContext.fetchFieldsContext()).thenReturn(ffc);
when(fetchContext.storedFieldsContext()).thenReturn(storedFieldsContext);
when(fetchContext.getSearchExecutionContext()).thenReturn(sec);
FetchFieldsPhase fetchFieldsPhase = new FetchFieldsPhase();
FetchSubPhaseProcessor processor = fetchFieldsPhase.getProcessor(fetchContext);
StoredFieldsSpec storedFieldsSpec = processor.storedFieldsSpec();
assertEquals(3, storedFieldsSpec.requiredStoredFields().size());
assertTrue(storedFieldsSpec.requiredStoredFields().contains("_routing"));
assertTrue(storedFieldsSpec.requiredStoredFields().contains("_ignored"));
assertTrue(storedFieldsSpec.requiredStoredFields().contains("_type"));
}
}