-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding aggregations support for the _ignored
field
#101373
Adding aggregations support for the _ignored
field
#101373
Conversation
_ignored
field
Documentation preview: |
Hi @eyalkoren, I've created a changelog YAML for you. |
@Override | ||
public IndexFieldData.Builder fielddataBuilder(FieldDataContext fieldDataContext) { | ||
if (hasDocValues() == false) { | ||
throw new IllegalArgumentException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when you're aggregating over a data stream that contains both old and new indices? Will this return a partial failure? How does Kibana/Lens handle the failure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. I will look for a way to add an integration test for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's a general question about aggregations on a field in a data stream, I guess we can add a test that is not related to the _ignored
field.
Something like:
- create a component template that has a mapping for field
foo
with"doc_values": false
- create a data stream index template that uses this mapping
- index a document
- update the component template so that the mapping for field
foo
is changed to"doc_values": true
- rollover the data stream
- index another document
- check aggregation on field
foo
Would that cover it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you would indeed get partial failures in this case, one failure for each shard belonging to the old indices, as they can't be aggregated on.
server/src/main/java/org/elasticsearch/index/mapper/IgnoredFieldMapper.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/IgnoredFieldMapper.java
Outdated
Show resolved
Hide resolved
@javanna please take a look to see if this is the right direction.
|
I am leaning towards removing the stored field, but I think this requires further discussion, because it is going to be very difficult to go back to relying on the order if we stop storing the field. We said we won't focus for now on the ignored reason, and I would add that we should try not to have positional logic around it when we do so. Could you make the changes necessary to retrieve the field from doc values and stop storing the field in the meantime?
I am not sure that we need to test for that. We know we are going to have partial failures in that case. |
Thanks for the feedback @javanna 🙏
I will give it a try 🙂
OK, then I guess the next question is - what do we do about that? Only document it as a caveat related to aggregating on existing data streams? Something else? |
IMO, it would be enough to document that in the same place where you document that the field is supports aggregations now. |
@javanna the current state is non-functional, but I pushed so you can see what I have done so far. What I know is that the |
@@ -194,7 +194,7 @@ The API returns the following result: | |||
"load_source_count": 5 | |||
}, | |||
"debug": { | |||
"stored_fields": ["_id", "_ignored", "_routing", "_source"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not rerturned anymore because the field is not stored anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that makes sense, I was expecting this change.
@@ -123,12 +123,12 @@ public void testWithIgnored() { | |||
{ | |||
GetResponse getResponse = client().prepareGet("test", "1").get(); | |||
assertTrue(getResponse.isExists()); | |||
assertThat(getResponse.getField("_ignored"), nullValue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is not required...will restore it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a couple of comments, but I think this is very close.
@@ -194,7 +194,7 @@ The API returns the following result: | |||
"load_source_count": 5 | |||
}, | |||
"debug": { | |||
"stored_fields": ["_id", "_ignored", "_routing", "_source"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that makes sense, I was expecting this change.
rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/get/120_stored_fields_ignored.yml
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/get/ShardGetService.java
Outdated
Show resolved
Hide resolved
server/src/internalClusterTest/java/org/elasticsearch/action/termvectors/GetTermVectorsIT.java
Outdated
Show resolved
Hide resolved
Ideally we should stop using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a couple questions, mostly around testing. LGTM otherwise! Great work!
@@ -140,7 +140,7 @@ profile fetch: | |||
- gt: { profile.shards.0.fetch.breakdown.next_reader: 0 } | |||
- gt: { profile.shards.0.fetch.breakdown.load_stored_fields_count: 0 } | |||
- gt: { profile.shards.0.fetch.breakdown.load_stored_fields: 0 } | |||
- match: { profile.shards.0.fetch.debug.stored_fields: [_id, _ignored, _routing, _source] } | |||
- match: { profile.shards.0.fetch.debug.stored_fields: [_id, _routing, _source] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if with this change, the skip above needs updating? Isn't it surprising that this test runs? 8.14 returns the _ignored
field while 8.15 does not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed this but the other was ok.
import java.util.Locale; | ||
import java.util.Map; | ||
|
||
public class IgnoredMetaFieldRollingUpgradeIT extends ParameterizedRollingUpgradeTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great test to have!
@@ -139,7 +139,7 @@ fetch nested source: | |||
- gt: { profile.shards.0.fetch.breakdown.next_reader: 0 } | |||
- gt: { profile.shards.0.fetch.breakdown.load_stored_fields_count: 0 } | |||
- gt: { profile.shards.0.fetch.breakdown.load_stored_fields: 0 } | |||
- match: { profile.shards.0.fetch.debug.stored_fields: [_id, _ignored, _routing, _source] } | |||
- match: { profile.shards.0.fetch.debug.stored_fields: [_id, _routing, _source] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above, I wonder if we need to update the skip above, that's what I would expect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I forgot about this commit that already made that change: 19db490
} | ||
} | ||
|
||
public void testIgnoredMetadataFieldFetch() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit of a duplicate of MetadatFetchingIT#testWithIgnored ? Is it needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed this comment...I can remove the test in another PR.
@@ -104,6 +104,7 @@ private static IndexVersion def(int id, Version luceneVersion) { | |||
public static final IndexVersion UPGRADE_TO_LUCENE_9_10 = def(8_503_00_0, Version.LUCENE_9_10_0); | |||
public static final IndexVersion TIME_SERIES_ROUTING_HASH_IN_ID = def(8_504_00_0, Version.LUCENE_9_10_0); | |||
public static final IndexVersion DEFAULT_DENSE_VECTOR_TO_INT8_HNSW = def(8_505_00_0, Version.LUCENE_9_10_0); | |||
public static final IndexVersion DOC_VALUES_FOR_IGNORED_META_FIELD = def(8_505_00_1, Version.LUCENE_9_10_0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eyalkoren I've just spotted this - this should have incremented the NNN version of the version id, not the patch version. Check the comment right below the version constants for more info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@salvatore-campagna is the one two brought this to completion, thanks @thecoop for raising it.
Closes #59946
The actual implementation needs to take into account how we decide to implement #101153.
One of the original requirements was to stop making the
_ignored
field stored while addingdoc_values
to it. However, the current proposal of #101153 assumes that we can access the original order of the field's content, which means we must keep it stored as well. The eventual approach will determine the implementation of this change.