Adding aggregations support for the `_ignored` field #101373

eyalkoren · 2023-10-26T11:15:08Z

The actual implementation needs to take into account how we decide to implement #101153.
One of the original requirements was to stop making the _ignored field stored while adding doc_values to it. However, the current proposal of #101153 assumes that we can access the original order of the field's content, which means we must keep it stored as well. The eventual approach will determine the implementation of this change.

github-actions · 2023-10-26T11:15:23Z

Documentation preview:

✨ Changed pages

elasticsearchmachine · 2023-10-26T11:15:43Z

Hi @eyalkoren, I've created a changelog YAML for you.

felixbarny · 2023-10-26T11:47:29Z

server/src/main/java/org/elasticsearch/index/mapper/IgnoredFieldMapper.java

+        @Override
+        public IndexFieldData.Builder fielddataBuilder(FieldDataContext fieldDataContext) {
+            if (hasDocValues() == false) {
+                throw new IllegalArgumentException(


What happens when you're aggregating over a data stream that contains both old and new indices? Will this return a partial failure? How does Kibana/Lens handle the failure?

I don't know. I will look for a way to add an integration test for that

Since it's a general question about aggregations on a field in a data stream, I guess we can add a test that is not related to the _ignored field.
Something like:

1. Create a component template that has a mapping for field foo with "doc_values": false.
2. Create a data stream index template that uses this mapping.
3. Index a document.
4. Update the component template so that the mapping for field foo is changed to "doc_values": true.
5. Roll over the data stream.
6. Index another document.
7. Check an aggregation on field foo.

Would that cover it?

You would indeed get partial failures in this case: one failure for each shard belonging to the old indices, since they can't be aggregated on.
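The partial-failure behaviour described in this thread can be illustrated with a small in-memory model. This is only a sketch, not Elasticsearch code: `BackingIndex`, `AggResult`, and `aggregateOnFoo` are hypothetical names, each backing index is treated as a single shard, and a shard whose mapping lacks doc values is simply reported as failed while the remaining shards still contribute buckets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model; none of these types are Elasticsearch classes.
public class MixedDocValuesSketch {

    /** One backing index of the data stream, modeled as a single shard. */
    static final class BackingIndex {
        final String name;
        final boolean fooHasDocValues;
        final List<String> fooValues;

        BackingIndex(String name, boolean fooHasDocValues, List<String> fooValues) {
            this.name = name;
            this.fooHasDocValues = fooHasDocValues;
            this.fooValues = fooValues;
        }
    }

    /** Buckets from the shards that succeeded, plus the names of the shards that failed. */
    static final class AggResult {
        final Map<String, Integer> buckets = new TreeMap<>();
        final List<String> failedShards = new ArrayList<>();
    }

    /** Terms-style count over field foo; shards without doc values are reported as failures. */
    static AggResult aggregateOnFoo(List<BackingIndex> backingIndices) {
        AggResult result = new AggResult();
        for (BackingIndex index : backingIndices) {
            if (index.fooHasDocValues == false) {
                // Old index created before the mapping change: cannot be aggregated on.
                result.failedShards.add(index.name);
                continue;
            }
            for (String value : index.fooValues) {
                result.buckets.merge(value, 1, Integer::sum);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Index 1 was created before the rollover (no doc values), index 2 after it.
        List<BackingIndex> dataStream = List.of(
            new BackingIndex("backing-000001", false, List.of("a")),
            new BackingIndex("backing-000002", true, List.of("a", "b"))
        );
        AggResult result = aggregateOnFoo(dataStream);
        System.out.println("buckets=" + result.buckets + " failedShards=" + result.failedShards);
    }
}
```

In this model the response is still usable, but the counts only reflect documents in the newer index, which is the caveat that would need documenting.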

server/src/main/java/org/elasticsearch/index/mapper/IgnoredFieldMapper.java

eyalkoren · 2023-11-20T15:36:43Z

@javanna please take a look to see if this is the right direction.
Some specific input I'd be happy to get:

Only for the sake of Aggregating _ignored field values #59946, we don't need to keep the _ignored field stored. However, for Indicate why a field has been _ignored #101153 we may need it. What do you think we should do as part of this PR?
Can you answer @felixbarny's question? Can you provide advice on how to create a test for that? EDIT: I proposed a test scenario, let me know whether you agree that it covers what we are looking for.

javanna · 2023-11-24T14:07:36Z

> Only for the sake of #59946, we don't need to keep the _ignored field stored. However, for #101153 we may need it. What do you think we should do as part of this PR?

I am leaning towards removing the stored field, but I think this requires further discussion, because it is going to be very difficult to go back to relying on the order if we stop storing the field. We said we won't focus for now on the ignored reason, and I would add that we should try not to have positional logic around it when we do so. Could you make the changes necessary to retrieve the field from doc values and stop storing the field in the meantime?

> Can you provide advice on how to create a test for that? EDIT: I proposed a test scenario, let me know whether you agree that it covers what we are looking for.

I am not sure that we need to test for that. We know we are going to have partial failures in that case.

eyalkoren · 2023-11-26T17:49:48Z

Thanks for the feedback @javanna 🙏

> Could you make the changes necessary to retrieve the field from doc values and stop storing the field in the meantime?

I will give it a try 🙂

> I am not sure that we need to test for that. We know we are going to have partial failures in that case.

OK, then I guess the next question is - what do we do about that? Only document it as a caveat related to aggregating on existing data streams? Something else?

felixbarny · 2023-11-28T08:11:18Z

> OK, then I guess the next question is - what do we do about that? Only document it as a caveat related to aggregating on existing data streams? Something else?

IMO, it would be enough to document that in the same place where you document that the field supports aggregations now.

eyalkoren · 2023-11-29T16:53:24Z

@javanna the current state is non-functional, but I pushed so you can see what I have done so far.
I'd be happy for advice on how I can debug the fetch phase and specifically the invocation of valueFetcher(). I didn't find a unit test that invokes it yet...

What I know is that the FetchDocValuesContext that comes from the FetchContext is null and that the doc values list that comes from the SearchSourceBuilder is also null. Not sure if this is any help...

salvatore-campagna · 2024-04-23T12:22:30Z

docs/reference/search/profile.asciidoc

@@ -194,7 +194,7 @@ The API returns the following result:
            "load_source_count": 5
          },
          "debug": {
-            "stored_fields": ["_id", "_ignored", "_routing", "_source"]


This is not returned anymore because the field is not stored anymore.

that makes sense, I was expecting this change.

salvatore-campagna · 2024-04-23T12:25:13Z

server/src/internalClusterTest/java/org/elasticsearch/search/source/MetadataFetchingIT.java

@@ -123,12 +123,12 @@ public void testWithIgnored() {
        {
            GetResponse getResponse = client().prepareGet("test", "1").get();
            assertTrue(getResponse.isExists());
-            assertThat(getResponse.getField("_ignored"), nullValue());


This change is not required; I will restore it.

javanna

I left a couple of comments, but I think this is very close.

javanna · 2024-04-23T16:10:52Z

docs/reference/search/profile.asciidoc

@@ -194,7 +194,7 @@ The API returns the following result:
            "load_source_count": 5
          },
          "debug": {
-            "stored_fields": ["_id", "_ignored", "_routing", "_source"]


that makes sense, I was expecting this change.

rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/get/120_stored_fields_ignored.yml

server/src/main/java/org/elasticsearch/index/get/ShardGetService.java

server/src/internalClusterTest/java/org/elasticsearch/action/termvectors/GetTermVectorsIT.java

This reverts commit ea20632.

salvatore-campagna · 2024-04-24T12:32:21Z

Ideally we should stop using skip in YAML tests for things that require versions above 8.14 and move to using cluster features. Anyway, I see there are more YAML tests still using skip, and I would prefer to handle those in a separate PR so that we apply the change to all YAML tests that need it and avoid adding more to this PR.

javanna

I left a couple questions, mostly around testing. LGTM otherwise! Great work!

javanna · 2024-04-25T08:18:02Z

modules/parent-join/src/yamlRestTest/resources/rest-api-spec/test/30_inner_hits.yml

@@ -140,7 +140,7 @@ profile fetch:
  - gt: { profile.shards.0.fetch.breakdown.next_reader: 0 }
  - gt: { profile.shards.0.fetch.breakdown.load_stored_fields_count: 0 }
  - gt: { profile.shards.0.fetch.breakdown.load_stored_fields: 0 }
-  - match: { profile.shards.0.fetch.debug.stored_fields: [_id, _ignored, _routing, _source] }
+  - match: { profile.shards.0.fetch.debug.stored_fields: [_id, _routing, _source] }


I wonder if with this change, the skip above needs updating? Isn't it surprising that this test runs? 8.14 returns the _ignored field while 8.15 does not?

I fixed this but the other was ok.

javanna · 2024-04-25T08:19:18Z

...grade/src/javaRestTest/java/org/elasticsearch/upgrades/IgnoredMetaFieldRollingUpgradeIT.java

+import java.util.Locale;
+import java.util.Map;
+
+public class IgnoredMetaFieldRollingUpgradeIT extends ParameterizedRollingUpgradeTestCase {


This is a great test to have!

javanna · 2024-04-25T08:19:52Z

rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search/370_profile.yml

@@ -139,7 +139,7 @@ fetch nested source:
  - gt: { profile.shards.0.fetch.breakdown.next_reader: 0 }
  - gt: { profile.shards.0.fetch.breakdown.load_stored_fields_count: 0 }
  - gt: { profile.shards.0.fetch.breakdown.load_stored_fields: 0 }
-  - match: { profile.shards.0.fetch.debug.stored_fields: [_id, _ignored, _routing, _source] }
+  - match: { profile.shards.0.fetch.debug.stored_fields: [_id, _routing, _source] }


Similar to above, I wonder if we need to update the skip above, that's what I would expect.

I think I forgot about this commit that already made that change: 19db490

javanna · 2024-04-25T08:20:55Z

server/src/internalClusterTest/java/org/elasticsearch/index/mapper/IgnoredMetadataFieldIT.java

+        }
+    }
+
+    public void testIgnoredMetadataFieldFetch() {


This is a bit of a duplicate of MetadataFetchingIT#testWithIgnored. Is it needed?

I missed this comment...I can remove the test in another PR.

thecoop · 2024-05-01T12:13:36Z

server/src/main/java/org/elasticsearch/index/IndexVersions.java

@@ -104,6 +104,7 @@ private static IndexVersion def(int id, Version luceneVersion) {
    public static final IndexVersion UPGRADE_TO_LUCENE_9_10 = def(8_503_00_0, Version.LUCENE_9_10_0);
    public static final IndexVersion TIME_SERIES_ROUTING_HASH_IN_ID = def(8_504_00_0, Version.LUCENE_9_10_0);
    public static final IndexVersion DEFAULT_DENSE_VECTOR_TO_INT8_HNSW = def(8_505_00_0, Version.LUCENE_9_10_0);
+    public static final IndexVersion DOC_VALUES_FOR_IGNORED_META_FIELD = def(8_505_00_1, Version.LUCENE_9_10_0);


@eyalkoren I've just spotted this - this should have incremented the NNN version of the version id, not the patch version. Check the comment right below the version constants for more info.

@salvatore-campagna is the one who brought this to completion, thanks @thecoop for raising it.
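To make the distinction between the NNN part and the patch part concrete, here is a small sketch that splits a version id into its digit groups. It assumes the layout suggested by the underscore grouping of the constants above, `M_NNN_SS_P` (major, server part, a two-digit middle part, patch). That layout and the name `IndexVersionIdSketch` are assumptions made for illustration; the comment below the constants in `IndexVersions.java` is the authoritative description. The id `8_506_00_0` is only a hypothetical example of incrementing NNN.

```java
// Hypothetical helper; assumes version ids are laid out as M_NNN_SS_P.
public class IndexVersionIdSketch {

    static int major(int id) {
        return id / 1_000_000;          // leading digit(s): M
    }

    static int serverPart(int id) {
        return (id / 1_000) % 1_000;    // three digits: NNN
    }

    static int middlePart(int id) {
        return (id / 10) % 100;         // two digits: SS
    }

    static int patchPart(int id) {
        return id % 10;                 // last digit: P
    }

    static String describe(int id) {
        return "M=" + major(id) + " NNN=" + serverPart(id)
            + " SS=" + middlePart(id) + " P=" + patchPart(id);
    }

    public static void main(String[] args) {
        // The id added in this PR: only the patch digit differs from 8_505_00_0.
        System.out.println(describe(8_505_00_1));
        // A hypothetical id where the NNN part is incremented instead.
        System.out.println(describe(8_506_00_0));
    }
}
```

Under this assumed layout, `8_505_00_1` shares its NNN value (505) with `DEFAULT_DENSE_VECTOR_TO_INT8_HNSW` and differs only in the patch digit, which is the issue raised in the comment above.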

Adding aggregations support for the _ignored field

50d1ffb

eyalkoren added >feature :Search/Search Search-related issues that do not fall into other categories v8.12.0 labels Oct 26, 2023

eyalkoren self-assigned this Oct 26, 2023

eyalkoren changed the title ~~Adding aggregations support for the _ignored field~~ Adding aggregations support for the _ignored field Oct 26, 2023

elasticsearchmachine added the external-contributor Pull request authored by a developer outside the Elasticsearch team label Oct 26, 2023

Update docs/changelog/101373.yaml

737fbb2

felixbarny reviewed Oct 26, 2023

eyalkoren added 4 commits October 26, 2023 20:00

spotless

ace155a

Merge remote-tracking branch 'eyalkoren/ignore_field_aggs' into ignor…

e698cc1

…e_field_aggs

Adjust tests

5e3f6ec

Merge remote-tracking branch 'upstream/main' into ignore_field_aggs

ef9ce62

eyalkoren mentioned this pull request Oct 31, 2023

Indicate why a field has been _ignored #101153

Open

eyalkoren added 3 commits November 1, 2023 06:38

Add exists query using doc_values

c34dc4b

Improve error message for unsupported aggs

aa9cdec

ohh come on

2e03324

Merge remote-tracking branch 'upstream/main' into ignore_field_aggs

8bc6582

eyalkoren added Team:obs-knowledge Meta label for Observability Knowledge team and removed Team:obs-knowledge Meta label for Observability Knowledge team labels Nov 29, 2023

WIP: making field not stored

873a34f

felixbarny mentioned this pull request Dec 5, 2023

[Dataset quality] Added malformed docs column to table elastic/kibana#172462

Merged

brianseeders removed the v8.12.0 label Dec 6, 2023

salvatore-campagna added 10 commits April 22, 2024 14:31

fix: use public version string

80a7426

fix: remove test for term vectors

ea20632

docs: update version

2ac32f4

fix: update version in reason

3ce075f

fix: remove tests included in another class

f669f9e

fix: error message and _ignored debug stored field

58bb648

fix: remove _ignored

08dd66f

fix: remove SuppressWarnings

b561aab

fix: remove _ignored

98a4c6b

fix: error message in bwc test

897eca0

salvatore-campagna mentioned this pull request Apr 23, 2024

GET api does not return _ignored by default #107750

Open

fix: fetch _ignored field from doc values

241a8ca

salvatore-campagna reviewed Apr 23, 2024

salvatore-campagna added 3 commits April 23, 2024 14:27

fix: undo unecessary change

cf06f0c

fix: non-null ingored fields after implementing fetch from doc values

4f1816d

Merge branch 'main' into ignore_field_aggs

3276a82

salvatore-campagna requested a review from javanna April 23, 2024 15:06

javanna reviewed Apr 23, 2024

salvatore-campagna added 4 commits April 24, 2024 10:20

Revert "fix: remove test for term vectors"

6825ada

This reverts commit ea20632.

fix: load _ignored from stored field or from doc values

fe0d874

fix: re-enable terms vector test

de02373

Merge branch 'main' into ignore_field_aggs

1175322

javanna approved these changes Apr 25, 2024

salvatore-campagna added 3 commits April 29, 2024 11:43

fix: update skip version

e37e1b9

Merge branch 'main' into ignore_field_aggs

bfec2b3

Merge branch 'main' into ignore_field_aggs

4f95463

salvatore-campagna merged commit ee26295 into elastic:main Apr 29, 2024
14 checks passed

thecoop reviewed May 1, 2024

salvatore-campagna mentioned this pull request Feb 6, 2025

Count number of documents with at least one ignored field #109146

Open
