Add setting to ignore dynamic fields when field limit is reached #96235
Documentation preview: |
Hi @felixbarny, I've created a changelog YAML for you. |
Pinging @elastic/es-search (Team:Search) |
…g update failures once
I like the general approach here. It makes it clear that the limit is for dynamic mappings. If the overall limit is 1024 and the user has specified 1020 fields, 4 remain open for dynamic fields. Every time an index rolls over, the game begins again. If a user accidentally created too many fields in an index under a data stream but has since fixed ingestion, a rollover resets the counter for the new index. For the future, I'm starting to wonder if …
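To make the arithmetic in the comment above concrete, here is a minimal sketch in plain Java. The class and method names are hypothetical and are not part of Elasticsearch; the numbers are the ones used in the comment.

```java
class DynamicFieldBudget {
    // Hypothetical helper: how many dynamic fields can still be added
    // before the total field limit is reached.
    static int remainingDynamicSlots(int totalFieldsLimit, int mappedFields) {
        return Math.max(0, totalFieldsLimit - mappedFields);
    }

    public static void main(String[] args) {
        int limit = 1024;          // overall field limit from the comment
        int explicitFields = 1020; // fields the user mapped explicitly

        // 1024 - 1020 leaves room for 4 dynamic fields.
        System.out.println(remainingDynamicSlots(limit, explicitFields)); // prints 4

        // Once 4 dynamic fields were added, the budget of this index is used up.
        System.out.println(remainingDynamicSlots(limit, explicitFields + 4)); // prints 0

        // After a rollover, the new index starts again with only the
        // explicitly mapped fields, so the budget is back to 4.
        System.out.println(remainingDynamicSlots(limit, explicitFields)); // prints 4
    }
}
```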
Makes sense to me. This has also been discussed in #89911 |
When we last discussed this with the team, we talked about introducing a new dynamic mode rather than a new index setting that affects how existing dynamic modes work. Also, we said we'd want to better understand how users are going to consume the additional info added to the …
We are one of the main requestors of this feature. For us, this incremental change solves the problem for now. I agree that eventually there should be a more holistic overhaul of how all the pieces work together, but this should not block the addition of this config option, especially as it is a rather small change.
The tradeoff we are making here is dropping documents vs. having the info available. We should not drop documents, and we should optimise for this first. In a second step, we can work on making the info easier for the user to retrieve.
Agree, but that's a problem that already exists today (for example, is …). I don't see why we should block progress on this until we have a way to store the _ignored reason.
Again, this is orthogonal and should be handled via #59946 |
@javanna do you agree on the approach of just relying on
I've tried that, too, in this PR: #96233
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
Code has changed considerably since it was reviewed, another round needed.
This is in preparation for #96235. At the moment, there's no difference between MAPPING_AUTO_UPDATE and MAPPING_AUTO_UPDATE_PREFLIGHT. After the other PR is merged, when the merge reason is auto-update and ignore_dynamic_beyond_limit is set, the merge process will only add dynamically mapped fields until the field limit is reached and will ignore additional ones.
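A rough sketch of the merge behaviour described above, in plain Java. Only the two merge reason names come from the description; the `MergeSketch` class, the `merge` method, the extra `MAPPING_UPDATE` reason and the error message are hypothetical, and treating only `MAPPING_AUTO_UPDATE` as lenient is an assumption made for illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class MergeSketch {
    // MAPPING_UPDATE is a stand-in for "any other merge reason".
    enum MergeReason { MAPPING_UPDATE, MAPPING_AUTO_UPDATE, MAPPING_AUTO_UPDATE_PREFLIGHT }

    // Hypothetical merge: returns the field names after merging `incoming` into `existing`.
    static Set<String> merge(Set<String> existing, List<String> incoming, MergeReason reason,
                             int limit, boolean ignoreDynamicBeyondLimit) {
        Set<String> merged = new LinkedHashSet<>(existing);
        // Assumption: only auto-updates with the setting enabled drop surplus fields.
        boolean lenient = reason == MergeReason.MAPPING_AUTO_UPDATE && ignoreDynamicBeyondLimit;
        for (String field : incoming) {
            if (merged.contains(field)) {
                continue; // already mapped, does not consume budget
            }
            if (merged.size() >= limit) {
                if (lenient) {
                    continue; // limit reached: ignore this additional dynamic field
                }
                throw new IllegalArgumentException("field limit [" + limit + "] exceeded by [" + field + "]");
            }
            merged.add(field);
        }
        return merged;
    }

    public static void main(String[] args) {
        Set<String> existing = new LinkedHashSet<>(List.of("a", "b"));
        // With a limit of 3, only "c" fits; "d" is ignored.
        System.out.println(merge(existing, List.of("c", "d"), MergeReason.MAPPING_AUTO_UPDATE, 3, true)); // prints [a, b, c]
    }
}
```

With any other reason, or with the setting disabled, the same call throws instead of silently dropping `d`.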
Left a few comments, this is not far at all!
Did another round. The main bit that's left open is how we expose the new behaviour. I will get back to you as soon as that's discussed and decided with the team.
LGTM!
@elasticmachine update branch |
Today, we're counting all mappers, including mappers for subfields that aren't explicitly added to the mapping towards the field limit. This means that some field types, such as `search_as_you_type` or `percolator` count as more than one field even though that's not apparent to users as they're just defining them as a single field in the mapping. This change makes it so that each field mapper only counts as one. We're still counting multi-fields. This makes it easier to understand for users why the field limit is hit.

~In addition to that, it also simplifies #96235 as it makes the implementation of `Mapper.Builder#getTotalFieldsCount` much easier and easier to align with `Mapper#getTotalFieldsCount`. This reduces the risk of over- or under-estimating the field count of a `Mapper.Builder` in `DocumentParserContext#addDynamicMapper`, which in turn reduces the risk of data loss due to the issue described here: #96235 (comment)~

*Edit: due to #103865, we don't need an implementation of `getTotalFieldsCount` or `mapperSize` in `Mapper.Builder`. Still, this PR more closely aligns `Mapper#getTotalFieldsCount` with `MappingLookup#getTotalFieldsCount`, which `DocumentParserContext#addDynamicMapper` uses to determine whether the field limit is hit*

A potential risk of this is that we're now effectively allowing more fields in the mapping. It may be surprising to users that more fields can be added to a mapping. Although, I'd not expect negative consequences from that. Generally, I'd expect users to be happy about any change that reduces the risk of data loss.

We could also think about whether to apply the new counting logic only to new indices (depending on the `IndexVersion`). However, that would add more complexity and I'm not convinced about the value. We'd then need to maintain two different ways of counting fields and also require passing in the `IndexVersion` to `MappingLookup` which previously didn't require the `IndexVersion`.
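The difference between the two counting schemes can be illustrated with a small model in plain Java. This is not the actual `Mapper#getTotalFieldsCount` implementation; the `FieldMapper` class, its `internalSubfields` counter and the example numbers are assumptions chosen for illustration.

```java
import java.util.List;

class FieldCountSketch {
    // Hypothetical, simplified stand-in for a field mapper.
    static final class FieldMapper {
        final String name;
        final int internalSubfields;          // subfields the type creates implicitly
        final List<FieldMapper> multiFields;  // multi-fields the user defined explicitly

        FieldMapper(String name, int internalSubfields, List<FieldMapper> multiFields) {
            this.name = name;
            this.internalSubfields = internalSubfields;
            this.multiFields = multiFields;
        }

        // Old scheme: implicit subfields count towards the limit.
        int oldCount() {
            int count = 1 + internalSubfields;
            for (FieldMapper multiField : multiFields) {
                count += multiField.oldCount();
            }
            return count;
        }

        // New scheme: each field mapper counts as one; multi-fields are still counted.
        int newCount() {
            int count = 1;
            for (FieldMapper multiField : multiFields) {
                count += multiField.newCount();
            }
            return count;
        }
    }

    public static void main(String[] args) {
        // A search_as_you_type-like field, assumed here to create 3 implicit subfields,
        // with one explicit keyword multi-field.
        FieldMapper raw = new FieldMapper("title.raw", 0, List.of());
        FieldMapper title = new FieldMapper("title", 3, List.of(raw));

        System.out.println(title.oldCount()); // prints 5
        System.out.println(title.newCount()); // prints 2
    }
}
```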
This PR is meant as a conversation starter. It would also simplify #96235 but I don't think this blocks that PR in any way. I'm curious about the opinion of @javanna and @jpountz on this.
Adds a new `index.mapping.total_fields.ignore_dynamic_beyond_limit` index setting.

When set to `true`, new fields are added to the mapping as long as the field limit (`index.mapping.total_fields.limit`) is not exceeded. Fields that would exceed the limit are not added to the mapping, similar to `dynamic: false`. Ignored fields are added to the `_ignored` metadata field.

Relates to #89911
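The behaviour described above can be sketched as a small simulation in plain Java. This models the description only and is not the `DocumentParser` code from this PR; the class name, the `parse` method and the sample field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class IgnoreDynamicBeyondLimitSketch {
    // Hypothetical parse step: maps new fields while there is room and returns
    // the names that would be recorded in the _ignored metadata field.
    static List<String> parse(Map<String, Object> doc, Set<String> mapping,
                              int totalFieldsLimit, boolean ignoreDynamicBeyondLimit) {
        List<String> ignored = new ArrayList<>();
        for (String field : doc.keySet()) {
            if (mapping.contains(field)) {
                continue; // already mapped
            }
            if (mapping.size() < totalFieldsLimit) {
                mapping.add(field); // dynamic mapping update
            } else if (ignoreDynamicBeyondLimit) {
                ignored.add(field); // not mapped, similar to dynamic: false
            } else {
                // Without the setting, the document is rejected.
                throw new IllegalStateException("field limit [" + totalFieldsLimit + "] exceeded");
            }
        }
        return ignored;
    }

    public static void main(String[] args) {
        Set<String> mapping = new LinkedHashSet<>(List.of("host", "message"));
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("message", "hello");
        doc.put("user_id", 42);
        doc.put("trace_id", "abc");

        List<String> ignored = parse(doc, mapping, 3, true);
        System.out.println(mapping); // prints [host, message, user_id]
        System.out.println(ignored); // prints [trace_id]
    }
}
```

The document itself is still indexed in this model; only the surplus field is left out of the mapping and listed as ignored.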
To make this easier to review, this is split into the following PRs:
Related but not a prerequisite: