Stop enforcing arbitrary high value for sysctl vm.max_map_count #92618
For context, this issue is forked from the Lucene user list, where a user could not set the `vm.max_map_count` sysctl.
In my opinion, the two main factors that contribute to high numbers of map regions are:
Having to change the
Pinging @elastic/es-core-infra (Team:Core/Infra)
Hi Adrien,
I do not agree with this. It takes quite a long time until this happens, and most users won't be affected. I would just print the warning on startup, or maybe complain at runtime if the system figures out that you have many indexes with many files open. If somebody hits this, they get an IOException like "disk full" or similar. We no longer have the bad "OutOfMemoryError" mentioned a long time ago, so the exception should not confuse anybody. The exception also gives a good message about what needs to be done and what caused it. Lucene's commits work well, so the node may stop working (but this can also happen when disk space runs out): just stop the node, change the setting in your sysctl, and start it again. It should replay the transactions and all is fine again. Things like this have happened to me very often, so I am glad that Elasticsearch handles those error conditions well.

In short: my problem is only that you can't start the node at all when the setting is not applied. My only wish is to print a warning instead, and maybe also show that warning in Kibana, but refusing to start is a no-go. The same applied in the past to settings like enforcing "swappiness=0", although everybody around me suggests using "swappiness=1" or disabling the swap file entirely. If users want to configure the cluster their own way, let them do it. Enforcing settings so that users don't shoot themselves in the foot is fine to a certain degree, but there must always be a way around checks. I patched that startup error out of an older Elasticsearch cluster used for library search with only a few indexes, because I don't agree with applying those settings on my kernel, sorry.
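The "stop node, change setting in your sysctl, start it again" procedure described above can be sketched as follows (a minimal sketch assuming a Linux host; 262144 is the value Elasticsearch's bootstrap check expects, while 65530 is the usual Linux default):

```shell
# Read the current limit straight from procfs (equivalent to
# `sysctl vm.max_map_count` on Linux).
current=$(cat /proc/sys/vm/max_map_count)
echo "current vm.max_map_count: ${current}"

# If it is below what the bootstrap check enforces, print the commands an
# operator would run; raising it requires root, so we only echo them here.
if [ "${current}" -lt 262144 ]; then
  echo "below the enforced minimum; raise it with:"
  echo "  sudo sysctl -w vm.max_map_count=262144"
  echo "and persist it in a file under /etc/sysctl.d/ to survive reboots"
fi
```

The change takes effect immediately without a reboot, which is why restarting only the Elasticsearch node is enough.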
BTW, here is the code that throws a nice IOException when the bad thing happens: https://github.com/apache/lucene/blob/19cc6cdf6671a6bdf8dbef66303b6b659574ba05/lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java#L307-L348
For reference, it is still possible to set
Lucene's `TieredMergePolicy` default creates compound files for segments that do not exceed 10% of the total index size at the time these segments are written. This sort of default makes sense in the context of a single Lucene index, but much less for a system like Elasticsearch, where a node may host heterogeneous shards: it doesn't make sense to use a compound file for a 1GB segment on one shard because it has 30GB of data, but not on another shard because it has 5GB of data. This change does the following:
- `index.compound_format` now accepts byte size values, in addition to a `boolean` or a ratio in [0,1].
- The default value of `index.compound_format` changes from `0.1` to `1gb`. In other words, segments are stored in compound files if they don't exceed 1GB.

In practice, this means that smaller shards will use compound files much more and contribute less to the total number of open files or map regions, while bigger shards may use compound files a bit less. Relates elastic#92618
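With the change described above, overriding the default per index might look like the following settings fragment (a hedged sketch based on the commit message; the index name and the exact request shape are illustrative assumptions, not taken from the PR):

```json
{
  "settings": {
    "index.compound_format": "1gb"
  }
}
```

Passing a byte size like `"1gb"` rather than a ratio like `0.1` is exactly what makes the behavior uniform across shards of very different sizes.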
uschindler's proposal is a good idea. Give users the right to choose rather than forcing them.
I ended up going with what @jpountz described, and it works (8.10.1). What are the downsides of that approach? From my own research I don't see anything major other than a different way of working, but why is one the default instead of the other? Why isn't NIOFSDirectory the default?
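For readers wondering what an mmap-avoiding configuration can look like: the exact setting referenced earlier in the thread is truncated, so the following is only a hedged sketch of one documented node-level option, not necessarily what the commenter used:

```yaml
# elasticsearch.yml — disables memory-mapping for index stores, so reads fall
# back to NIO and the vm.max_map_count limit stops mattering (assumption: this
# is the intent being discussed; check the store-module docs for your version).
node.store.allow_mmap: false
```

The usual trade-off cited for NIO over mmap is extra copying between the page cache and the JVM on reads, which is why mmap-based stores remain the default.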
Elasticsearch Version
any
Installed Plugins
n/a
Java Version
bundled
OS Version
Linux
Problem Description
When setting up a cluster, Elasticsearch requires you to set
vm.max_map_count
to an arbitrarily high value. This consumes kernel resources and is totally useless for normal installations with a few hundred shards per node. We know that there is currently a limitation that makes Lucene split maps into chunks of 1 GiB, but even with that limitation the Linux default of 65530 is more than enough for most installations. If somebody hits the limit, it is easy to change, but enforcing it so strictly and by default is... sorry: bullshit. I have to say it bluntly!
Instead, please enable Java 19 by default (already done) and use the JVM option
--enable-preview
(before Lucene 9.5, see #90526). With Lucene 9.5 we will use 16 GiB chunks by default and without preview mode, so having that setting enforced is useless anyway. If you still want to print a warning, I am fine with that, but preventing users from going into production without this obviously wrong setting, enforced as a workaround for some rare problem, is a disaster!
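The chunk-size argument above can be checked with back-of-envelope arithmetic (the 30 GiB file size is an illustrative assumption, not a measurement): MMapDirectory maps each file in fixed-size chunks, so one large segment file costs ceil(file_size / chunk_size) map regions.

```shell
# A hypothetical 30 GiB segment file.
file_gib=30

# Before Lucene 9.5: 1 GiB chunks -> one map region per GiB.
old_maps=$(( (file_gib + 1 - 1) / 1 ))
# With Lucene 9.5: 16 GiB chunks -> ceil(30/16) map regions.
new_maps=$(( (file_gib + 16 - 1) / 16 ))

echo "1 GiB chunks:  ${old_maps} maps for a ${file_gib} GiB file"
echo "16 GiB chunks: ${new_maps} maps for a ${file_gib} GiB file"
```

With 16 GiB chunks the map count per file drops by roughly a factor of 16, which is the basis for the claim that the enforced sysctl value becomes unnecessary for typical installations.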
Steps to Reproduce
Start Elasticsearch in cloud mode without setting
vm.max_map_count
in sysctl.
Logs (if relevant)
No response