Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elasticsearch 6.2.2 nodes crash after reaching ulimit setting #29097

Closed
redlus opened this issue Mar 15, 2018 · 17 comments · Fixed by #29125
Closed

Elasticsearch 6.2.2 nodes crash after reaching ulimit setting #29097

redlus opened this issue Mar 15, 2018 · 17 comments · Fixed by #29125
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard.

Comments

@redlus
Copy link

redlus commented Mar 15, 2018

Hi,

We're experiencing a critical production issue in elasticsearch 6.2.2 related to open_file_descriptors. The cluster is an exact replica (as much as possible) of a 5.2.2 cluster, and documents are indexed into both clusters in parallel.
While the indexing performance of the new cluster seems to be at least as good as the 5.2.2 cluster, the new nodes open_file_descriptors is reaching record-breaking levels (especially when compared to v5.2.2).

Elasticsearch version
6.2.2

Plugins installed:
ingest-attachment
ingest-geoip
mapper-murmur3
mapper-size
repository-azure
repository-gcs
repository-s3

JVM version:
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)

OS version:
Linux 4.13.0-1011-azure #14-Ubuntu SMP 2018 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

All machines have ulimit of 65536, as recommended by the official documentation.
All nodes on the v5.2.2 cluster have up to 4,500 open_file_descripors, while the new v6.2.2 nodes are divided: some have up to 4,500 open_file_descriptors, while others consistently open more and more file descriptors until reaching the limit and crashing with java.nio.file.FileSystemException: Too many open files -

[WARN ][o.e.c.a.s.ShardStateAction] [prod-elasticsearch-master-002] [newlogs_20180315-01][0] received shard failed for shard id [[newlogs_20180315-01][0]], allocation id [G8NGOPNHRNuqNKYKzfiPcg], primary term [0], message [shard failure, reason [already closed by tragic event on the translog]], failure [FileSystemException[/mnt/nodes/0/indices/nIgarkzwRwe0DmT-nmLhvg/0/translog/translog.ckp: Too many open files]]
java.nio.file.FileSystemException: /mnt/nodes/0/indices/nIgarkzwRwe0DmT-nmLhvg/0/translog/translog.ckp: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) ~[?:?]
at java.nio.channels.FileChannel.open(FileChannel.java:287) ~[?:1.8.0_151]
at java.nio.channels.FileChannel.open(FileChannel.java:335) ~[?:1.8.0_151]
at org.elasticsearch.index.translog.Checkpoint.write(Checkpoint.java:179) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.translog.TranslogWriter.writeCheckpoint(TranslogWriter.java:419) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.translog.TranslogWriter.syncUpTo(TranslogWriter.java:362) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.translog.Translog.ensureSynced(Translog.java:699) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.translog.Translog.ensureSynced(Translog.java:724) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.shard.IndexShard$3.write(IndexShard.java:2336) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.processList(AsyncIOProcessor.java:107) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.drainAndProcess(AsyncIOProcessor.java:99) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.put(AsyncIOProcessor.java:82) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.shard.IndexShard.sync(IndexShard.java:2358) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.support.replication.TransportWriteAction$AsyncAfterWriteAction.run(TransportWriteAction.java:362) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.support.replication.TransportWriteAction$WritePrimaryResult.(TransportWriteAction.java:160) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:133) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:110) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:72) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:1034) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:1012) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:103) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:359) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:299) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:975) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:972) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:238) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2220) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryShardReference(TransportReplicationAction.java:984) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.support.replication.TransportReplicationAction.access$500(TransportReplicationAction.java:98) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:320) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:295) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:282) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:656) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.2.jar:6.2.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]

After this exception, some of the nodes throw many exceptions and reduce the number of open file descriptors. Other times they just crash. The issue repeats itself with interleaving nodes.

Steps to reproduce:
None yet. This is a production-scale issue, so it is not easy to reproduce.

I'd be happy to provide additional details, whatever is needed.

Thanks!

@jasontedor
Copy link
Member

Would you please take the PID of an Elasticsearch process and as the number of open file descriptors grows capture the output of ls -al /proc/<PID>/fd and send the output to me privately (first name at elastic.co)? You might want to repeat this a few times as it grows and grows so we can diff and easily see what it is growing on.

When you say the node is crashing, do you have logs for this?

@jasontedor jasontedor added the :Core/Infra/Resiliency Keep running when everything is ok. Die quickly if things go horribly wrong. label Mar 16, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@redlus
Copy link
Author

redlus commented Mar 16, 2018

Hi Jason,

I've sent you several files with the output of ls -al /proc/<PID>/fd on timestamps along 17 minutes, during which the open file descriptors increased from ~99k to ~105k.

Nodes reaching the ulimit actually crash the process. Here are the last few logs of a node which crashed:

[2018-03-14T03:27:47,930][WARN ][o.e.i.e.Engine ] [prod-elasticsearch-data-028] [newlogs_20180314-01][1] failed to read latest segment infos on flush
java.nio.file.FileSystemException: /mnt/nodes/0/indices/-mwGJUcDRXKRtezeXPaZSA/1/index: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427) ~[?:?]
at java.nio.file.Files.newDirectoryStream(Files.java:457) ~[?:1.8.0_151]
at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:215) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:234) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.store.FilterDirectory.listAll(FilterDirectory.java:57) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:655) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:627) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:434) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:123) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:202) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.store.Store.readLastCommittedSegmentsInfo(Store.java:187) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.engine.InternalEngine.refreshLastCommittedSegmentInfos(InternalEngine.java:1599) [elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.engine.InternalEngine.flush(InternalEngine.java:1570) [elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.shard.IndexShard.flush(IndexShard.java:1030) [elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.shard.IndexShard$4.doRun(IndexShard.java:2403) [elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) [elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.2.2.jar:6.2.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
[2018-03-14T03:27:48,040][WARN ][o.e.i.s.IndexShard ] [prod-elasticsearch-data-028] [newlogs_20180314-01][1] failed to flush index
org.elasticsearch.index.engine.FlushFailedEngineException: Flush failed
at org.elasticsearch.index.engine.InternalEngine.flush(InternalEngine.java:1568) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.shard.IndexShard.flush(IndexShard.java:1030) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.shard.IndexShard$4.doRun(IndexShard.java:2403) [elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) [elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.2.2.jar:6.2.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
Caused by: java.nio.file.FileSystemException: /mnt/nodes/0/indices/-mwGJUcDRXKRtezeXPaZSA/1/translog/translog-24471.ckp: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
at sun.nio.fs.UnixCopyFile.copyFile(UnixCopyFile.java:243) ~[?:?]
at sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:581) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253) ~[?:?]
at java.nio.file.Files.copy(Files.java:1274) ~[?:1.8.0_151]
at org.elasticsearch.index.translog.Translog.rollGeneration(Translog.java:1539) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.engine.InternalEngine.flush(InternalEngine.java:1560) ~[elasticsearch-6.2.2.jar:6.2.2]
... 7 more
[2018-03-14T03:27:48,398][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [prod-elasticsearch-data-028] fatal error in thread [elasticsearch[prod-elasticsearch-data-028][bulk][T#2]], exiting
java.lang.AssertionError: Unexpected AlreadyClosedException
at org.elasticsearch.index.engine.InternalEngine.failOnTragicEvent(InternalEngine.java:1779) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.engine.InternalEngine.maybeFailEngine(InternalEngine.java:1796) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:904) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:738) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:707) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnReplica(IndexShard.java:680) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.bulk.TransportShardBulkAction.performOpOnReplica(TransportShardBulkAction.java:518) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnReplica(TransportShardBulkAction.java:480) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnReplica(TransportShardBulkAction.java:466) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnReplica(TransportShardBulkAction.java:72) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.onResponse(TransportReplicationAction.java:567) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.onResponse(TransportReplicationAction.java:530) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.shard.IndexShard$2.onResponse(IndexShard.java:2314) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.shard.IndexShard$2.onResponse(IndexShard.java:2292) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:238) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.shard.IndexShard.acquireReplicaOperationPermit(IndexShard.java:2291) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.doRun(TransportReplicationAction.java:641) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicaOperationTransportHandler.messageReceived(TransportReplicationAction.java:513) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicaOperationTransportHandler.messageReceived(TransportReplicationAction.java:493) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1555) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.2.jar:6.2.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
Caused by: org.apache.lucene.store.AlreadyClosedException: translog is already closed
at org.elasticsearch.index.translog.Translog.ensureOpen(Translog.java:1663) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.translog.Translog.add(Translog.java:504) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:886) ~[elasticsearch-6.2.2.jar:6.2.2]
... 24 more

Could this be related to #25005? https://github.com/elastic/elasticsearch/pull/25005

@farin99
Copy link

farin99 commented Mar 16, 2018

from running: sudo lsof -p <es process>
90% of the file descriptors are from translog-xxxxx.tlog files. Seems like elastic never deletes these files

@bleskes
Copy link
Contributor

bleskes commented Mar 16, 2018

@farin99 can you describe how many active shards you have on each nodes - i.e., how many shards receive index operations in a 12 hours period ?

@ArielCoralogix
Copy link

Hey @bleskes
Answering for @farin99 here - we have a few hundreds of active shards per node.

@amnons77
Copy link

Hey @bleskes,

  1. We have about 2600 shards that received an index operations in a 24H period.
  2. We tried to change the "index.translog.retention.age" to 30m for the indexes who holds that shards and it didn't reduced the amount of "open_file_descriptors".

@amnons77
Copy link

@bleskes
Sorry, my previous comment described the number of open shards on this entire cluster. We have about 100 open shards per node per 12 hours.
Btw, this same setup works in parallel now (data fan out) on version 5.2 with no issues and about 5000 open file descriptors while the 6.2 cluster already exceeded 150,000 for some of the nodes.

@bleskes
Copy link
Contributor

bleskes commented Mar 16, 2018

@amnons77 ok, if trimming the retention policy doesn't help we need figure out what's going on. Let's takes this to the forums to sort out. Can you post a message, including the output of nodes stats on http://discuss.elastic.co and link it here? once we know what it is we can, if needed, come back to github

@ArielCoralogix
Copy link

Hey @bleskes I am referring to the ticket opened by our colleague @redlus on the Discuss forum: https://discuss.elastic.co/t/elasticsearch-6-2-2-nodes-crash-after-reaching-ulimit-setting/124172
Also, Lior ran 'GET /_stats?include_segment_file_sizes&level=shards' and send the result files from 3 different times to @nhat which is also active on that thread.

@jasontedor
Copy link
Member

The translog for four of your shards (from three distinct indices) is behaving in an unanticipated way. Could you please share the index settings for:
DN_QfOruR5SMVRavPDnRfw
rRrAxVZzSye4W_-exBocCA
r-cR4xWkSFinkKg_RqiDsA

(from /{index}/_settings). I do not know the name of the indices that these correspond to, you will have to backtrace.

And can you also share the shard stats for these indices (from /{index}/_stats?level=shards&include_segment_file_sizes=true). You can email them to the same email address that you used before.

dnhatn added a commit to dnhatn/elasticsearch that referenced this issue Mar 17, 2018
In elastic#28350, we fixed an endless flushing loop which can happen on
replicas by tightening the relation between the flush action and the
periodically flush condition.

1. The periodically flush condition is enabled only if it will be
disabled after a flush.

2. If the periodically flush condition is true then a flush will
actually happen regardless of Lucene state.

(1) and (2) guarantee a flushing loop will be terminated. Sadly, the
condition elastic#1 can be violated in edge cases as we used two different
algorithms to evaluate the current and future uncommitted size.

- We use method `uncommittedSizeInBytes` to calculate current
uncommitted size. It is the sum of translogs whose generation at least
the minGen (determined by a given seqno). We pick a continuous range of
translogs since the minGen to evaluate the current uncommitted size.

- We use method `sizeOfGensAboveSeqNoInBytes` to calculate the future
uncommitted size. It is the sum of translogs whose maxSeqNo at least
the given seqNo. Here we don't pick a range but select translog one
by one.

Suppose we have 3 translogs gen1={elastic#1,elastic#2}, gen2={}, gen3={elastic#3} and
seqno=elastic#1, uncommittedSizeInBytes is the sum of gen1, gen2, and gen3
while sizeOfGensAboveSeqNoInBytes is sum of gen1 and gen3. Gen2 is
excluded because its maxSeqno is still -1.

This commit ensures sizeOfGensAboveSeqNoInBytes use the same algorithm
from uncommittedSizeInBytes

Closes elastic#29097
@dnhatn
Copy link
Member

dnhatn commented Mar 17, 2018

I opened #29125 to propose a fix for this.

@dnhatn
Copy link
Member

dnhatn commented Mar 17, 2018

The problem is that one replica got into an endless loop of flushing.

@dnhatn dnhatn added :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Core/Infra/Resiliency Keep running when everything is ok. Die quickly if things go horribly wrong. labels Mar 17, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@redlus
Copy link
Author

redlus commented Mar 22, 2018

Hey @dnhatn @bleskes and @jasontedor,

We can confirm this affects v6.0.2 as well:
image

At 09:22 the node enters a state where the open file descriptors monotonically increase.
This is accompanied by higher load average, cpu, total indices actions and indices operation time.

dnhatn added a commit that referenced this issue Mar 22, 2018
In #28350, we fixed an endless flushing loop which may happen on 
replicas by tightening the relation between the flush action and the
periodically flush condition.

1. The periodically flush condition is enabled only if it is disabled 
after a flush.

2. If the periodically flush condition is enabled then a flush will
actually happen regardless of Lucene state.

(1) and (2) guarantee that a flushing loop will be terminated. Sadly, 
the condition 1 can be violated in edge cases as we used two different
algorithms to evaluate the current and future uncommitted translog size.

- We use method `uncommittedSizeInBytes` to calculate current 
  uncommitted size. It is the sum of translogs whose generation at least
the minGen (determined by a given seqno). We pick a continuous range of
translogs since the minGen to evaluate the current uncommitted size.

- We use method `sizeOfGensAboveSeqNoInBytes` to calculate the future 
  uncommitted size. It is the sum of translogs whose maxSeqNo at least
the given seqNo. Here we don't pick a range but select translog one by
one.

Suppose we have 3 translogs `gen1={#1,#2}, gen2={}, gen3={#3} and 
seqno=#1`, `uncommittedSizeInBytes` is the sum of gen1, gen2, and gen3
while `sizeOfGensAboveSeqNoInBytes` is the sum of gen1 and gen3. Gen2 is
excluded because its maxSeqno is still -1.

This commit removes both `sizeOfGensAboveSeqNoInBytes` and 
`uncommittedSizeInBytes` methods, then enforces an engine to use only
`sizeInBytesByMinGen` method to evaluate the periodically flush condition.

Closes #29097
Relates ##28350
dnhatn added a commit that referenced this issue Mar 22, 2018
In #28350, we fixed an endless flushing loop which may happen on
replicas by tightening the relation between the flush action and the
periodically flush condition.

1. The periodically flush condition is enabled only if it is disabled
after a flush.

2. If the periodically flush condition is enabled then a flush will
actually happen regardless of Lucene state.

(1) and (2) guarantee that a flushing loop will be terminated. Sadly,
the condition 1 can be violated in edge cases as we used two different
algorithms to evaluate the current and future uncommitted translog size.

- We use method `uncommittedSizeInBytes` to calculate current
  uncommitted size. It is the sum of translogs whose generation at least
the minGen (determined by a given seqno). We pick a continuous range of
translogs since the minGen to evaluate the current uncommitted size.

- We use method `sizeOfGensAboveSeqNoInBytes` to calculate the future
  uncommitted size. It is the sum of translogs whose maxSeqNo at least
the given seqNo. Here we don't pick a range but select translog one by
one.

Suppose we have 3 translogs `gen1={#1,#2}, gen2={}, gen3={#3} and
seqno=#1`, `uncommittedSizeInBytes` is the sum of gen1, gen2, and gen3
while `sizeOfGensAboveSeqNoInBytes` is the sum of gen1 and gen3. Gen2 is
excluded because its maxSeqno is still -1.

This commit removes both `sizeOfGensAboveSeqNoInBytes` and
`uncommittedSizeInBytes` methods, then enforces an engine to use only
`sizeInBytesByMinGen` method to evaluate the periodically flush condition.

Closes #29097
Relates ##28350
dnhatn added a commit that referenced this issue Mar 22, 2018
In #28350, we fixed an endless flushing loop which may happen on
replicas by tightening the relation between the flush action and the
periodically flush condition.

1. The periodically flush condition is enabled only if it is disabled
after a flush.

2. If the periodically flush condition is enabled then a flush will
actually happen regardless of Lucene state.

(1) and (2) guarantee that a flushing loop will be terminated. Sadly,
the condition 1 can be violated in edge cases as we used two different
algorithms to evaluate the current and future uncommitted translog size.

- We use method `uncommittedSizeInBytes` to calculate current
  uncommitted size. It is the sum of translogs whose generation at least
the minGen (determined by a given seqno). We pick a continuous range of
translogs since the minGen to evaluate the current uncommitted size.

- We use method `sizeOfGensAboveSeqNoInBytes` to calculate the future
  uncommitted size. It is the sum of translogs whose maxSeqNo at least
the given seqNo. Here we don't pick a range but select translog one by
one.

Suppose we have 3 translogs `gen1={#1,#2}, gen2={}, gen3={#3} and
seqno=#1`, `uncommittedSizeInBytes` is the sum of gen1, gen2, and gen3
while `sizeOfGensAboveSeqNoInBytes` is the sum of gen1 and gen3. Gen2 is
excluded because its maxSeqno is still -1.

This commit removes both `sizeOfGensAboveSeqNoInBytes` and
`uncommittedSizeInBytes` methods, then enforces an engine to use only
`sizeInBytesByMinGen` method to evaluate the periodically flush condition.

Closes #29097
Relates ##28350
@redlus
Copy link
Author

redlus commented Apr 9, 2018

Commit #29125 changes the mechanism controlling the flush operation, and although it has been committed to the master, the latest elasticsearch version does not include it. We've therefore built v6.2.4-SNAPSHOT from the master and have been running it for the last two weeks in production-scale. The issue seems to have been solved, and everything works as expected:

image

Thank you for the quick response and handling of the issue!

@bleskes
Copy link
Contributor

bleskes commented Apr 9, 2018

@redlus thanks for testing and letting us know. Happy to hear it worked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants