SoftDeletesRetentionMergePolicy#numDeletesToMerge caused indexing backlogged #75675
Labels
>bug
:Distributed Indexing/Engine
Anything around managing Lucene and the Translog in an open shard.
Team:Distributed (Obsolete)
Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
If a shard has a very large number of soft-deleted docs and they are also held by a retention lease, the numDeletesToMerge function has a performance issue.
For instance, an indexing workload made up of updates is writing to Elasticsearch, and then we move a primary shard to another node. If the move continues for a long time, the old shard becomes very large, because the soft-deleted operations need to be held by the retention lease. The more soft-deleted documents there are, the slower indexing gets. If the shard size is about 20GB, we get the flamegraph below:
flamegraph.html.zip
In this case, the write queue stays backlogged, and we get the jstack output below:
1.txt
and the indices stats:
The _cat/shards/ output:
(When relocation is done, the shard size drops back to the same size as the other shards.)
In #35594, a cache was added for numDeletesToMerge. I backported that PR and re-ran my test case, and the issue was resolved.
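To illustrate why a cache for numDeletesToMerge helps, here is a minimal sketch of the general memoization idea. It is not the code from #35594 and does not use any Lucene or Elasticsearch classes: the class DeleteCountCache, the key layout (segment name plus soft-delete count), and the invalidate() hook are all hypothetical names chosen for this example. The assumption is that the expensive count only needs to be recomputed when a segment's soft-delete count changes or when the retention lease advances.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntSupplier;

/**
 * Hypothetical sketch: memoize an expensive per-segment count so that repeated
 * merge-policy checks do not rescan the soft-deleted documents every time.
 */
public class DeleteCountCache {
    // Key is "<segmentName>#<softDeleteCount>"; both parts are stand-ins for
    // whatever identifies a segment and its current delete state.
    private final Map<String, Integer> cache = new ConcurrentHashMap<>();

    /**
     * Returns the cached result for this segment state, or runs the expensive
     * computation once and stores its result.
     */
    public int numDeletesToMerge(String segmentName, int softDeleteCount, IntSupplier expensiveCount) {
        String key = segmentName + "#" + softDeleteCount;
        return cache.computeIfAbsent(key, k -> expensiveCount.getAsInt());
    }

    /** Drops all entries; assumed to be called when the retention lease advances. */
    public void invalidate() {
        cache.clear();
    }
}
```

With this shape, repeated calls for an unchanged segment cost one map lookup instead of a scan over the retained soft-deleted documents, which is the cost that shows up in the flamegraph.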
@s1monw I think the PR can be reconsidered
My Elasticsearch version: 7.6.2 with the LUCENE-9228 backport