SoftDeletesRetentionMergePolicy#numDeletesToMerge causes indexing backlog #75675

Open
easyice opened this issue Jul 26, 2021 · 1 comment
Labels: >bug, :Distributed Indexing/Engine, Team:Distributed (Obsolete)

Comments

@easyice
Contributor

easyice commented Jul 26, 2021

If there are very many soft-deleted documents and they are also held by a retention lease, the numDeletesToMerge function has a performance problem.

For instance, while an update-heavy indexing workload is writing to Elasticsearch, we move a primary shard to another node. If the relocation runs for a long time, the source shard grows very large, because the soft-deleted operations must be held by the retention lease. The more soft-deleted documents there are, the slower indexing becomes. With a shard size of about 20GB, we get the flame graph below:

[image: flame graph]

flamegraph.html.zip
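A minimal Java model of why the cost grows with the retained soft deletes. This is not Lucene's code: the `RetainedDeletesModel` class, its `Segment` record and the `minRetainedSeqNo` boundary are hypothetical stand-ins for a segment's soft-delete bits and the range held by the retention lease. The simplifying assumption is that, to tell how many deletes a merge could actually reclaim, every soft-deleted doc has to be checked against the retention boundary, and that this is redone on every merge check.

```java
import java.util.Arrays;

public class RetainedDeletesModel {

    /** Hypothetical stand-in for one segment: a seqNo and a soft-delete flag per doc. */
    record Segment(long[] seqNos, boolean[] softDeleted) {}

    /** Total docs inspected across all calls, to show how the work accumulates. */
    static long docsVisited = 0;

    /**
     * Simplified count of soft deletes a merge could actually reclaim: those below
     * the retention boundary. Docs at or above it are still held by the retention
     * lease and must survive the merge. Every call walks the whole segment.
     */
    static int numDeletesToMerge(Segment seg, long minRetainedSeqNo) {
        int mergeable = 0;
        for (int doc = 0; doc < seg.seqNos().length; doc++) {
            docsVisited++;
            if (seg.softDeleted()[doc] && seg.seqNos()[doc] < minRetainedSeqNo) {
                mergeable++;
            }
        }
        return mergeable;
    }

    /** Builds a segment of {@code size} docs that are all soft-deleted, as after many updates. */
    static Segment allSoftDeleted(int size) {
        long[] seqNos = new long[size];
        for (int i = 0; i < size; i++) {
            seqNos[i] = i;
        }
        boolean[] softDeleted = new boolean[size];
        Arrays.fill(softDeleted, true);
        return new Segment(seqNos, softDeleted);
    }

    public static void main(String[] args) {
        Segment seg = allSoftDeleted(1_000_000);
        long minRetainedSeqNo = 0; // a long relocation keeps every operation retained
        for (int check = 0; check < 10; check++) {
            // Each merge check recounts from scratch, yet finds nothing to reclaim.
            int mergeable = numDeletesToMerge(seg, minRetainedSeqNo);
            System.out.println("check " + check + ": mergeable=" + mergeable
                    + ", docs visited=" + docsVisited);
        }
    }
}
```

Running `main` shows the visited count climbing by a million per check while the mergeable count stays at 0: the work is repeated even though nothing can be reclaimed until the lease is released.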

In this case, the write queue stays backlogged, and jstack shows the following:

1.txt

And the index stats:

health status index              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   myindex-2021.07.26 wH2C74XtRRaO8O3KBrLu1A   6   0   73873732     19975538     76.3gb         76.3gb

The _cat/shards output (once the relocation finishes, the shard shrinks back to the same size as the other shards):

[image: _cat/shards output]

#35594 adds a cache for numDeletesToMerge. I backported that PR and re-ran my test case, and the issue was resolved.
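The caching idea can be sketched in Java. The actual cache key and invalidation used by #35594 are not described in this issue, so the sketch assumes one: reuse the last count until the segment's deletes or the retention boundary change. `CachedDeletesToMerge`, `Segment`, `Key`, `deletesGen` and `onSegmentDropped` are all hypothetical names, not Lucene or Elasticsearch APIs.

```java
import java.util.HashMap;
import java.util.Map;

public class CachedDeletesToMerge {

    /** Hypothetical segment view; deletesGen bumps whenever the segment's (soft) deletes change. */
    record Segment(String name, long deletesGen, long[] seqNos, boolean[] softDeleted) {}

    /** Hypothetical cache key: the count can only change if one of these parts changes. */
    record Key(String segmentName, long deletesGen, long minRetainedSeqNo) {}

    private final Map<Key, Integer> cache = new HashMap<>();

    /** Number of expensive full scans performed, to show the cache at work. */
    long fullScans = 0;

    /** Expensive path: count soft-deleted docs below the retention boundary. */
    private int countMergeable(Segment seg, long minRetainedSeqNo) {
        fullScans++;
        int mergeable = 0;
        for (int doc = 0; doc < seg.seqNos().length; doc++) {
            if (seg.softDeleted()[doc] && seg.seqNos()[doc] < minRetainedSeqNo) {
                mergeable++;
            }
        }
        return mergeable;
    }

    /** Cached path: rescan only when the segment's deletes or the retention boundary changed. */
    int numDeletesToMerge(Segment seg, long minRetainedSeqNo) {
        Key key = new Key(seg.name(), seg.deletesGen(), minRetainedSeqNo);
        return cache.computeIfAbsent(key, k -> countMergeable(seg, minRetainedSeqNo));
    }

    /** Drop entries for a segment that was merged away, so the map does not grow forever. */
    void onSegmentDropped(String segmentName) {
        cache.keySet().removeIf(k -> k.segmentName().equals(segmentName));
    }

    public static void main(String[] args) {
        int size = 1_000_000;
        long[] seqNos = new long[size];
        boolean[] deleted = new boolean[size];
        for (int i = 0; i < size; i++) {
            seqNos[i] = i;
            deleted[i] = true;
        }
        Segment seg = new Segment("_0", 1, seqNos, deleted);
        CachedDeletesToMerge policy = new CachedDeletesToMerge();
        for (int check = 0; check < 10; check++) {
            policy.numDeletesToMerge(seg, 0); // the lease still retains every operation
        }
        System.out.println("full scans: " + policy.fullScans); // prints "full scans: 1"
    }
}
```

While the retention lease stays put during a long relocation, the key does not change, so repeated merge checks cost a map lookup instead of a full scan. A production version would also need to evict entries for older delete generations, not only for dropped segments.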

@s1monw I think that PR is worth reconsidering.

My Elasticsearch version: 7.6.2 with the LUCENE-9228 backport.

easyice added the >bug and needs:triage labels Jul 26, 2021
DJRickyB added the :Distributed Indexing/Engine label and removed the needs:triage label Jul 26, 2021
elasticmachine added the Team:Distributed (Obsolete) label Jul 26, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)
