Reduce DeletesMerges time when soft deletes are enabled #12350
Closed
Problem statement
We found that when Lucene is used in frequent-update or update-by-query scenarios, it performs many iterations in the following code: lucene/lucene/core/src/java/org/apache/lucene/index/SoftDeletesRetentionMergePolicy.java
Lines 166 to 176 in 0c29390
Because SoftDeletesRetentionMergePolicy needs to run the query from retentionQuerySupplier and then filter out the retained documents, iterating over doc IDs is time-consuming in frequent-update scenarios. Here is the flame graph:
Tracing the stack, we found that it is called from the document-update path:
lucene/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
Lines 5891 to 5899 in 0c29390
and from the merge path:
lucene/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
Lines 2346 to 2361 in 0c29390
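To make the cost concrete, here is a simplified, self-contained Java model of the kind of loop referenced in SoftDeletesRetentionMergePolicy. It is not the Lucene source: the BitSet used for liveDocs and the int[] of retention-query matches are hypothetical stand-ins for Lucene's Bits and DocIdSetIterator. The model starts from the segment's deleted-doc count and walks every doc matched by the retention query, subtracting each deleted doc that must be retained, so its cost grows with the number of matches and is paid again on every call.

```java
import java.util.BitSet;

// Simplified model (not Lucene code) of a retention-aware delete count.
// liveDocs and retainedDocIds are hypothetical stand-ins for Lucene's types.
final class RetentionDeleteCount {

    /**
     * @param liveDocs       bit i set means doc i is live
     * @param maxDoc         number of docs in the segment
     * @param retainedDocIds doc IDs matched by the retention query
     * @return deleted docs that a merge could actually reclaim
     */
    static int numDeletesToMerge(BitSet liveDocs, int maxDoc, int[] retainedDocIds) {
        int numDeleted = maxDoc - liveDocs.cardinality();
        // Hot loop: one iteration per retention-query match, on every call.
        for (int doc : retainedDocIds) {
            if (!liveDocs.get(doc)) {
                numDeleted--; // deleted but retained, so a merge must keep it
            }
        }
        return numDeleted;
    }

    public static void main(String[] args) {
        BitSet live = new BitSet(10);
        live.set(0, 6);               // docs 0..5 live, docs 6..9 deleted
        int[] retained = {2, 7, 9};   // doc 2 is live, docs 7 and 9 are retained deletes
        System.out.println(numDeletesToMerge(live, 10, retained)); // 2
    }
}
```

With frequent updates, both the number of soft-deleted docs and the number of retention-query matches grow, which is why repeated calls show up in the flame graph.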
Proposal
There are a few ways to reduce the number of calls to numDeletesToMerge:
feat: soft delete optimize #12339 tries to reduce the calls made in getSortedBySegmentSize: before merging we call getSortedBySegmentSize, which recalculates numDeletesToMerge redundantly.
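One generic way to avoid this kind of duplicate work is to memoize the count per segment for the duration of a single merge-selection pass. The sketch below is an illustration only: DeleteCountMemo and its methods are hypothetical, and it is not necessarily how #12339 implements the change.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical sketch (not Lucene's API): cache each segment's delete count
// for one merge-selection pass so that several steps share one computation.
final class DeleteCountMemo<S> {
    private final ToIntFunction<S> expensiveCount; // e.g. a retention-query walk
    private final Map<S, Integer> cache = new HashMap<>();
    private int computations = 0;

    DeleteCountMemo(ToIntFunction<S> expensiveCount) {
        this.expensiveCount = expensiveCount;
    }

    /** Returns the cached count, computing it only on the first request per segment. */
    int numDeletesToMerge(S segment) {
        Integer cached = cache.get(segment);
        if (cached == null) {
            computations++;
            cached = expensiveCount.applyAsInt(segment);
            cache.put(segment, cached);
        }
        return cached;
    }

    /** How many times the expensive function actually ran. */
    int computations() {
        return computations;
    }

    public static void main(String[] args) {
        DeleteCountMemo<String> memo = new DeleteCountMemo<>(String::length);
        String[] segments = {"_0", "_1", "_2"};
        for (String seg : segments) memo.numDeletesToMerge(seg); // e.g. size sorting
        for (String seg : segments) memo.numDeletesToMerge(seg); // e.g. merge selection
        System.out.println(memo.computations()); // 3, not 6
    }
}
```

Because delete counts change as documents are updated, such a cache would need to be discarded after each pass rather than kept on the writer.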
This PR tries to reduce the calls made in findForcedDeletesMerges: when we compute the delete counts there, numDeletesToMerge is also calculated redundantly.
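For findForcedDeletesMerges, a minimal sketch of the idea, assuming the size-sorted list is built first and carries each segment's delete count: the "is there work?" check then reads delCount from that list instead of calling numDeletesToMerge a second time. Segment, SizeAndDels, sortedBySize and haveWork are hypothetical simplifications (for example, segments already being merged are ignored), not Lucene's TieredMergePolicy code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical, simplified model of the first phase of a forced-deletes merge selection.
final class ForcedDeletesSketch {

    record Segment(String name, int maxDoc, long sizeInBytes) {}

    /** A size-sorted entry that remembers the delete count used to compute its size. */
    record SizeAndDels(Segment segment, long size, int delCount) {}

    /** Calls the expensive count once per segment; sorts by delete-adjusted size, largest first. */
    static List<SizeAndDels> sortedBySize(List<Segment> segments, ToIntFunction<Segment> numDeletesToMerge) {
        List<SizeAndDels> out = new ArrayList<>();
        for (Segment s : segments) {
            int del = numDeletesToMerge.applyAsInt(s);
            long size = (long) (s.sizeInBytes() * (1.0 - (double) del / s.maxDoc()));
            out.add(new SizeAndDels(s, size, del));
        }
        out.sort(Comparator.comparingLong(SizeAndDels::size).reversed());
        return out;
    }

    /** Reuses delCount from the sorted list instead of recomputing numDeletesToMerge. */
    static boolean haveWork(List<SizeAndDels> sorted, double pctDeletesAllowed) {
        for (SizeAndDels e : sorted) {
            double pctDeletes = 100.0 * e.delCount() / e.segment().maxDoc();
            if (pctDeletes > pctDeletesAllowed) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in cost function: pretend 25% of each segment is deleted.
        ToIntFunction<Segment> count = s -> { calls[0]++; return s.maxDoc() / 4; };
        List<Segment> segs = List.of(new Segment("_0", 100, 1000), new Segment("_1", 40, 400));
        List<SizeAndDels> sorted = sortedBySize(segs, count);
        System.out.println(haveWork(sorted, 10.0) + " " + calls[0]); // true 2
    }
}
```

The expensive function runs once per segment in total, instead of once for the eligibility check and once more for sorting.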
In our scenarios, calls to numDeletesToMerge cause write latency spikes, because updatePendingMerges is a synchronized method. Removing the duplicate calculation reduces this cost.
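To illustrate why this shows up as write latency, the hypothetical model below runs merge selection inside a synchronized method, mirroring the description of updatePendingMerges above, and uses Thread.holdsLock to confirm that every delete count is computed while the writer's monitor is held. LockHoldSketch and its methods are invented for illustration; assuming each count costs roughly the same, removing the duplicate calls shortens the time other synchronized writer methods can be blocked by this work.

```java
import java.util.List;

// Hypothetical model, not IndexWriter's code: merge selection runs inside a
// synchronized method, so every delete count it computes holds the writer's
// monitor and blocks other synchronized methods on this object meanwhile.
final class LockHoldSketch {
    private int totalCounts = 0;
    private int countsUnderLock = 0;

    /** Stand-in for the expensive retention-query walk; records whether the monitor is held. */
    private int numDeletesToMerge(String segment) {
        totalCounts++;
        if (Thread.holdsLock(this)) {
            countsUnderLock++;
        }
        return segment.length();
    }

    /** Before: two selection steps each recompute every segment's count under the lock. */
    synchronized int selectDuplicated(List<String> segments) {
        int eligible = 0;
        for (String s : segments) eligible += numDeletesToMerge(s); // e.g. "is there work?"
        int sized = 0;
        for (String s : segments) sized += numDeletesToMerge(s);    // e.g. size sorting
        return eligible + sized;
    }

    /** After: compute each count once, then let both steps read the stored values. */
    synchronized int selectDeduplicated(List<String> segments) {
        int[] counts = new int[segments.size()];
        for (int i = 0; i < counts.length; i++) {
            counts[i] = numDeletesToMerge(segments.get(i));
        }
        int eligible = 0;
        for (int c : counts) eligible += c;
        int sized = 0;
        for (int c : counts) sized += c;
        return eligible + sized;
    }

    int totalCounts() { return totalCounts; }

    int countsUnderLock() { return countsUnderLock; }

    public static void main(String[] args) {
        List<String> segs = List.of("_0", "_1", "_2");
        LockHoldSketch before = new LockHoldSketch();
        before.selectDuplicated(segs);
        LockHoldSketch after = new LockHoldSketch();
        after.selectDeduplicated(segs);
        System.out.println(before.countsUnderLock() + " vs " + after.countsUnderLock()); // 6 vs 3
    }
}
```

Both variants return the same result; only the amount of expensive work done while holding the lock differs.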