Fix the IndexError in _filter_operational_batches_sequence
caused by a race condition between sequencing batches in parallel and pruning old batches.
#83
The following error has been observed when using batch aggregator proxies with four workers and a large number of batches (e.g., 1,000,000) sent concurrently:
The source of the issue is a race condition between sequencing new batches and pruning old finalized batches. Consider the following situation:
zsequencer/common/db.py
Lines 248 to 254 in 62cc9a4
zsequencer/common/db.py
Lines 376 to 412 in 62cc9a4
The `last_sequenced_batch` field points to a non-existent batch.
The `last_sequenced_batch` value is then used to retrieve the index of the latest sequenced batch, but since it points to a batch that does not exist, a gap is created in the operational batches sequence. This results in an `IndexError` when attempting to access the operational batches sequence by index.

I've fixed the issue by implementing a lazy evaluation approach when slicing the operational batches sequence based on intervals. Specifically, when a caller of `_filter_operational_batches_sequence` passes an interval with `portion.inf` or `-portion.inf` (or any value ≤ 1), it indicates that the caller wants all batches up to the end (for `portion.inf`) or from the start (for `-portion.inf` or values ≤ 1).

Previously, I intersected those bounds with the operational batches sequence's max and min indices. Now, I convert the interval into a Python built-in slice (with support for infinite bounds) and use that to slice the operational batches sequence. In this way, the following line indicates a range from the specified border to the end of the operational batches sequence and is evaluated exactly as written:
zsequencer/common/db.py
Lines 248 to 254 in 62cc9a4
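
To illustrate the idea of turning an interval into a built-in slice, here is a minimal sketch. It is not the code from this PR: the helper name `interval_to_slice`, the `first_index` parameter, and the list-of-dicts layout are all hypothetical, and `math.inf` stands in for `portion.inf` so the snippet has no third-party dependency. The point it shows is that an open slice end is resolved against the sequence at slicing time, so a bound that points past a pruned or missing batch yields a shorter (or empty) result rather than an `IndexError`.

```python
import math


def interval_to_slice(lower: float, upper: float, first_index: int) -> slice:
    """Map a closed batch-index interval [lower, upper] onto list positions.

    Hypothetical helper: `first_index` is the batch index stored at
    position 0 of the sequence. Infinite or out-of-range bounds become
    open slice ends instead of being clamped to precomputed min/max indices.
    """
    if lower == math.inf:
        # Nothing can lie at or beyond +inf: return an empty slice.
        return slice(0, 0)

    # A lower bound at or before the first stored index means "from the start".
    start = None if lower <= first_index else int(lower) - first_index

    if upper == math.inf:
        # +inf means "up to the end", whatever the end is when the slice is applied.
        stop = None
    elif upper < first_index:
        # The whole interval lies before the stored batches (e.g. already pruned).
        stop = 0
    else:
        # The upper bound is inclusive, hence the + 1.
        stop = int(upper) - first_index + 1

    return slice(start, stop)


# Assumed layout: batches with indices 5..9 remain after older ones were pruned.
batches = [{"index": i} for i in range(5, 10)]

tail = batches[interval_to_slice(8, math.inf, first_index=5)]
stale = batches[interval_to_slice(-math.inf, 3, first_index=5)]
beyond = batches[interval_to_slice(8, 100, first_index=5)]
```

Here `tail` and `beyond` both contain the batches with indices 8 and 9, and `stale` is empty. Because list slicing tolerates out-of-range ends, none of these calls can raise `IndexError`, which is the property the lazy approach relies on.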