Deprecate blockstore-processor for --block-verification-method #4728
### Problem

The unified scheduler has proven itself to be much more performant than the blockstore processor, and with future plans to increase CU limits, that gap will likely widen.
### Summary of Changes

Unified scheduler is already the default; make `blockstore-processor` emit a warning message that it will be removed entirely in the future.
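As a sketch of the deprecation path (the function name and message text below are illustrative assumptions, not the PR's actual code, which lives in the validator's CLI handling):

```rust
/// Illustrative sketch only: returns a deprecation warning when the
/// soon-to-be-removed method is selected. Name and wording are assumptions.
fn deprecation_warning(method: &str) -> Option<&'static str> {
    match method {
        // Still accepted today, but slated for removal.
        "blockstore-processor" => Some(
            "--block-verification-method blockstore-processor is deprecated \
             and will be removed in a future release; \
             use unified-scheduler instead",
        ),
        _ => None,
    }
}

fn main() {
    // Log the warning at startup if the deprecated method was selected.
    if let Some(msg) = deprecation_warning("blockstore-processor") {
        eprintln!("WARN: {msg}");
    }
}
```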
### Data

Much of this data is probably redundant to the data that Ryo originally provided, but it doesn't hurt to measure again with recent data before we start phasing `blockstore-processor` out.

#### Startup Take 1

Here is one restart against MNB, with one node running each method:

- `--block-verification-method unified-scheduler`
- `--block-verification-method blockstore-processor`

Each node came online ~1k slots back; `unified-scheduler` caught up in ~6 minutes whereas `blockstore-processor` took ~22 minutes.

#### Startup Take 2
I switched which node was running which method, and restarted both nodes after an intentional 5-minute delay between process stop and process restart.
![image](https://private-user-images.githubusercontent.com/5400107/408750601-b13868bb-b246-4768-817e-e1bd73b8882e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1OTY3MDgsIm5iZiI6MTczOTU5NjQwOCwicGF0aCI6Ii81NDAwMTA3LzQwODc1MDYwMS1iMTM4NjhiYi1iMjQ2LTQ3NjgtODE3ZS1lMWJkNzNiODg4MmUucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxNSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTVUMDUxMzI4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NWUxMGE3YzRmM2E1OTg1N2JkZGUwM2QxNDAyN2RiMDg5YmI4OTUzMTdkZjk2NjliZGZjNjY4YTk1MGI2NTI2ZCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.NwKgxUzyhMYw8CssUqzq0uT6lgZ3373xNMA5YtqQF-o)
Each node was 1400–1500 slots back; `unified-scheduler` caught up in ~9 minutes whereas `blockstore-processor` required ~28 minutes.

#### Steady State

For the next two graphs, one node is running `blockstore-processor` and the other is running `unified-scheduler`.
One metric for consideration is `replay-slot-stats.replay_total_elapsed`, which is the total time from bank creation to bank frozen:

![image](https://private-user-images.githubusercontent.com/5400107/413174417-57fc2215-6373-4a7e-aa36-d1ee0d06b116.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1OTY3MDgsIm5iZiI6MTczOTU5NjQwOCwicGF0aCI6Ii81NDAwMTA3LzQxMzE3NDQxNy01N2ZjMjIxNS02MzczLTRhN2UtYWEzNi1kMWVlMGQwNmIxMTYucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxNSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTVUMDUxMzI4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MGZjY2E2ZWUxNWM5Y2U1ODRkZWY4ZTRmMDljY2I1YjRlNzMyZmRiOTU2YTJhNzYyMTQ3OGYxZjJmMzgxYWIxNSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.U1CAc4TwvfaA9yULmFyZUaxVVOsO4c_u-FV3rUaUg6Y)

In this metric, `unified-scheduler` shows an advantage of a few percent. Note that the delivery of shreds will somewhat limit how much better one method can be than another; we see `unified-scheduler` have a clear advantage at startup, when the whole block is already available.

Another metric for consideration is `replay-slot-stats.execute_us`, which is the total amount of time spent executing transactions across all threads:

![image](https://private-user-images.githubusercontent.com/5400107/413175332-524bac65-8994-4aee-832d-2ddfe2f90fd6.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1OTY3MDgsIm5iZiI6MTczOTU5NjQwOCwicGF0aCI6Ii81NDAwMTA3LzQxMzE3NTMzMi01MjRiYWM2NS04OTk0LTRhZWUtODMyZC0yZGRmZTJmOTBmZDYucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxNSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTVUMDUxMzI4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MDg2YzA3NTJjYTkyNmEzYzBmZGM2OGYyYTBhODc2MDFiMTY2NGQyMzRkYTljNDQwNzA4MTgyZjdlM2RkOWVlNiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.4X3Jg01--SKylfBOEFZg9Ni3So8gbxAgyMXRhBsVyU0)

We see a non-trivial difference here, where `blockstore-processor` uses less total CPU time. Given that these nodes are replaying the same transactions, that means `blockstore-processor` is doing so more efficiently. Ultimately, I think we care most about completing replay as fast as possible, and ~25% more CPU time is an acceptable cost.

The final chart shows `replay-loop-timing-stats.wait_receive_elapsed_us`:

![image](https://private-user-images.githubusercontent.com/5400107/413176558-5dd156d3-04f7-4dc4-8cc9-648d79cdfc81.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1OTY3MDgsIm5iZiI6MTczOTU5NjQwOCwicGF0aCI6Ii81NDAwMTA3LzQxMzE3NjU1OC01ZGQxNTZkMy0wNGY3LTRkYzQtOGNjOS02NDhkNzljZGZjODEucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxNSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTVUMDUxMzI4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MGFkYmZjYzc2ZmJlODcwMWZmNDI0ZGVlM2IxYzFmZDMyZDUyY2ZmOWYzY2ZjZWY0ZTExYTA1ZDcwNmMwYTY3YyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.chrFXgc8JabLeLL3ubJy9x6ed8EklmmQyHMN7GXmo5I)

In this graph, we can see that `unified-scheduler` spends much more time waiting for shreds. This means it is scheduling transactions for replay much more quickly and then waiting to be fed more. This isn't quite an apples-to-apples comparison, given that `unified-scheduler` schedules transactions for replay in a non-blocking, asynchronous manner.
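The CPU-time trade-off above can be sketched numerically. The ~25% figure comes from the `execute_us` comparison; the concrete microsecond values below are made-up placeholders, not measured data:

```rust
// Relative CPU overhead of one method versus a baseline, in percent.
// Inputs are hypothetical placeholder values, not measured metrics.
fn cpu_overhead_pct(method_execute_us: f64, baseline_execute_us: f64) -> f64 {
    (method_execute_us - baseline_execute_us) / baseline_execute_us * 100.0
}

fn main() {
    // If blockstore-processor spent 100s of execute time over some window
    // and unified-scheduler spent 125s, unified costs 25% more CPU time
    // while still finishing replay sooner in wall-clock terms.
    let overhead = cpu_overhead_pct(125_000_000.0, 100_000_000.0);
    println!("overhead: {overhead:.0}%");
}
```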