Deprecate blockstore-processor for --block-verification-method #4728
### Problem

The unified scheduler has proven itself to be much more performant than the blockstore processor, and with future plans to increase CU limits, that gap will likely widen.
### Summary of Changes

Unified scheduler is already the default; make `blockstore-processor` emit a warning message that it will be removed entirely in the future.
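As a sketch of the deprecation path (the function name and message text below are illustrative assumptions, not the PR's actual code, which lives in the validator's CLI handling):

```rust
/// Illustrative sketch only: returns a deprecation warning when the
/// soon-to-be-removed method is selected. Name and wording are assumptions.
fn deprecation_warning(method: &str) -> Option<&'static str> {
    match method {
        // Still accepted today, but slated for removal.
        "blockstore-processor" => Some(
            "--block-verification-method blockstore-processor is deprecated \
             and will be removed in a future release; \
             use unified-scheduler instead",
        ),
        _ => None,
    }
}

fn main() {
    // Log the warning at startup if the deprecated method was selected.
    if let Some(msg) = deprecation_warning("blockstore-processor") {
        eprintln!("WARN: {msg}");
    }
}
```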
### Data

Much of this data is probably redundant to the data that Ryo originally provided, but it doesn't hurt to measure again with recent data before we start phasing `blockstore-processor` out.

#### Startup Take 1

Here is one restart against MNB, with one node running each method:

- `--block-verification-method unified-scheduler`
- `--block-verification-method blockstore-processor`

Each node came online ~1k slots back; `unified-scheduler` caught up in ~6 minutes whereas `blockstore-processor` took ~22 minutes.

#### Startup Take 2
I switched which node was running which method, and restarted both nodes after an intentional 5-minute delay between process stop and process restart.
![image](https://private-user-images.githubusercontent.com/5400107/408750601-b13868bb-b246-4768-817e-e1bd73b8882e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1OTY3MDgsIm5iZiI6MTczOTU5NjQwOCwicGF0aCI6Ii81NDAwMTA3LzQwODc1MDYwMS1iMTM4NjhiYi1iMjQ2LTQ3NjgtODE3ZS1lMWJkNzNiODg4MmUucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxNSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTVUMDUxMzI4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NWUxMGE3YzRmM2E1OTg1N2JkZGUwM2QxNDAyN2RiMDg5YmI4OTUzMTdkZjk2NjliZGZjNjY4YTk1MGI2NTI2ZCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.NwKgxUzyhMYw8CssUqzq0uT6lgZ3373xNMA5YtqQF-o)
Each node was 1400–1500 slots back; `unified-scheduler` caught up in ~9 minutes whereas `blockstore-processor` required ~28 minutes.

#### Steady State

For the next two graphs, one node is running `blockstore-processor` and the other is running `unified-scheduler`.
One metric for consideration is `replay-slot-stats.replay_total_elapsed`, which is the total time from bank creation to bank frozen:

![image](https://private-user-images.githubusercontent.com/5400107/413174417-57fc2215-6373-4a7e-aa36-d1ee0d06b116.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1OTY3MDgsIm5iZiI6MTczOTU5NjQwOCwicGF0aCI6Ii81NDAwMTA3LzQxMzE3NDQxNy01N2ZjMjIxNS02MzczLTRhN2UtYWEzNi1kMWVlMGQwNmIxMTYucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxNSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTVUMDUxMzI4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MGZjY2E2ZWUxNWM5Y2U1ODRkZWY4ZTRmMDljY2I1YjRlNzMyZmRiOTU2YTJhNzYyMTQ3OGYxZjJmMzgxYWIxNSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.U1CAc4TwvfaA9yULmFyZUaxVVOsO4c_u-FV3rUaUg6Y)

In this metric, `unified-scheduler` shows an advantage of a few percent. Note that the delivery of shreds will somewhat limit how much better one method can be than another; we see `unified-scheduler` have a clear advantage at startup, when the whole block is already available.

Another metric for consideration is `replay-slot-stats.execute_us`, which is the total amount of time spent executing transactions across all threads:

![image](https://private-user-images.githubusercontent.com/5400107/413175332-524bac65-8994-4aee-832d-2ddfe2f90fd6.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1OTY3MDgsIm5iZiI6MTczOTU5NjQwOCwicGF0aCI6Ii81NDAwMTA3LzQxMzE3NTMzMi01MjRiYWM2NS04OTk0LTRhZWUtODMyZC0yZGRmZTJmOTBmZDYucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxNSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTVUMDUxMzI4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MDg2YzA3NTJjYTkyNmEzYzBmZGM2OGYyYTBhODc2MDFiMTY2NGQyMzRkYTljNDQwNzA4MTgyZjdlM2RkOWVlNiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.4X3Jg01--SKylfBOEFZg9Ni3So8gbxAgyMXRhBsVyU0)

We see a non-trivial difference here, where `blockstore-processor` uses less total CPU time. Given that these nodes are replaying the same transactions, that means `blockstore-processor` is doing so more efficiently. Ultimately, I think we care most about completing replay as fast as possible, and ~25% more CPU time is an acceptable cost.

The final chart shows `replay-loop-timing-stats.wait_receive_elapsed_us`:

![image](https://private-user-images.githubusercontent.com/5400107/413176558-5dd156d3-04f7-4dc4-8cc9-648d79cdfc81.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1OTY3MDgsIm5iZiI6MTczOTU5NjQwOCwicGF0aCI6Ii81NDAwMTA3LzQxMzE3NjU1OC01ZGQxNTZkMy0wNGY3LTRkYzQtOGNjOS02NDhkNzljZGZjODEucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxNSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTVUMDUxMzI4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MGFkYmZjYzc2ZmJlODcwMWZmNDI0ZGVlM2IxYzFmZDMyZDUyY2ZmOWYzY2ZjZWY0ZTExYTA1ZDcwNmMwYTY3YyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.chrFXgc8JabLeLL3ubJy9x6ed8EklmmQyHMN7GXmo5I)

In this graph, we can see that `unified-scheduler` spends much more time waiting for shreds. This means it is scheduling transactions for replay much more quickly and then waiting to be fed more. This isn't quite an apples-to-apples comparison, given that `unified-scheduler` schedules transactions for replay in a non-blocking, asynchronous manner.
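The CPU-time trade-off above can be sketched numerically. The ~25% figure comes from the `execute_us` comparison; the concrete microsecond values below are made-up placeholders, not measured data:

```rust
// Relative CPU overhead of one method versus a baseline, in percent.
// Inputs are hypothetical placeholder values, not measured metrics.
fn cpu_overhead_pct(method_execute_us: f64, baseline_execute_us: f64) -> f64 {
    (method_execute_us - baseline_execute_us) / baseline_execute_us * 100.0
}

fn main() {
    // If blockstore-processor spent 100s of execute time over some window
    // and unified-scheduler spent 125s, unified costs 25% more CPU time
    // while still finishing replay sooner in wall-clock terms.
    let overhead = cpu_overhead_pct(125_000_000.0, 100_000_000.0);
    println!("overhead: {overhead:.0}%");
}
```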