op-batcher: remove `ThrottleInterval` and split block loading and batch publishing into separate goroutines #14244

geoknee · 2025-02-07T14:07:52Z

This replaces the hybrid "event + interval" model for DA throttling with a purely event-driven model. Two events can trigger throttling: (1) a change in the amount of pending data building up in the batcher and (2) a change in the active sequencer that the batcher is pointing at.

A happy side effect is that the batcher's loading, publishing and throttling workloads are split up and now run fully asynchronously. Previously the batcher would load blocks, then publish for up to some maximum amount of time (blocking block loading and throttling), and only then return to loading in order to trigger throttling.
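To make the new shape concrete, here is a minimal Go sketch of the three workloads running concurrently. It is not the batcher's actual code: the `batcher` struct, `run`, `trySignal` and the channel wiring are assumptions made for illustration (the channel names are borrowed from the commit messages in this PR), and block loading runs inline on the calling goroutine to keep the example short.

```go
package main

import (
	"fmt"
	"sync"
)

// batcher is a toy stand-in for the real batcher state (hypothetical).
type batcher struct {
	mu      sync.Mutex
	pending []int // sizes in bytes of loaded-but-unpublished blocks
}

// trySignal is a non-blocking send: if a signal is already buffered, the new one is dropped.
func trySignal(ch chan struct{}) {
	select {
	case ch <- struct{}{}:
	default:
	}
}

// run loads the given blocks while publishing and throttling run concurrently.
// It returns the total bytes published and the number of throttling evaluations.
func run(blockSizes []int, sequencerChanges int) (published int, evaluations int) {
	b := &batcher{}
	publishSignal := make(chan struct{}, 1)
	pendingBytesUpdated := make(chan struct{}, 1)
	activeSequencerChanged := make(chan struct{}, 1)

	var wg sync.WaitGroup
	wg.Add(2)

	// Publishing loop: woken by the loader, drains whatever is pending.
	go func() {
		defer wg.Done()
		for range publishSignal {
			b.mu.Lock()
			for _, size := range b.pending {
				published += size
			}
			b.pending = b.pending[:0]
			b.mu.Unlock()
		}
	}()

	// Throttling loop: re-evaluates only when one of the two events fires.
	go func() {
		defer wg.Done()
		for {
			select {
			case _, ok := <-pendingBytesUpdated:
				if !ok {
					return // loader has shut down
				}
			case <-activeSequencerChanged:
			}
			evaluations++ // a real loop would recompute throttling parameters here
		}
	}()

	// Block loading: never blocks on the other two workloads.
	for _, size := range blockSizes {
		b.mu.Lock()
		b.pending = append(b.pending, size)
		b.mu.Unlock()
		trySignal(publishSignal)
		trySignal(pendingBytesUpdated)
	}
	for i := 0; i < sequencerChanges; i++ {
		trySignal(activeSequencerChanged)
	}

	// Closing the channels is the shutdown signal for both consumers.
	close(publishSignal)
	close(pendingBytesUpdated)
	wg.Wait()
	return published, evaluations
}

func main() {
	published, evaluations := run([]int{100, 200, 300}, 1)
	fmt.Println("published bytes:", published, "throttling evaluations:", evaluations)
}
```

Every block is published because the loader appends before it signals, so a buffered signal always postdates the append. The number of throttling evaluations varies from run to run, since signals that arrive while one is already buffered are coalesced.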

Closes "op-batcher: Check whether the throttling interval can be removed" #12838.

* Replaces some uses of a for / select loop with a ctx.Done() branch with a simpler range over a channel. This makes shutdown behaviour easier to reason about and makes deadlocks less likely.
* Reduces the use of global variables, preferring local ones (wait groups, contexts, etc.).
* The readme has a diagram showing how the various goroutines relate and which ones signal to each other.
* Should help prevent the throttling loop from interfering with the batcher's other loading/publishing jobs (i.e. should increase its throughput under load).
* Should reduce the chance that an active sequencer change results in a delay to the throttling signal.
* It would be very interesting to know the impact of this change on batcher throughput. We need some acceptance testing / criteria around this.
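As a rough illustration of the range-over-a-channel point, the sketch below shows a consumer whose shutdown is driven entirely by the producer closing the channel. The names `drain` and `produceAndShutdown` are hypothetical and do not come from the batcher codebase.

```go
package main

import (
	"fmt"
	"sync"
)

// drain handles every value sent on receipts and returns once the channel
// is closed. There is no ctx.Done() branch that could win a select and
// leave already-sent values unhandled.
func drain(receipts <-chan int) []int {
	var handled []int
	for r := range receipts {
		handled = append(handled, r)
	}
	return handled
}

// produceAndShutdown sends n values, closes the channel to signal shutdown,
// and waits for the consumer to finish.
func produceAndShutdown(n int) []int {
	receipts := make(chan int)
	var wg sync.WaitGroup
	var handled []int

	wg.Add(1)
	go func() {
		defer wg.Done()
		handled = drain(receipts)
	}()

	for i := 0; i < n; i++ {
		receipts <- i
	}
	close(receipts) // the only shutdown signal the consumer needs
	wg.Wait()
	return handled
}

func main() {
	fmt.Println(produceAndShutdown(5)) // prints [0 1 2 3 4]
}
```

The trade-off is that the producer must own the channel and close it exactly once, which is part of why separating producers from consumers at shutdown matters.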

TODO

* remove the throttlingLoop config variable
* add an "event" for active sequencer change
* stress test this batcher using kurtosis and tx fuzzing
* prepare release notes / docs explaining that THROTTLE_INTERVAL is deprecated

codecov · 2025-02-07T14:16:15Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (7ba4555) to head (06a9228).
Report is 95 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff              @@
##           develop   #14244       +/-   ##
============================================
- Coverage    45.89%        0   -45.90%     
============================================
  Files         1008        0     -1008     
  Lines        86419        0    -86419     
============================================
- Hits         39663        0    -39663     
+ Misses       43758        0    -43758     
+ Partials      2998        0     -2998

Flag	Coverage Δ
cannon-go-tests-32	`?`
cannon-go-tests-64	`?`
contracts-bedrock-tests	`?`

Flags with carried forward coverage won't be shown.

see 1008 files with indirect coverage changes


…ch publishing into separate goroutines (ethereum-optimism#14244)

* op-batcher: overhaul throttling, reading and writing loops
* no longer evaluate throttling conditions on a ticker
* break main loop into "reading" and "writing"
* reading loop signals to throttling and writing when ALL blocks are loaded (this could be done in a more fine grained way, even when each block is loaded)
* these run concurrently
* improvements
* readloop unblocks writing loop once, and then writing loop goes forever (pausing when it runs out of data)
  tests pass but timeout due to bad shutdown
* rename to prevent shadowing
* allow for receipt handling when receipt and err are both nil
* split shutdownCtx into producers and consumers
* throttling loop ranges over a channel, does not take a context
* processReceiptsLoop ranges over a channel, does not take a context
* unify wait groups
* unify contexts
* writeLoop uses a ticker instead of a sleep
* WIP: add throughput test for batcher
* make txQueue local, not global state
* reduce diff
* reduce diff further
* make pendingBytesUpdated a local, not global
* read and write loop communicate with a channel
  no more ticker required
* rename
* rename
* abstract promptLoop
* rename
* update readme
* add diagram to readme
* remove test
  I'm not sure it is adding any value. May return to it in future
* sendToThrottlingLoop uses mutex
* harmonize "*Loop returning" logs
* tidy
* remove TODO
* rip out ThrottleInterval config var
* add activeSequencerChanged channel and wire it up to onActiveSequencerChanged hook in active rollup provider
* set callback at runtime, not in constructor
* reinstate startup order
* tidy
* remove dead code
* remove unintentional change
* rename readLoop to blockLoadingLoop and writeLoop to publishingLoop
* improve diagram
* replace ThrottleInterval with ThrottleThreshold as enabling var
* remove onActiveProviderChanged arg from constructors
* protect callback invocation with nullity check
* lint
* Apply suggestions from code review
  Co-authored-by: Sebastian Stammler <seb@oplabs.co>
* remove ThrottelInterval from flags
* docs: mention active sequencer signal
* only attach callback if throttling is enabled
* increase buffer of activeSequencerChanged
  means that we buffer a single event if the throttlingLoop is busy, even when using a try-send
* add buffer to pendingBytesUpdated
  see previous commit
* reduce indentation
* remove dangling ref
* pass killCtx to publishingLoop and avoid accessing this context globally
* remove continue and always signal publishing loop
* remove unecessary mutex wrangle
* replace signalPublishingLoop method with trySignal fn
* extend TestRollupProvider_FailoverOnInactiveSequencer to cover callback fn
* use retryTimer in throttlingLoop
  if RPC calls fail, they will be retried after retryInterval (or before, if another event triggers updateParams)
* op-batcher: improve active-seq-changed signalling setup
* remove unused state var
* rename blocksLoaded channel to publishSignal channel

---------

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

* op-batcher: prevent `SpanChannelOut` RLP bytes overflowing `MaxRLPBytesPerChannel` (ethereum-optimism#14310)
  * fix op-batcher pack over MaxRLPBytesChannel
  * add test cases from different CompressionAlgo
  * add fresh compression logic and corresponding comments
  * refactor: enhance MaxRLPBytesPerChannel test
  * refactor: rename variable and add required messages
* op-batcher: use local-safe to reduce lag during interop sync (ethereum-optimism#14265)
* op-batcher: remove `ThrottleInterval` and split block loading and batch publishing into separate goroutines (ethereum-optimism#14244)
* op-batcher: add `TestBatchSubmitter_sendTx_FloorDataGas` and patch `driver.sendTx` (ethereum-optimism#14517)
  * op-batcher: add TestBatchSubmitter_sendTx_FloorDataGas
  * op-batcher: use floorDataGas for transactions if greater than intrinsicGas
  * Update op-batcher/batcher/driver.go
    Co-authored-by: Sebastian Stammler <seb@oplabs.co>
  * log error instead of ignoring
  * Co-authored-by: Sebastian Stammler <seb@oplabs.co>
* op-batcher: always `updateCursorAndMetrics` when returning from `processBlocks()` (ethereum-optimism#14520)
  * op-batcher: always updateCursorAndMetrics when returning from processBlocks()
  * Update op-batcher/batcher/channel_manager.go
    Co-authored-by: Sebastian Stammler <seb@oplabs.co>
  * Co-authored-by: Sebastian Stammler <seb@oplabs.co>
* op-batcher: remove `ChannelManager.CheckExpectedProgress()` and add channel timeout log (ethereum-optimism#14553)
  * op-batcher: remove ChannelManager.CheckExpectedProgress
  * op-batcher: add warning log when a channel times out on chain
* op-batcher: correctly track block metrics in `handleChannelInvalidated()` (ethereum-optimism#14561)
  * op-batcher: correctly track block metrics in handleChannelInvalidated
    Includes test which fails without the fix.
  * op-batcher: log out channels which are dropped during handleChannelInvalidated()
  * change to warn log for dropped channels
* op-batcher: improve `computeSyncActions()` logging (ethereum-optimism#14563)
  * improve computeSyncActions logging
  * fixup test and make sure all cases run (!)
  * use more friendly format for structured logger
* op-batcher: introduce `PREFER_LOCAL_SAFE_L2` config var (ethereum-optimism#14587)
  * op-batcher: introduce PREFER_LOCAL_SAFE_L2 config var
  * lint
  * Apply suggestions from code review
    Co-authored-by: Sebastian Stammler <seb@oplabs.co>
  * lint
  * Co-authored-by: Sebastian Stammler <seb@oplabs.co>
* batcher: Wait for DA write before shutdown

---------

Co-authored-by: olga yang <ya1994ng@gmail.com>
Co-authored-by: protolambda <proto@protolambda.com>
Co-authored-by: George Knee <georgeknee@googlemail.com>
Co-authored-by: Sebastian Stammler <seb@oplabs.co>

geoknee added the labels A-op-batcher (Area: op-batcher) and C-spike (Category: Performs a spike) Feb 7, 2025

geoknee commented Feb 11, 2025

op-e2e/system/da/throughput_test.go (outdated, resolved)

geoknee added 21 commits February 12, 2025 09:57

improvements

9e15575

readloop unblocks writing loop once, and then writing loop goes forever (pausing when it runs out of data)

d562dcc

tests pass but timeout due to bad shutdown

rename to prevent shadowing

7b8bd50

allow for receipt handling when receipt and err are both nil

24771bc

split shutdownCtx into producers and consumers

5806cb4

throttling loop ranges over a channel, does not take a context

42611b7

processReceiptsLoop ranges over a channel, does not take a context

c279c9b

unify wait groups

470f0b6

unify contexts

b3038b2

writeLoop uses a ticker instead of a sleep

fd617ff

WIP: add throughput test for batcher

0b0a026

make txQueue local, not global state

53e8e85

reduce diff

2b2d6de

reduce diff further

9f44d19

make pendingBytesUpdated a local, not global

46286ca

read and write loop communicate with a channel

932e4d7

no more ticker required

rename

cc4ef74

rename

37684fd

abstract promptLoop

ee492cf

rename

5bb0f65

geoknee force-pushed the gk/spike-throttle branch from 28de46b to 5bb0f65 Compare February 12, 2025 09:57

geoknee added 2 commits February 12, 2025 10:25

update readme

45d9987

add diagram to readme

55005d7

geoknee changed the title ~~op-batcher: SPIKE split block loading and batch publishing into separate goroutines~~ op-batcher: split block loading and batch publishing into separate goroutines Feb 12, 2025

remove test

be569b1

I'm not sure it is adding any value. May return to it in future

geoknee and others added 15 commits February 14, 2025 15:53

Apply suggestions from code review

03e616d

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

remove ThrottelInterval from flags

510dfe4

docs: mention active sequencer signal

ec48011

only attach callback if throttling is enabled

ab32665

increase buffer of activeSequencerChanged

31c9c76

means that we buffer a single event if the throttlingLoop is busy, even when using a try-send

add buffer to pendingBytesUpdated

7769b72

see previous commit
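The effect of the buffer on a try-send can be seen in isolation with a small Go sketch. `trySignal` here is an illustrative helper written for this example and may differ from the function of the same name in the PR.

```go
package main

import "fmt"

// trySignal attempts a non-blocking send and reports whether the signal was accepted.
func trySignal(ch chan struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	// Unbuffered: with no receiver currently waiting, the signal is lost.
	unbuffered := make(chan struct{})
	fmt.Println(trySignal(unbuffered)) // prints false

	// Buffer of one: the signal is parked until the busy consumer returns.
	buffered := make(chan struct{}, 1)
	fmt.Println(trySignal(buffered)) // prints true

	// A second signal is dropped, but one is already pending, so the
	// consumer will still wake up and observe the latest state.
	fmt.Println(trySignal(buffered)) // prints false
	fmt.Println(len(buffered))       // prints 1
}
```

A buffer of one is enough because the signals carry no payload: any number of events that arrive while the consumer is busy collapse into a single pending wake-up.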

reduce indentation

0f27d9e

remove dangling ref

843377c

pass killCtx to publishingLoop and avoid accessing this context globally

2578565

remove continue and always signal publishing loop

69858c9

remove unecessary mutex wrangle

488352e

replace signalPublishingLoop method with trySignal fn

2c22d9f

extend TestRollupProvider_FailoverOnInactiveSequencer to cover callback fn

70b5e9c

use retryTimer in throttlingLoop

cc26ac9

if RPC calls fail, they will be retried after retryInterval (or before, if another event triggers updateParams)
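The retry behaviour described in this commit message could look roughly like the Go sketch below. It is an assumption-laden illustration rather than the code in the PR: `throttlingLoop`, `updateParams` and `runDemo` are hypothetical signatures, and `time.After` is used in place of a reusable `time.Timer` to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// throttlingLoop calls updateParams whenever an event arrives. If the call
// fails it is retried after retryInterval, or sooner if another event
// arrives first. The loop returns when events is closed.
func throttlingLoop(events <-chan struct{}, retryInterval time.Duration, updateParams func() error) {
	var retry <-chan time.Time // nil until a call fails; a nil channel never fires
	for {
		select {
		case _, ok := <-events:
			if !ok {
				return
			}
		case <-retry:
		}
		retry = nil // any pending retry is superseded by this attempt
		if err := updateParams(); err != nil {
			retry = time.After(retryInterval)
		}
	}
}

// runDemo sends a single event to a loop whose updateParams fails twice
// before succeeding, and returns how many calls were made in total.
func runDemo() int {
	calls := 0
	succeeded := make(chan struct{})
	updateParams := func() error {
		calls++
		if calls < 3 {
			return errors.New("rpc unavailable") // simulated RPC failure
		}
		close(succeeded)
		return nil
	}

	events := make(chan struct{}, 1)
	done := make(chan struct{})
	go func() {
		defer close(done)
		throttlingLoop(events, 5*time.Millisecond, updateParams)
	}()

	events <- struct{}{} // one event, followed by two timer-driven retries
	<-succeeded
	close(events)
	<-done
	return calls
}

func main() {
	fmt.Println("updateParams calls:", runDemo()) // prints updateParams calls: 3
}
```

Because the retry channel is cleared on every attempt, an event that arrives during the wait simply replaces the scheduled retry instead of stacking a second one.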

op-batcher: improve active-seq-changed signalling setup

80f399d

sebastianst mentioned this pull request Feb 20, 2025

op-batcher: improve active-seq-changed signalling setup #14456

Merged

sebastianst reviewed Feb 20, 2025

op-batcher/batcher/driver.go (outdated, resolved)

op-batcher/batcher/service.go (outdated, resolved)

op-batcher/batcher/driver.go (resolved)

geoknee added 3 commits February 21, 2025 17:45

Merge pull request #14456 from ethereum-optimism/seb/gk-spike-throttle

7637f5b

op-batcher: improve active-seq-changed signalling setup

remove unused state var

d875e86

rename blocksLoaded channel to publishSignal channel

06a9228

sebastianst approved these changes Feb 21, 2025


geoknee added this pull request to the merge queue Feb 22, 2025

Merged via the queue into develop with commit 8992518 Feb 22, 2025
44 checks passed

geoknee deleted the gk/spike-throttle branch February 22, 2025 08:29
