Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

op-batcher: remove ThrottleInterval and split block loading and batch publishing into separate goroutines #14244

Merged
merged 59 commits into from
Feb 22, 2025

Conversation

geoknee
Copy link
Contributor

@geoknee geoknee commented Feb 7, 2025

This moves the hybrid "event + interval" model for DA throttling to a purely event driven model. There are two events which can trigger throttling: (1) a change in the amount of pending data building up in the batcher and (2) a change in the active sequencer which the batcher is pointing at.

A happy side effect is that the batcher's loading, publishing and throttling workloads are broken up and now work fully asynchronously. Previously the batcher would load, and then publish for up to some maximum amount of time (blocking block loading and throttling), then return to loading again in order to trigger throttling.

  • Closes op-batcher: Check whether the throttling interval can be removed #12838
  • Replaces some use of a for / select loop with ctx.Done() branch with a simpler range over a channel. This makes shutdown behaviour easier to reason about and makes deadlocks less likely.
  • Reduces the use of global variables, preferring local ones (wait groups, contexts, etc)
  • Readme has a diagram showing relation of the various goroutines and which ones signal to each other
  • Should help prevent throttling loop interfering with the other loading/publishing jobs of the batcher (i.e. should increase its throughput when under load).
  • Should reduce the chance that an active sequencer change results in a delay to the throttling signal
  • Be very interesting to know the impact of this change on batcher throughput. We need some acceptance testing / criteria around this.

TODO

  • to remove the throttlingLoop config variable
  • add an "event" for active sequencer change
  • stress test this batcher using kurtosis and tx fuzzing
  • prepare release notes / docs on how THROTTLE_INTERVAL is deprecated

Copy link

codecov bot commented Feb 7, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (7ba4555) to head (06a9228).
Report is 95 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff              @@
##           develop   #14244       +/-   ##
============================================
- Coverage    45.89%        0   -45.90%     
============================================
  Files         1008        0     -1008     
  Lines        86419        0    -86419     
============================================
- Hits         39663        0    -39663     
+ Misses       43758        0    -43758     
+ Partials      2998        0     -2998     
Flag Coverage Δ
cannon-go-tests-32 ?
cannon-go-tests-64 ?
contracts-bedrock-tests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

see 1008 files with indirect coverage changes

@geoknee geoknee added A-op-batcher Area: op-batcher C-spike Category: Performs a spike labels Feb 7, 2025
@geoknee geoknee changed the title op-batcher: SPIKE split block loading and batch publishing into separate goroutines op-batcher: split block loading and batch publishing into separate goroutines Feb 12, 2025
I'm not sure it is adding any value. May return to it in future
@geoknee geoknee added this pull request to the merge queue Feb 22, 2025
Merged via the queue into develop with commit 8992518 Feb 22, 2025
44 checks passed
@geoknee geoknee deleted the gk/spike-throttle branch February 22, 2025 08:29
Rjected pushed a commit to paradigmxyz/optimism that referenced this pull request Feb 25, 2025
…ch publishing into separate goroutines (ethereum-optimism#14244)

* op-batcher: overhaul throttling, reading and writing loops

* no longer evaluate throttling conditions on a ticker
* break main loop into "reading" and "writing"
* reading loop signals to throttling and writing when ALL blocks are loaded (this could be done in a more fine grained way, even when each block is loaded)
* these run concurrently

* improvements

* readloop unblocks writing loop once, and then writing loop goes forever (pausing when it runs out of data)

tests pass but timeout due to bad shutdown

* rename to prevent shadowing

* allow for receipt handling when receipt and err are both nil

* split shutdownCtx into producers and consumers

* throttling loop ranges over a channel, does not take a context

* processReceiptsLoop ranges over a channel, does not take a context

* unify wait groups

* unify contexts

* writeLoop uses a ticker instead of a sleep

* WIP: add throughput test for batcher

* make txQueue local, not global state

* reduce diff

* reduce diff further

* make pendingBytesUpdated a local, not global

* read and write loop communicate with a channel

no more ticker required

* rename

* rename

* abstract promptLoop

* rename

* update readme

* add diagram to readme

* remove test

I'm not sure it is adding any value. May return to it in future

* sendToThrottlingLoop uses mutex

* harmonize "*Loop returning" logs

* tidy

* remove TODO

* rip out ThrottleInterval config var

* add activeSequencerChanged channel and wire it up to onActiveSequencerChanged hook in active rollup provider

* set callback at runtime, not in constructor

* reinstate startup order

* tidy

* remove dead code

* remove unintentional change

* rename readLoop to blockLoadingLoop and writeLoop to publishingLoop

* improve diagram

* replace ThrottleInterval with ThrottleThreshold as enabling var

* remove onActiveProviderChanged arg from constructors

* protect callback invocation with nullity check

* lint

* Apply suggestions from code review

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

* remove ThrottelInterval from flags

* docs: mention active sequencer signal

* only attach callback if throttling is enabled

* increase buffer of activeSequencerChanged

means that we buffer a single event if the throttlingLoop is busy, even when using a try-send

* add buffer to pendingBytesUpdated

see previous commit

* reduce indentation

* remove dangling ref

* pass killCtx to publishingLoop and avoid accessing this context globally

* remove continue and always signal publishing loop

* remove unecessary mutex wrangle

* replace signalPublishingLoop method with trySignal fn

* extend TestRollupProvider_FailoverOnInactiveSequencer to cover callback fn

* use retryTimer in throttlingLoop

if RPC calls fail, they will be retried after retryInterval (or before, if another event triggers updateParams)

* op-batcher: improve active-seq-changed signalling setup

* remove unused state var

* rename blocksLoaded channel to publishSignal channel

---------

Co-authored-by: Sebastian Stammler <seb@oplabs.co>
karlb pushed a commit to celo-org/optimism that referenced this pull request Mar 4, 2025
…ch publishing into separate goroutines (ethereum-optimism#14244)

* op-batcher: overhaul throttling, reading and writing loops

* no longer evaluate throttling conditions on a ticker
* break main loop into "reading" and "writing"
* reading loop signals to throttling and writing when ALL blocks are loaded (this could be done in a more fine grained way, even when each block is loaded)
* these run concurrently

* improvements

* readloop unblocks writing loop once, and then writing loop goes forever (pausing when it runs out of data)

tests pass but timeout due to bad shutdown

* rename to prevent shadowing

* allow for receipt handling when receipt and err are both nil

* split shutdownCtx into producers and consumers

* throttling loop ranges over a channel, does not take a context

* processReceiptsLoop ranges over a channel, does not take a context

* unify wait groups

* unify contexts

* writeLoop uses a ticker instead of a sleep

* WIP: add throughput test for batcher

* make txQueue local, not global state

* reduce diff

* reduce diff further

* make pendingBytesUpdated a local, not global

* read and write loop communicate with a channel

no more ticker required

* rename

* rename

* abstract promptLoop

* rename

* update readme

* add diagram to readme

* remove test

I'm not sure it is adding any value. May return to it in future

* sendToThrottlingLoop uses mutex

* harmonize "*Loop returning" logs

* tidy

* remove TODO

* rip out ThrottleInterval config var

* add activeSequencerChanged channel and wire it up to onActiveSequencerChanged hook in active rollup provider

* set callback at runtime, not in constructor

* reinstate startup order

* tidy

* remove dead code

* remove unintentional change

* rename readLoop to blockLoadingLoop and writeLoop to publishingLoop

* improve diagram

* replace ThrottleInterval with ThrottleThreshold as enabling var

* remove onActiveProviderChanged arg from constructors

* protect callback invocation with nullity check

* lint

* Apply suggestions from code review

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

* remove ThrottelInterval from flags

* docs: mention active sequencer signal

* only attach callback if throttling is enabled

* increase buffer of activeSequencerChanged

means that we buffer a single event if the throttlingLoop is busy, even when using a try-send

* add buffer to pendingBytesUpdated

see previous commit

* reduce indentation

* remove dangling ref

* pass killCtx to publishingLoop and avoid accessing this context globally

* remove continue and always signal publishing loop

* remove unecessary mutex wrangle

* replace signalPublishingLoop method with trySignal fn

* extend TestRollupProvider_FailoverOnInactiveSequencer to cover callback fn

* use retryTimer in throttlingLoop

if RPC calls fail, they will be retried after retryInterval (or before, if another event triggers updateParams)

* op-batcher: improve active-seq-changed signalling setup

* remove unused state var

* rename blocksLoaded channel to publishSignal channel

---------

Co-authored-by: Sebastian Stammler <seb@oplabs.co>
karlb pushed a commit to celo-org/optimism that referenced this pull request Mar 5, 2025
…ch publishing into separate goroutines (ethereum-optimism#14244)

* op-batcher: overhaul throttling, reading and writing loops

* no longer evaluate throttling conditions on a ticker
* break main loop into "reading" and "writing"
* reading loop signals to throttling and writing when ALL blocks are loaded (this could be done in a more fine grained way, even when each block is loaded)
* these run concurrently

* improvements

* readloop unblocks writing loop once, and then writing loop goes forever (pausing when it runs out of data)

tests pass but timeout due to bad shutdown

* rename to prevent shadowing

* allow for receipt handling when receipt and err are both nil

* split shutdownCtx into producers and consumers

* throttling loop ranges over a channel, does not take a context

* processReceiptsLoop ranges over a channel, does not take a context

* unify wait groups

* unify contexts

* writeLoop uses a ticker instead of a sleep

* WIP: add throughput test for batcher

* make txQueue local, not global state

* reduce diff

* reduce diff further

* make pendingBytesUpdated a local, not global

* read and write loop communicate with a channel

no more ticker required

* rename

* rename

* abstract promptLoop

* rename

* update readme

* add diagram to readme

* remove test

I'm not sure it is adding any value. May return to it in future

* sendToThrottlingLoop uses mutex

* harmonize "*Loop returning" logs

* tidy

* remove TODO

* rip out ThrottleInterval config var

* add activeSequencerChanged channel and wire it up to onActiveSequencerChanged hook in active rollup provider

* set callback at runtime, not in constructor

* reinstate startup order

* tidy

* remove dead code

* remove unintentional change

* rename readLoop to blockLoadingLoop and writeLoop to publishingLoop

* improve diagram

* replace ThrottleInterval with ThrottleThreshold as enabling var

* remove onActiveProviderChanged arg from constructors

* protect callback invocation with nullity check

* lint

* Apply suggestions from code review

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

* remove ThrottelInterval from flags

* docs: mention active sequencer signal

* only attach callback if throttling is enabled

* increase buffer of activeSequencerChanged

means that we buffer a single event if the throttlingLoop is busy, even when using a try-send

* add buffer to pendingBytesUpdated

see previous commit

* reduce indentation

* remove dangling ref

* pass killCtx to publishingLoop and avoid accessing this context globally

* remove continue and always signal publishing loop

* remove unecessary mutex wrangle

* replace signalPublishingLoop method with trySignal fn

* extend TestRollupProvider_FailoverOnInactiveSequencer to cover callback fn

* use retryTimer in throttlingLoop

if RPC calls fail, they will be retried after retryInterval (or before, if another event triggers updateParams)

* op-batcher: improve active-seq-changed signalling setup

* remove unused state var

* rename blocksLoaded channel to publishSignal channel

---------

Co-authored-by: Sebastian Stammler <seb@oplabs.co>
palango pushed a commit to celo-org/optimism that referenced this pull request Mar 6, 2025
* op-batcher: prevent `SpanChannelOut` RLP bytes overflowing `MaxRLPBytesPerChannel` (ethereum-optimism#14310)

* fix op-batcher pack over MaxRLPBytesChannel

* add test cases from different CompressionAlgo

* add fresh compression logic and corresponding comments

* refactor: enhance MaxRLPBytesPerChannel test

* refactor: rename variable and add required messages

* op-batcher: use local-safe to reduce lag during interop sync (ethereum-optimism#14265)

* op-batcher: remove `ThrottleInterval` and split block loading and batch publishing into separate goroutines (ethereum-optimism#14244)

* op-batcher: overhaul throttling, reading and writing loops

* no longer evaluate throttling conditions on a ticker
* break main loop into "reading" and "writing"
* reading loop signals to throttling and writing when ALL blocks are loaded (this could be done in a more fine grained way, even when each block is loaded)
* these run concurrently

* improvements

* readloop unblocks writing loop once, and then writing loop goes forever (pausing when it runs out of data)

tests pass but timeout due to bad shutdown

* rename to prevent shadowing

* allow for receipt handling when receipt and err are both nil

* split shutdownCtx into producers and consumers

* throttling loop ranges over a channel, does not take a context

* processReceiptsLoop ranges over a channel, does not take a context

* unify wait groups

* unify contexts

* writeLoop uses a ticker instead of a sleep

* WIP: add throughput test for batcher

* make txQueue local, not global state

* reduce diff

* reduce diff further

* make pendingBytesUpdated a local, not global

* read and write loop communicate with a channel

no more ticker required

* rename

* rename

* abstract promptLoop

* rename

* update readme

* add diagram to readme

* remove test

I'm not sure it is adding any value. May return to it in future

* sendToThrottlingLoop uses mutex

* harmonize "*Loop returning" logs

* tidy

* remove TODO

* rip out ThrottleInterval config var

* add activeSequencerChanged channel and wire it up to onActiveSequencerChanged hook in active rollup provider

* set callback at runtime, not in constructor

* reinstate startup order

* tidy

* remove dead code

* remove unintentional change

* rename readLoop to blockLoadingLoop and writeLoop to publishingLoop

* improve diagram

* replace ThrottleInterval with ThrottleThreshold as enabling var

* remove onActiveProviderChanged arg from constructors

* protect callback invocation with nullity check

* lint

* Apply suggestions from code review

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

* remove ThrottelInterval from flags

* docs: mention active sequencer signal

* only attach callback if throttling is enabled

* increase buffer of activeSequencerChanged

means that we buffer a single event if the throttlingLoop is busy, even when using a try-send

* add buffer to pendingBytesUpdated

see previous commit

* reduce indentation

* remove dangling ref

* pass killCtx to publishingLoop and avoid accessing this context globally

* remove continue and always signal publishing loop

* remove unecessary mutex wrangle

* replace signalPublishingLoop method with trySignal fn

* extend TestRollupProvider_FailoverOnInactiveSequencer to cover callback fn

* use retryTimer in throttlingLoop

if RPC calls fail, they will be retried after retryInterval (or before, if another event triggers updateParams)

* op-batcher: improve active-seq-changed signalling setup

* remove unused state var

* rename blocksLoaded channel to publishSignal channel

---------

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

* op-batcher: add `TestBatchSubmitter_sendTx_FloorDataGas` and patch `driver.sendTx` (ethereum-optimism#14517)

* op-batcher: add TestBatchSubmitter_sendTx_FloorDataGas

* op-batcher: use floorDataGas for transactions if greater than intrinsicGas

* Update op-batcher/batcher/driver.go

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

* log error instead of ignoring

---------

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

* op-batcher: always `updateCursorAndMetrics` when returning from `processBlocks()` (ethereum-optimism#14520)

* op-batcher: always updateCursorAndMetrics when returning from processBlocks()

* Update op-batcher/batcher/channel_manager.go

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

---------

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

* op-batcher: remove `ChannelManager.CheckExpectedProgress()` and add channel timeout log (ethereum-optimism#14553)

* op-batcher: remove ChannelManager.CheckExpectedProgress

* op-batcher: add warning log when a channel times out on chain

* op-batcher: correctly track block metrics in `handleChannelInvalidated()` (ethereum-optimism#14561)

* op-batcher: correctly track block metrics in handleChannelInvalidated

Includes test which fails without the fix.

* op-batcher: log out channels which are dropped during handleChannelInvalidated()

* change to warn log for dropped channels

* op-batcher: improve `computeSyncActions()` logging (ethereum-optimism#14563)

* improve computeSyncActions logging

* fixup test and make sure all cases run (!)

* use more friendly format for structured logger

* op-batcher: introduce `PREFER_LOCAL_SAFE_L2` config var (ethereum-optimism#14587)

* op-batcher: introduce PREFER_LOCAL_SAFE_L2 config var

* lint

* Apply suggestions from code review

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

* lint

---------

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

* batcher: Wait for DA write before shutdown

---------

Co-authored-by: olga yang <ya1994ng@gmail.com>
Co-authored-by: protolambda <proto@protolambda.com>
Co-authored-by: George Knee <georgeknee@googlemail.com>
Co-authored-by: Sebastian Stammler <seb@oplabs.co>
QuentinI pushed a commit to EspressoSystems/optimism-espresso-integration that referenced this pull request Mar 7, 2025
* op-batcher: prevent `SpanChannelOut` RLP bytes overflowing `MaxRLPBytesPerChannel` (ethereum-optimism#14310)

* fix op-batcher pack over MaxRLPBytesChannel

* add test cases from different CompressionAlgo

* add fresh compression logic and corresponding comments

* refactor: enhance MaxRLPBytesPerChannel test

* refactor: rename variable and add required messages

* op-batcher: use local-safe to reduce lag during interop sync (ethereum-optimism#14265)

* op-batcher: remove `ThrottleInterval` and split block loading and batch publishing into separate goroutines (ethereum-optimism#14244)

* op-batcher: overhaul throttling, reading and writing loops

* no longer evaluate throttling conditions on a ticker
* break main loop into "reading" and "writing"
* reading loop signals to throttling and writing when ALL blocks are loaded (this could be done in a more fine grained way, even when each block is loaded)
* these run concurrently

* improvements

* readloop unblocks writing loop once, and then writing loop goes forever (pausing when it runs out of data)

tests pass but timeout due to bad shutdown

* rename to prevent shadowing

* allow for receipt handling when receipt and err are both nil

* split shutdownCtx into producers and consumers

* throttling loop ranges over a channel, does not take a context

* processReceiptsLoop ranges over a channel, does not take a context

* unify wait groups

* unify contexts

* writeLoop uses a ticker instead of a sleep

* WIP: add throughput test for batcher

* make txQueue local, not global state

* reduce diff

* reduce diff further

* make pendingBytesUpdated a local, not global

* read and write loop communicate with a channel

no more ticker required

* rename

* rename

* abstract promptLoop

* rename

* update readme

* add diagram to readme

* remove test

I'm not sure it is adding any value. May return to it in future

* sendToThrottlingLoop uses mutex

* harmonize "*Loop returning" logs

* tidy

* remove TODO

* rip out ThrottleInterval config var

* add activeSequencerChanged channel and wire it up to onActiveSequencerChanged hook in active rollup provider

* set callback at runtime, not in constructor

* reinstate startup order

* tidy

* remove dead code

* remove unintentional change

* rename readLoop to blockLoadingLoop and writeLoop to publishingLoop

* improve diagram

* replace ThrottleInterval with ThrottleThreshold as enabling var

* remove onActiveProviderChanged arg from constructors

* protect callback invocation with nullity check

* lint

* Apply suggestions from code review

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

* remove ThrottelInterval from flags

* docs: mention active sequencer signal

* only attach callback if throttling is enabled

* increase buffer of activeSequencerChanged

means that we buffer a single event if the throttlingLoop is busy, even when using a try-send

* add buffer to pendingBytesUpdated

see previous commit

* reduce indentation

* remove dangling ref

* pass killCtx to publishingLoop and avoid accessing this context globally

* remove continue and always signal publishing loop

* remove unecessary mutex wrangle

* replace signalPublishingLoop method with trySignal fn

* extend TestRollupProvider_FailoverOnInactiveSequencer to cover callback fn

* use retryTimer in throttlingLoop

if RPC calls fail, they will be retried after retryInterval (or before, if another event triggers updateParams)

* op-batcher: improve active-seq-changed signalling setup

* remove unused state var

* rename blocksLoaded channel to publishSignal channel

---------

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

* op-batcher: add `TestBatchSubmitter_sendTx_FloorDataGas` and patch `driver.sendTx` (ethereum-optimism#14517)

* op-batcher: add TestBatchSubmitter_sendTx_FloorDataGas

* op-batcher: use floorDataGas for transactions if greater than intrinsicGas

* Update op-batcher/batcher/driver.go

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

* log error instead of ignoring

---------

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

* op-batcher: always `updateCursorAndMetrics` when returning from `processBlocks()` (ethereum-optimism#14520)

* op-batcher: always updateCursorAndMetrics when returning from processBlocks()

* Update op-batcher/batcher/channel_manager.go

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

---------

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

* op-batcher: remove `ChannelManager.CheckExpectedProgress()` and add channel timeout log (ethereum-optimism#14553)

* op-batcher: remove ChannelManager.CheckExpectedProgress

* op-batcher: add warning log when a channel times out on chain

* op-batcher: correctly track block metrics in `handleChannelInvalidated()` (ethereum-optimism#14561)

* op-batcher: correctly track block metrics in handleChannelInvalidated

Includes test which fails without the fix.

* op-batcher: log out channels which are dropped during handleChannelInvalidated()

* change to warn log for dropped channels

* op-batcher: improve `computeSyncActions()` logging (ethereum-optimism#14563)

* improve computeSyncActions logging

* fixup test and make sure all cases run (!)

* use more friendly format for structured logger

* op-batcher: introduce `PREFER_LOCAL_SAFE_L2` config var (ethereum-optimism#14587)

* op-batcher: introduce PREFER_LOCAL_SAFE_L2 config var

* lint

* Apply suggestions from code review

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

* lint

---------

Co-authored-by: Sebastian Stammler <seb@oplabs.co>

* batcher: Wait for DA write before shutdown

---------

Co-authored-by: olga yang <ya1994ng@gmail.com>
Co-authored-by: protolambda <proto@protolambda.com>
Co-authored-by: George Knee <georgeknee@googlemail.com>
Co-authored-by: Sebastian Stammler <seb@oplabs.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-op-batcher Area: op-batcher C-spike Category: Performs a spike
Projects
None yet
Development

Successfully merging this pull request may close these issues.

op-batcher: Check whether the throttling interval can be removed
2 participants