Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

design doc: invalid block progression #20

Merged
merged 1 commit into from
Jun 25, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
60 changes: 60 additions & 0 deletions protocol/invalid-block-progression.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
# Purpose

The purpose of this document is to progress a design originally proposed by @protolambda and contributed to
by Hamdi [here](https://github.com/ethereum-optimism/specs/pull/88). The goal is to reduce resource usage
by the derivation pipeline when invalid blocks are proposed. This is helpful for ensuring that the fault
proof program cannot be denial of service attacked as well as reducing the latency of reorgs, which is
especially important in the context of interop.

# Summary

New derivation pipeline invariants are introduced that are meant to reduce large resets of the derivation pipeline
when handling invalid batches.
Comment on lines +11 to +12
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to more explicitly explain (probably below, not here) why the new rules avoid larger resets. This could include explaining what the current behavior is under the preconditions of the new rules.


# Problem Statement + Context

Comment on lines +14 to +15
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fault-proofs are the main motivation here that I think is missing from the design doc. With the current approach of re-attempting with the next block-input candidate for the same block-height, it becomes quite complicated and expensive to prepare a chain optimistically, as the same block-height can be revisited many times. For interop fault-proofs to combine multiple optimistically processed chains, a lot of dependencies are assumed, and all computational work is wasted if one of the pieces has to be revisited.

The derivation pipeline is set up as a series of stages. It takes work to progress from one stage to the next.
Sometimes this work involves many remote HTTP requests, other times it requires in memory manipulation of
data structures. Its not ideal when there is an error in a later stage that causes a lot of additional work
to be done in an earlier stage to recover. In practice, this looks like some sort of invalid batch was deserialized
and errors during its processing.

> The `DerivationPipeline` has a `Step()` function that is called over and over again to derive the chain,
which ultimately looks like mapping raw batch data into `ExecutionPayload`s. `DerivationPipeline.Step()`
wraps a call to `EngineQueue.Step()` which wraps a call to `AttributesQueue.NextAttributes()`.

Examples of types of errors:
- [Invalid transaction in block](https://github.com/ethereum-optimism/optimism/blob/8b5e3c264d2a3a114e7df5361acd8d4c1c700f81/op-node/rollup/derive/batches.go#L156)
- [Invalid Execution Payload](https://github.com/ethereum-optimism/optimism/blob/8b5e3c264d2a3a114e7df5361acd8d4c1c700f81/op-node/rollup/derive/engine_update.go#L94)
- [Invalid batch timestamp](https://github.com/ethereum-optimism/optimism/blob/8b5e3c264d2a3a114e7df5361acd8d4c1c700f81/op-node/rollup/derive/attributes_queue.go#L84)
- [Batch contains deposit tx](https://github.com/ethereum-optimism/optimism/blob/8b5e3c264d2a3a114e7df5361acd8d4c1c700f81/op-node/rollup/derive/batches.go#L160)


# Proposed Solution

The following 2 invariants are introduced to ensure timely progression of the derivation pipeline. In both cases, either a batch
is correctly progressed or it is mapped into a deposits only batch.

If the batch data [makes it](https://github.com/ethereum-optimism/optimism/blob/8b5e3c264d2a3a114e7df5361acd8d4c1c700f81/op-node/rollup/derive/batch_queue.go#L223)
to the `BatchQueue`, then the batch is either properly deserialized and validated or is mapped into a "deposits only" batch.
This specifically describes the `BatchDrop` cases returned from [CheckBatch](https://github.com/ethereum-optimism/optimism/blob/8b5e3c264d2a3a114e7df5361acd8d4c1c700f81/op-node/rollup/derive/batches.go#L34).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs defined semantics for span batches

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could just say that we apply the rule for every singular batch reconstructed from the span batch? This should already give us the behavior that we need for interop, IIUC.

An alternative would be to drop the whole span batch, but I don't see an advantage from this over going over each singular batch case by case.


If the Execution API on the EL returns [INVALID](https://github.com/ethereum/go-ethereum/blob/7fd7c1f7dd9ba8d90399df2f080e4101ae37a255/beacon/engine/errors.go#L63), then the batch that produced the `ExecutionPayload` is mapped into a deposits only batch before
being attempted again as an `ExecutionPayload`. This particular check captures things like invalid nonces on transactions included
in a block.

With the launch of interop, an additional invariant will be added which involves mapping any batches that contain invalid executing
messages into a deposits only batch.

# Alternatives Considered

## Smaller Solution

It is possible to only enforce one of the two invariants, where we would want to drop the EL `INVALID` invariant in favor of
the batch queue invariant.

# Risks & Uncertainties

- This adds risk around unsafe head reorgs. To date, there has not been an unsafe head reorg. Cases where batches would be
dropped and then there is another chance to progress the safe head via submitting another batch would be turned into
unsafe reorgs. The software and ops have been hardened after running it in production for some time.
Comment on lines +56 to +60
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be worth clarifying that going from one failed batch to another different batch at the same L2 block height can be considered a form of "equivocation" (especially since it can be invalidated by a change of message validity on another chain), something that negatively impacts a lot of possible protocol designs, and is hence worth removing, despite the drawback in unsafe-chain recovery here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Also, the actual current Go op-batcher implementation wouldn't resend a batch for the same block height if that batch failed to import during derivation. Only a batcher restart would cause this. I'm also not aware of a single case since the launch of Bedrock where an invalid batch was produced that then got replaces by a valid one for the same L2 block height.