design doc: invalid block progression #20
Conversation
A design doc describing the two invariants that can ensure that the derivation pipeline can always progress without needing to stall and do a lot of extra work when invalid batches are submitted. Ideally this can be included in a hardfork before interop ships to make the derivation pipeline much more hardened for interop.
If the batch data [makes it](https://github.com/ethereum-optimism/optimism/blob/8b5e3c264d2a3a114e7df5361acd8d4c1c700f81/op-node/rollup/derive/batch_queue.go#L223) to the `BatchQueue`, then the batch is either properly deserialized and validated or is mapped into a "deposits only" batch. This specifically describes the `BatchDrop` cases returned from [CheckBatch](https://github.com/ethereum-optimism/optimism/blob/8b5e3c264d2a3a114e7df5361acd8d4c1c700f81/op-node/rollup/derive/batches.go#L34).
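A minimal sketch of what this invariant could look like at the batch-queue step, using simplified stand-in types rather than the real op-node ones (`CheckBatch`, the batch struct, and the verdicts are only approximated here):

```go
// Sketch of the proposed invariant: a batch that reaches the batch queue is
// never silently dropped; a BatchDrop verdict is instead turned into a
// "deposits only" batch for the same L2 block height. Types and names here
// are illustrative stand-ins, not the actual op-node API.
package main

import "fmt"

type BatchValidity int

const (
	BatchAccept BatchValidity = iota
	BatchDrop
	BatchUndecided
)

// Batch is a simplified stand-in for a singular batch at one L2 block height.
type Batch struct {
	Timestamp    uint64
	Transactions [][]byte
	DepositsOnly bool
}

// checkBatch is a placeholder for the real CheckBatch validation logic.
func checkBatch(b Batch) BatchValidity {
	if len(b.Transactions) == 0 {
		return BatchAccept
	}
	// ... full validation elided in this sketch ...
	return BatchDrop
}

// nextBatch applies the proposed rule: instead of dropping an invalid batch
// (and later revisiting the same height), map it to a deposits-only batch so
// the pipeline always progresses by exactly one L2 block.
func nextBatch(b Batch) Batch {
	switch checkBatch(b) {
	case BatchAccept:
		return b
	case BatchDrop:
		return Batch{Timestamp: b.Timestamp, DepositsOnly: true}
	default:
		return b // undecided batches are handled as they are today
	}
}

func main() {
	invalid := Batch{Timestamp: 1234, Transactions: [][]byte{{0x01}}}
	fmt.Printf("%+v\n", nextBatch(invalid))
}
```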
This needs defined semantics for span batches
I guess we could just say that we apply the rule for every singular batch reconstructed from the span batch? This should already give us the behavior that we need for interop, IIUC.
An alternative would be to drop the whole span batch, but I don't see an advantage of that over handling each singular batch case by case.
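A rough sketch of that per-singular-batch option, again with simplified stand-in types rather than the real span batch API:

```go
// Sketch: the span batch is expanded into its singular batches and the
// deposits-only fallback is applied to each one individually, rather than
// dropping the whole span batch. Types here are illustrative stand-ins.
package main

import "fmt"

type SingularBatch struct {
	Timestamp    uint64
	Valid        bool // stand-in for the full per-batch CheckBatch verdict
	DepositsOnly bool
}

// SpanBatch is a simplified stand-in; getSingularBatches mimics reconstructing
// the per-block batches that a real span batch encodes.
type SpanBatch struct {
	Batches []SingularBatch
}

func (s SpanBatch) getSingularBatches() []SingularBatch {
	return s.Batches
}

// processSpanBatch applies the rule case by case: every invalid singular batch
// becomes a deposits-only batch; valid ones pass through unchanged.
func processSpanBatch(s SpanBatch) []SingularBatch {
	out := make([]SingularBatch, 0, len(s.Batches))
	for _, b := range s.getSingularBatches() {
		if !b.Valid {
			b = SingularBatch{Timestamp: b.Timestamp, DepositsOnly: true}
		}
		out = append(out, b)
	}
	return out
}

func main() {
	span := SpanBatch{Batches: []SingularBatch{
		{Timestamp: 100, Valid: true},
		{Timestamp: 102, Valid: false},
	}}
	fmt.Printf("%+v\n", processSpanBatch(span))
}
```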
# Problem Statement + Context
Fault-proofs are the main motivation here, and I think that motivation is missing from the design doc. With the current approach of re-attempting with the next block-input candidate for the same block height, it becomes quite complicated and expensive to prepare a chain optimistically, as the same block height can be revisited many times. For interop fault-proofs to combine multiple optimistically processed chains, a lot of dependencies are assumed, and all of that computational work is wasted if one of the pieces has to be revisited.
# Risks & Uncertainties

- This adds risk around unsafe head reorgs. To date, there has not been an unsafe head reorg. Cases where batches would be dropped and then there is another chance to progress the safe head via submitting another batch would be turned into unsafe reorgs. The software and ops have been hardened after running it in production for some time.
It would be worth clarifying that going from one failed batch to a different batch at the same L2 block height can be considered a form of "equivocation" (especially since it can be invalidated by a change of message validity on another chain), something that negatively impacts a lot of possible protocol designs and is hence worth removing, despite the drawback in unsafe-chain recovery here.
I agree. Also, the current Go op-batcher implementation wouldn't resend a batch for the same block height if that batch failed to import during derivation; only a batcher restart would cause this. I'm also not aware of a single case since the launch of Bedrock where an invalid batch was produced that then got replaced by a valid one for the same L2 block height.
Thanks for the proposal! I'm supportive. The way I think about this is that the current rules allow resubmitting a faulty batch, which feels like a defensive mechanism to recover from a faulty sequencer (implementation). However, we've now established enough trust in our implementation that we can take this approach instead. After all, faulty batches shouldn't happen with a correct implementation.
- Are there any other places in the DP where we also want to change the protocol to just derive a deposit-only block instead of causing a pipeline reset? (A rough sketch of what such a deposit-only block could contain follows below.)
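For illustration, a deposit-only block at a given height would carry only the deposit transactions derived from its L1 origin and no sequenced transactions; a hypothetical sketch, not the actual op-node attribute types:

```go
// Illustrative only: simplified stand-in for the engine-API payload
// attributes the derivation pipeline hands to the execution layer.
package main

import "fmt"

type PayloadAttributes struct {
	Timestamp    uint64
	Transactions [][]byte // forced-inclusion (deposit) transactions from the L1 origin
	NoTxPool     bool
}

// depositsOnlyAttributes builds attributes that contain only the L1-derived
// deposit transactions for the given L2 timestamp, so the block can be
// produced without any sequenced transactions and without a pipeline reset.
func depositsOnlyAttributes(timestamp uint64, l1Deposits [][]byte) PayloadAttributes {
	return PayloadAttributes{
		Timestamp:    timestamp,
		Transactions: l1Deposits,
		NoTxPool:     true,
	}
}

func main() {
	deposits := [][]byte{{0x7e /* deposit tx type prefix */, 0x01}}
	fmt.Printf("%+v\n", depositsOnlyAttributes(1234, deposits))
}
```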
New derivation pipeline invariants are introduced that are meant to reduce large resets of the derivation pipeline when handling invalid batches.
It would be nice to more explicitly explain (probably below, not here) why the new rules avoid larger resets. This could include explaining what the current behavior is under the preconditions of the new rules.
This is already an in-flight project, so I'll be merging this in. We can update the design doc in subsequent PRs.