Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Withdrawals root on "inconsistent" attestation verification states #4234

Closed
paulhauner opened this issue Apr 26, 2023 · 4 comments · May be fixed by #4235
Closed

Withdrawals root on "inconsistent" attestation verification states #4234

paulhauner opened this issue Apr 26, 2023 · 4 comments · May be fixed by #4235
Assignees
Labels
v4.2.0 Q2 2023

Comments

@paulhauner
Copy link
Member

Description

On April 24 we saw this log across our mainnet SigP fleet:

Apr 24 02:14:28.797 ERRO Unable to validate attestation error: DBError(BlockReplayError(BlockProcessing(WithdrawalsRootMismatch { expected: 0xe7ad583501ba8fbbb1fd92fb416cbec416c7b9174f5ca5369d01fa4aa62d3ff2, found: 0xfa3b64cef0f2948013d73b7267b228b62f2ce2901ade70ecaeefbae03a8c39da }))), peer_id: 16Uiu2HAm88gHB8ZjQACD6U3ey9c12ZmBy746pWCWeoosFHPqLNSW, type: "unaggregated", slot: Slot(6289867), beacon_block_root: 0xae32ea8

This is an attestation from the network failing verification due to a BlockProcessingError::WithdrawalsRootMismatch error. That error indicates that we were unable to recreate the BeaconState we need to process that attestation; it is an internal error in the category of "should never happen".

An initial diagnosis from @michaelsproul was:

I think we're probably corrupting the block_roots array when we do the inconsistent state root replay:

.get_inconsistent_state_for_attestation_verification_only(
&state_root,
Some(head_block.slot),
)?

// If we don't care about state roots then return immediately.
if self.state_root_strategy == StateRootStrategy::Inconsistent {
return Ok(Some(Hash256::zero()));
}

let summary = per_slot_processing(&mut self.state, state_root, self.spec)
.map_err(BlockReplayError::from)?;

Once block_roots is corrupted it will cause attestation replay to produce different rewards+penalties, changing balances, and therefore withdrawal amounts. I think this has been the case for a long time, but wasn't checked by the pre-Capella block processing code

I agree with his diagnosis but I don't believe anyone has proved it yet (e.g. with a unit test).

In detail, I believe the sequence of events is:

  1. We receive an attestation from the network where we do not have its shuffling cached (in the case of the above error, the attestation pointed to a block that was 45 slots behind; a scenario that is unexpected and probably outside our caches).
  2. We end up determining that we need to load the state from disk to know the shuffling.
  3. We call get_inconsistent_state_for_attestation_verification_only, which loads a state from the DB and does an "inconsistent" version of state processing.
  4. The inconsistent lookup works by loading an epoch-boundary state from the hot DB and the replaying blocks (and skip slots) to generate the state at the requested slot. The state is "inconsistent" because we avoid the significant overhead of computing state roots during the process.
  5. A side-effect of using invalid state roots is that the block roots get corrupted too (because the block root contains the state root).
  6. Our invalid history of block roots means that we can't assign attestation rewards correctly (practically every attestation will be a miss because they're voting on block roots we don't know).
  7. These rewards generally don't matter, since we only update balances for attestation participation at epoch boundaries (and we never cross an epoch boundary in an inconsistent state replay). However, we do immediately update the balance for the proposer reward portion of an attestation.
  8. Therefore, if we have the scenario where a validator receives some proposer reward in an epoch and is scheduled for a withdrawal, then we end up giving them the wrong balance for a withdrawal (because we didn't assign them the proposer reward they actually earned).
  9. The mismatch in withdrawal amounts causes a withdrawal root mismatch.

Potential Solutions

I see two potential solutions:

  1. Simply skip the withdrawals root check (or all the withdrawals code). Since attester shuffling is created with a 1-2 epoch look ahead and we're skipping less than an epoch, we don't care about the effect that the withdrawals have on the shuffling (if they actually had any affect at all, I don't think they do).
  2. When loading the inconsistent state to get the shuffling, just load the state at the epoch boundary and use it to get the shuffling. I don't know why we'd need a mid-epoch state just to get attester shuffling.

(1) is the simplest solution, I think it's fairly easy to reason that it's correct. (2) is more complicated but perhaps a better solution because it (a) avoids doing unnecessary work and (b) potentially avoids other issues like this one in the future.

@paulhauner
Copy link
Member Author

paulhauner commented Apr 26, 2023

I'm having a go at option (2) over in #4235.

@paulhauner
Copy link
Member Author

It turns out that (2) doesn't work since we can't just load the target state when the shuffling epoch is beyond the shuffling lookahead of the epoch of beacon_block_root. That's because all blocks in the current epoch of the state are required to compute the randao seed (and the target state doesn't include those blocks).

Looks like we'll still need to do (1) regardless of whether not not we merge #4235.

@jimmygchen
Copy link
Member

Hi @paulhauner , I'd like to look into this one.

@jimmygchen jimmygchen self-assigned this Apr 27, 2023
bors bot pushed a commit that referenced this issue May 9, 2023
## Issue Addressed

Addresses #4234 

## Proposed Changes

- Skip withdrawals processing in an inconsistent state replay. 
- Repurpose `StateRootStrategy`: rename to `StateProcessingStrategy` and always skip withdrawals if using `StateProcessingStrategy::Inconsistent`
- Add a test to reproduce the scenario


Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io>
@paulhauner
Copy link
Member Author

Resolved via #4249 🎉

ghost pushed a commit to oone-world/lighthouse that referenced this issue Jul 13, 2023
## Issue Addressed

Addresses sigp#4234 

## Proposed Changes

- Skip withdrawals processing in an inconsistent state replay. 
- Repurpose `StateRootStrategy`: rename to `StateProcessingStrategy` and always skip withdrawals if using `StateProcessingStrategy::Inconsistent`
- Add a test to reproduce the scenario


Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io>
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this issue Jan 6, 2024
Addresses sigp#4234

- Skip withdrawals processing in an inconsistent state replay.
- Repurpose `StateRootStrategy`: rename to `StateProcessingStrategy` and always skip withdrawals if using `StateProcessingStrategy::Inconsistent`
- Add a test to reproduce the scenario

Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io>
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this issue Jan 6, 2024
Addresses sigp#4234

- Skip withdrawals processing in an inconsistent state replay.
- Repurpose `StateRootStrategy`: rename to `StateProcessingStrategy` and always skip withdrawals if using `StateProcessingStrategy::Inconsistent`
- Add a test to reproduce the scenario

Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v4.2.0 Q2 2023
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants