Withdrawals root on "inconsistent" attestation verification states #4234

paulhauner · 2023-04-26T03:39:37Z

Description

On April 24 we saw this log across our mainnet SigP fleet:

Apr 24 02:14:28.797 ERRO Unable to validate attestation error: DBError(BlockReplayError(BlockProcessing(WithdrawalsRootMismatch { expected: 0xe7ad583501ba8fbbb1fd92fb416cbec416c7b9174f5ca5369d01fa4aa62d3ff2, found: 0xfa3b64cef0f2948013d73b7267b228b62f2ce2901ade70ecaeefbae03a8c39da }))), peer_id: 16Uiu2HAm88gHB8ZjQACD6U3ey9c12ZmBy746pWCWeoosFHPqLNSW, type: "unaggregated", slot: Slot(6289867), beacon_block_root: 0xae32ea8

This is an attestation from the network failing verification due to a BlockProcessingError::WithdrawalsRootMismatch error. That error indicates that we were unable to recreate the BeaconState we need to process that attestation; it is an internal error in the category of "should never happen".

An initial diagnosis from @michaelsproul was:

I think we're probably corrupting the block_roots array when we do the inconsistent state root replay:

lighthouse/beacon_node/beacon_chain/src/beacon_chain.rs

Lines 5549 to 5552 in 693886b

.get_inconsistent_state_for_attestation_verification_only(
    &state_root,
    Some(head_block.slot),
)?

lighthouse/consensus/state_processing/src/block_replayer.rs

Lines 184 to 187 in 693886b

// If we don't care about state roots then return immediately.
if self.state_root_strategy == StateRootStrategy::Inconsistent {
    return Ok(Some(Hash256::zero()));
}

lighthouse/consensus/state_processing/src/block_replayer.rs

Lines 235 to 236 in 693886b

let summary = per_slot_processing(&mut self.state, state_root, self.spec)
    .map_err(BlockReplayError::from)?;

Once block_roots is corrupted, attestation replay will produce different rewards and penalties, changing balances and therefore withdrawal amounts. I think this has been the case for a long time, but it wasn't checked by the pre-Capella block processing code.

I agree with his diagnosis but I don't believe anyone has proved it yet (e.g. with a unit test).
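As an illustration of the suspected corruption (a toy model, not a proof against Lighthouse's real types), the sketch below shows why replaying with a zeroed state root changes the recorded block root. `ToyHeader`, `toy_root`, `record_block_root` and `replayed_block_root` are hypothetical names invented for this example, and a 64-bit standard-library hash stands in for `hash_tree_root`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for a beacon block header (hypothetical; not Lighthouse's type).
#[derive(Clone, Hash)]
pub struct ToyHeader {
    pub slot: u64,
    pub parent_root: u64,
    /// Left as 0 when the block is applied; filled in during slot processing.
    pub state_root: u64,
}

/// Toy stand-in for `hash_tree_root`; any deterministic hash is enough here.
pub fn toy_root<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Mimics the slot-processing step that fills in the pending header's state
/// root and then records the resulting block root in the state's history.
pub fn record_block_root(header: &ToyHeader, state_root: u64) -> u64 {
    let mut filled = header.clone();
    filled.state_root = state_root;
    toy_root(&filled)
}

/// Replays one slot either accurately (real state root) or "inconsistently"
/// (state root replaced by zero, as in the early return quoted above).
pub fn replayed_block_root(header: &ToyHeader, real_state_root: u64, inconsistent: bool) -> u64 {
    let state_root = if inconsistent { 0 } else { real_state_root };
    record_block_root(header, state_root)
}
```

Calling `replayed_block_root` twice on the same header, once with `inconsistent = false` and once with `inconsistent = true`, gives two different roots, so any attestation voting for the real block root would not match the history stored by the inconsistent replay.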

In detail, I believe the sequence of events is:

1. We receive an attestation from the network where we do not have its shuffling cached (in the case of the above error, the attestation pointed to a block that was 45 slots behind; a scenario that is unexpected and probably outside our caches).
2. We end up determining that we need to load the state from disk to know the shuffling.
3. We call get_inconsistent_state_for_attestation_verification_only, which loads a state from the DB and does an "inconsistent" version of state processing.
4. The inconsistent lookup works by loading an epoch-boundary state from the hot DB and then replaying blocks (and skip slots) to generate the state at the requested slot. The state is "inconsistent" because we avoid the significant overhead of computing state roots during the process.
5. A side-effect of using invalid state roots is that the block roots get corrupted too (because the block root contains the state root).
6. Our invalid history of block roots means that we can't assign attestation rewards correctly (practically every attestation will be a miss because they're voting on block roots we don't know).
7. These rewards generally don't matter, since we only update balances for attestation participation at epoch boundaries (and we never cross an epoch boundary in an inconsistent state replay). However, we do immediately update the balance for the proposer reward portion of an attestation.
8. Therefore, if we have the scenario where a validator receives some proposer reward in an epoch and is scheduled for a withdrawal, then we end up giving them the wrong balance for a withdrawal (because we didn't assign them the proposer reward they actually earned).
9. The mismatch in withdrawal amounts causes a withdrawal root mismatch.
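The last few steps can be made concrete with a small arithmetic sketch. It is a toy under stated assumptions: `balance_after_block` and `partial_withdrawal_amount` are hypothetical helpers, the proposer reward is collapsed into a flat amount per matching vote (the real calculation is per participation flag and more involved), and the validator is assumed to be eligible for a partial withdrawal of everything above 32 ETH.

```rust
/// 32 ETH expressed in Gwei (the max effective balance at Capella).
pub const MAX_EFFECTIVE_BALANCE: u64 = 32_000_000_000;

/// Toy proposer accounting: the proposer is credited `reward_per_attestation`
/// for every included vote that matches the block root the state remembers.
/// (Assumption: a flat reward per matching vote; the real rules are finer-grained.)
pub fn balance_after_block(
    start_balance: u64,
    reward_per_attestation: u64,
    votes: &[u64],
    remembered_root: u64,
) -> u64 {
    let hits = votes.iter().filter(|&&v| v == remembered_root).count() as u64;
    start_balance + hits * reward_per_attestation
}

/// Partial withdrawal for a validator at max effective balance:
/// everything above 32 ETH is swept.
pub fn partial_withdrawal_amount(balance: u64) -> u64 {
    balance.saturating_sub(MAX_EFFECTIVE_BALANCE)
}
```

With a starting balance of `MAX_EFFECTIVE_BALANCE + 1_000` Gwei, three matching votes and a hypothetical reward of 50 Gwei each, the accurate replay sweeps 1,150 Gwei. A replay whose remembered root is corrupted credits nothing and sweeps only 1,000 Gwei. The two withdrawal lists differ, and so do their roots.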

Potential Solutions

I see two potential solutions:

1. Simply skip the withdrawals root check (or all the withdrawals code). Since attester shuffling is created with a 1-2 epoch look ahead and we're skipping less than an epoch, we don't care about the effect that the withdrawals have on the shuffling (if they actually had any effect at all; I don't think they do).
2. When loading the inconsistent state to get the shuffling, just load the state at the epoch boundary and use it to get the shuffling. I don't know why we'd need a mid-epoch state just to get attester shuffling.

(1) is the simplest solution, and I think it's fairly easy to reason that it's correct. (2) is more complicated but perhaps a better solution because it (a) avoids doing unnecessary work and (b) potentially avoids other issues like this one in the future.


paulhauner · 2023-04-26T04:18:44Z

I'm having a go at option (2) over in #4235.

paulhauner · 2023-04-26T04:47:48Z

It turns out that (2) doesn't work since we can't just load the target state when the shuffling epoch is beyond the shuffling lookahead of the epoch of beacon_block_root. That's because all blocks in the current epoch of the state are required to compute the randao seed (and the target state doesn't include those blocks).

Looks like we'll still need to do (1) regardless of whether or not we merge #4235.

jimmygchen · 2023-04-27T02:04:33Z

Hi @paulhauner, I'd like to look into this one.

## Issue Addressed

Addresses #4234

## Proposed Changes

- Skip withdrawals processing in an inconsistent state replay.
- Repurpose `StateRootStrategy`: rename to `StateProcessingStrategy` and always skip withdrawals if using `StateProcessingStrategy::Inconsistent`
- Add a test to reproduce the scenario

Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io>
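A minimal sketch of what "always skip withdrawals if using `StateProcessingStrategy::Inconsistent`" could look like is shown below. It is not the code from the PR: the `Accurate` variant name, `ToyBlockError` and `process_withdrawals_step` are assumptions made for illustration, and roots are plain `u64` values rather than real hashes.

```rust
/// Toy version of the strategy enum. Only `Inconsistent` is named in the
/// description above; `Accurate` is an assumed name for the other mode.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StateProcessingStrategy {
    Accurate,
    Inconsistent,
}

/// Toy error type mirroring the shape of the error seen in the log.
#[derive(Debug, PartialEq, Eq)]
pub enum ToyBlockError {
    WithdrawalsRootMismatch { expected: u64, found: u64 },
}

/// Hypothetical withdrawals step of block processing.
pub fn process_withdrawals_step(
    strategy: StateProcessingStrategy,
    expected_root: u64,
    found_root: u64,
) -> Result<(), ToyBlockError> {
    // Balances may be wrong in an inconsistent replay, so the withdrawals
    // computed from them cannot be trusted; skip the step entirely.
    if strategy == StateProcessingStrategy::Inconsistent {
        return Ok(());
    }
    if expected_root != found_root {
        return Err(ToyBlockError::WithdrawalsRootMismatch {
            expected: expected_root,
            found: found_root,
        });
    }
    Ok(())
}
```

In this sketch a mismatching pair of roots is an error under `Accurate` but is ignored under `Inconsistent`, which is acceptable only because the inconsistent state is used solely to obtain the attester shuffling.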

paulhauner · 2023-05-15T01:30:08Z

Resolved via #4249 🎉


paulhauner added the v4.2.0 Q2 2023 label Apr 26, 2023

paulhauner mentioned this issue Apr 26, 2023

Use target state to compute committees #4235

Open

jimmygchen self-assigned this Apr 27, 2023

jimmygchen mentioned this issue May 1, 2023

[Merged by Bors] - Fix attestation withdrawals root mismatch #4249

Closed

paulhauner closed this as completed May 15, 2023
