Fix bug in freezer DB storage of `randao_mixes` #3011

michaelsproul · 2022-02-09T01:27:33Z

Description

There's a bug lurking in the database code that can cause occasional database corruption. It doesn't happen consistently, but when it does it seems to result in a zero hash (0x00) appearing in the randao_mixes array.

The case I'm investigating presented as corruption at slot 135168 on Prater:

Feb 08 07:31:26.461 ERRO State reconstruction failed             error: HotColdDBError(BlockReplayBlockError(HeaderInvalid { reason: ParentBlockRootMismatch { state: 0x373eb699eae0110e474671cab72d5c6ca666d4a6f5a5a356f2af89039ad98382, block: 0xabf45deec98af2873a04d352ebbc54eac35d00c8157fea27b21f9adc2446233b } })), service: beacon

Oddly, the first corrupt state actually occurs much earlier. I found that the state at slot 12288 was corrupt using this (fish) script:

for i in (seq 0 2048 135168)
    set checksum (curl -s -H "Accept: application/octet-stream" "http://localhost:5052/eth/v2/debug/beacon/states/$i" | sha256sum)
    echo "$i: $checksum"
end

Diffing the corrupt state at slot 12288 against the real state reveals a 0x00 value in the randao_mixes at index 320. This is interesting because that corresponds to epoch 320, i.e. 64 epochs prior to slot 12288 (epoch 384).
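
To make the arithmetic above concrete, here is a minimal sketch. It assumes the mainnet preset constants from the consensus spec (`SLOTS_PER_EPOCH = 32`, `EPOCHS_PER_HISTORICAL_VECTOR = 65536`), which Prater also uses. The function names, including `differing_indices`, are hypothetical helpers for illustration and are not part of Lighthouse.

```rust
// Mainnet-preset constants from the consensus spec (assumed to apply to Prater).
const SLOTS_PER_EPOCH: u64 = 32;
const EPOCHS_PER_HISTORICAL_VECTOR: u64 = 65_536;

/// Epoch that contains `slot`.
fn epoch_of(slot: u64) -> u64 {
    slot / SLOTS_PER_EPOCH
}

/// Position of the mix for `epoch` inside the circular `randao_mixes` vector.
fn randao_index(epoch: u64) -> usize {
    (epoch % EPOCHS_PER_HISTORICAL_VECTOR) as usize
}

/// Hypothetical helper: indices at which two arrays of 32-byte mixes differ.
fn differing_indices(a: &[[u8; 32]], b: &[[u8; 32]]) -> Vec<usize> {
    a.iter()
        .zip(b.iter())
        .enumerate()
        .filter(|(_, (x, y))| x != y)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let state_epoch = epoch_of(12_288);
    println!("slot 12288 is in epoch {}", state_epoch);
    println!("epoch 320 is stored at index {}", randao_index(320));
    println!("gap in epochs: {}", state_epoch - 320);

    // Toy data: a "good" array of non-zero mixes and a copy with one zeroed entry.
    let good = vec![[1u8; 32]; 400];
    let mut corrupt = good.clone();
    corrupt[320] = [0u8; 32];
    println!("differing indices: {:?}", differing_indices(&good, &corrupt));
}
```

With real data, the two slices would come from decoding the corrupt and the known-good SSZ states; the toy arrays here only stand in for them.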

I think the bug must be in store_updated_vector, which is responsible for writing the randao mixes in the flat format used by the database:

lighthouse/beacon_node/store/src/chunked_vector.rs

Lines 334 to 383 in 0177b92

    
pub fn store_updated_vector<F: Field<E>, E: EthSpec, S: KeyValueStore<E>>(
    field: F,
    store: &S,
    state: &BeaconState<E>,
    spec: &ChainSpec,
    ops: &mut Vec<KeyValueStoreOp>,
) -> Result<(), Error> {
    let chunk_size = F::chunk_size();
    let (start_vindex, end_vindex) = F::start_and_end_vindex(state.slot(), spec);
    let start_cindex = start_vindex / chunk_size;
    let end_cindex = end_vindex / chunk_size;

    // Store the genesis value if we have access to it, and it hasn't been stored already.
    if F::slot_needs_genesis_value(state.slot(), spec) {
        let genesis_value = F::extract_genesis_value(state, spec)?;
        F::check_and_store_genesis_value(store, genesis_value, ops)?;
    }

    // Start by iterating backwards from the last chunk, storing new chunks in the database.
    // Stop once a chunk in the database matches what we were about to store, this indicates
    // that a previously stored state has already filled-in a portion of the indices covered.
    let full_range_checked = store_range(
        field,
        (start_cindex..=end_cindex).rev(),
        start_vindex,
        end_vindex,
        store,
        state,
        spec,
        ops,
    )?;

    // If the previous `store_range` did not check the entire range, it may be the case that the
    // state's vector includes elements at low vector indices that are not yet stored in the
    // database, so run another `store_range` to ensure these values are also stored.
    if !full_range_checked {
        store_range(
            field,
            start_cindex..end_cindex,
            start_vindex,
            end_vindex,
            store,
            state,
            spec,
            ops,
        )?;
    }

    Ok(())
}
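
The mapping from vector indices to chunk indices used above (`vindex / chunk_size`) can be sketched in isolation. This is only an illustration: the chunk size of 128 is an assumed value that this issue does not confirm, the vector index bounds 300 and 700 are arbitrary, and the helper names are hypothetical rather than Lighthouse APIs.

```rust
use std::ops::RangeInclusive;

// ASSUMPTION: illustrative chunk size; the issue does not state the real value.
const ASSUMED_CHUNK_SIZE: usize = 128;

/// Chunk that holds vector index `vindex` (integer division, as in the function above).
fn chunk_index(vindex: usize, chunk_size: usize) -> usize {
    vindex / chunk_size
}

/// Position of `vindex` inside its chunk.
fn chunk_offset(vindex: usize, chunk_size: usize) -> usize {
    vindex % chunk_size
}

/// Inclusive range of chunks covering `start_vindex..=end_vindex`.
fn chunk_range(start_vindex: usize, end_vindex: usize, chunk_size: usize) -> RangeInclusive<usize> {
    chunk_index(start_vindex, chunk_size)..=chunk_index(end_vindex, chunk_size)
}

fn main() {
    let vindex = 320;
    println!(
        "vector index {} -> chunk {}, offset {}",
        vindex,
        chunk_index(vindex, ASSUMED_CHUNK_SIZE),
        chunk_offset(vindex, ASSUMED_CHUNK_SIZE)
    );

    // The first pass walks the chunks from newest to oldest.
    // The bounds 300 and 700 are arbitrary example values.
    for cindex in chunk_range(300, 700, ASSUMED_CHUNK_SIZE).rev() {
        println!("would check chunk {}", cindex);
    }
}
```

Under this assumed chunk size, index 320 would sit in the middle of a chunk rather than at a boundary, which is the kind of detail worth checking against the real `F::chunk_size()` when chasing the corruption.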

It's possible that we're somehow re-writing the old state at 12288, which inappropriately zeroes some entries and corrupts all subsequent states. I don't think the corruption can occur the first time state 12288 is written, otherwise it would have failed the block root check at that point or shortly after.

Will update this issue with more info soon.


michaelsproul · 2024-08-19T07:44:16Z

Closing in favour of hierarchical state diffs, which deletes this part of the database 🎉

Hierarchical state diffs #5978

michaelsproul added the bug and database labels Feb 9, 2022

michaelsproul self-assigned this Feb 9, 2022

michaelsproul mentioned this issue Feb 27, 2022

Unable to sync beacon node after hard reset. #3044

Closed

michaelsproul mentioned this issue May 18, 2022

StateRootMismatch while syncing on windows #2134

Closed

michaelsproul mentioned this issue Aug 5, 2022

State Reconstruction reaches a faulty state #3433

Open

michaelsproul mentioned this issue Aug 26, 2022

Add freezer DB debugging tools #3511

Closed

This was referenced Jul 7, 2023

Add logic to prune all historic states #4481

Closed

[QUESTION] Curious case of state cache #4502

Closed

michaelsproul closed this as not planned Aug 19, 2024
