There's a bug lurking in the database code that can cause occasional database corruption. It doesn't happen consistently, but when it does it seems to result in a zero hash (0x00) appearing in the randao_mixes array.
The case I'm investigating presented as corruption at slot 135168 on Prater:
Oddly the first corrupt state actually occurs much earlier. I found that the state at slot 12288 was corrupt using this (fish) script:
for i in (seq 0 2048 135168)
    set checksum (curl -s -H "Accept: application/octet-stream" "http://localhost:5052/eth/v2/debug/beacon/states/$i" | sha256sum)
    echo "$i: $checksum"
end
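Once checksums like these have been collected from both the suspect node and a known-good node, locating the first corrupt checkpoint is a simple comparison. The following is a minimal sketch of that step; the function name and the `(slot, checksum)` input shape are assumptions for illustration, not part of Lighthouse.

```rust
/// Returns the first slot at which the two checksum lists disagree.
///
/// Hypothetical helper: both slices are assumed to be sorted by slot and to
/// cover the same slots, e.g. the output of the script above run against a
/// known-good node (`reference`) and the node under investigation (`suspect`).
fn first_divergent_slot(reference: &[(u64, &str)], suspect: &[(u64, &str)]) -> Option<u64> {
    reference
        .iter()
        .zip(suspect.iter())
        // Only compare entries that describe the same slot.
        .find(|((ref_slot, ref_sum), (sus_slot, sus_sum))| ref_slot == sus_slot && ref_sum != sus_sum)
        .map(|((slot, _), _)| *slot)
}
```

Every state after the first divergent slot is expected to differ as well, so only the earliest mismatch is interesting.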
Diffing the corrupt state at slot 12288 against the real state reveals a 0x00 value in the randao_mixes at index 320. This is interesting because that corresponds to epoch 320, i.e. 64 epochs prior to slot 12288 (epoch 384).
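The index arithmetic above can be checked with a short sketch. The constants are the mainnet preset values from the consensus spec, which Prater also uses; the helper names, including `zeroed_indices`, are hypothetical and only illustrate the kind of diff described.

```rust
/// Slots per epoch in the mainnet preset (also used by Prater).
const SLOTS_PER_EPOCH: u64 = 32;
/// Length of the `randao_mixes` vector in the mainnet preset.
const EPOCHS_PER_HISTORICAL_VECTOR: u64 = 65_536;

/// Epoch containing `slot`.
fn epoch_of_slot(slot: u64) -> u64 {
    slot / SLOTS_PER_EPOCH
}

/// Position of `epoch`'s mix inside the `randao_mixes` vector.
fn randao_mix_index(epoch: u64) -> u64 {
    epoch % EPOCHS_PER_HISTORICAL_VECTOR
}

/// Indices where `actual` holds an all-zero mix but `expected` does not.
/// Hypothetical helper for diffing a suspect state against a known-good one.
fn zeroed_indices(expected: &[[u8; 32]], actual: &[[u8; 32]]) -> Vec<usize> {
    expected
        .iter()
        .zip(actual.iter())
        .enumerate()
        .filter(|(_, (e, a))| **a == [0u8; 32] && **e != [0u8; 32])
        .map(|(i, _)| i)
        .collect()
}
```

With these definitions, `epoch_of_slot(12288)` is 384 and `randao_mix_index(320)` is 320, so the zeroed entry lies 64 epochs behind the state's own epoch.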
I think the bug must be in store_updated_vector, which is responsible for writing the randao mixes in the flat format used by the database:
    // Start by iterating backwards from the last chunk, storing new chunks in the database.
    // Stop once a chunk in the database matches what we were about to store, this indicates
    // that a previously stored state has already filled-in a portion of the indices covered.
    let full_range_checked = store_range(
        field,
        (start_cindex..=end_cindex).rev(),
        start_vindex,
        end_vindex,
        store,
        state,
        spec,
        ops,
    )?;
    // If the previous `store_range` did not check the entire range, it may be the case that the
    // state's vector includes elements at low vector indices that are not yet stored in the
    // database, so run another `store_range` to ensure these values are also stored.
    if !full_range_checked {
        store_range(
            field,
            start_cindex..end_cindex,
            start_vindex,
            end_vindex,
            store,
            state,
            spec,
            ops,
        )?;
    }
    Ok(())
}
It's possible that we're somehow re-writing the old state at 12288, which inappropriately zeroes some entries and corrupts all subsequent states. I don't think the corruption can occur the first time state 12288 is written, otherwise it would have failed the block root check at that point or shortly after.
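One way to test the re-write hypothesis would be a guard that refuses to replace an already-stored non-zero entry with zero. The sketch below is a toy model only: `FlatStore` and `checked_write` are hypothetical names and do not mirror Lighthouse's actual store API.

```rust
use std::collections::BTreeMap;

type Hash = [u8; 32];

/// Hypothetical flat store mapping a vector index to its stored value.
struct FlatStore {
    values: BTreeMap<u64, Hash>,
}

impl FlatStore {
    fn new() -> Self {
        FlatStore { values: BTreeMap::new() }
    }

    /// Writes `value` at `vindex`, but rejects the write if it would replace
    /// an existing non-zero entry with an all-zero one.
    fn checked_write(&mut self, vindex: u64, value: Hash) -> Result<(), String> {
        if let Some(existing) = self.values.get(&vindex) {
            if *existing != [0u8; 32] && value == [0u8; 32] {
                return Err(format!("refusing to zero existing entry at index {vindex}"));
            }
        }
        self.values.insert(vindex, value);
        Ok(())
    }
}
```

If a check like this fired when an old state was stored a second time, it would support the theory that the corruption comes from a re-write and not from the initial write.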
Will update this issue with more info soon.
Code reference for store_updated_vector: lighthouse/beacon_node/store/src/chunked_vector.rs, lines 334 to 383 in 0177b92