Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

tiered storage: Footer deserialization has undefined behavior/needs sanitization #34121

Closed
brooksprumo opened this issue Nov 16, 2023 · 1 comment · Fixed by #34200 or #34310
Closed

Comments

@brooksprumo
Copy link
Contributor

brooksprumo commented Nov 16, 2023

Problem

The TieredStorageFooter is written-to and read-from disk by simply copying bytes. Since the footer contains enums, there are implicit invalid bit patterns for the footer. In Rust, it is undefined behavior to access uninitialized bytes, which I believe extends to invalid bit patterns, a la bool. We should only access initialized—and well-defined—bytes. Otherwise this would allow corrupt on-disk footers, potentially creating corrupt/invalid snapshots.

Here's the footer:

#[derive(Debug, PartialEq, Eq, Clone)]
#[repr(C)]
pub struct TieredStorageFooter {
// formats
/// The format of the account meta entry.
pub account_meta_format: AccountMetaFormat,
/// The format of the owners block.
pub owners_block_format: OwnersBlockFormat,
/// The format of the account index block.
pub index_block_format: IndexBlockFormat,
/// The format of the account block.
pub account_block_format: AccountBlockFormat,
// Account-block related
/// The number of account entries.
pub account_entry_count: u32,
/// The size of each account meta entry in bytes.
pub account_meta_entry_size: u32,
/// The default size of an account block before compression.
///
/// If the size of one account (meta + data + optional fields) before
/// compression is bigger than this number, than it is considered a
/// blob account and it will have its own account block.
pub account_block_size: u64,
// Owner-related
/// The number of owners.
pub owner_count: u32,
/// The size of each owner entry.
pub owner_entry_size: u32,
// Offsets
// Note that offset to the account blocks is omitted as it's always 0.
/// The offset pointing to the first byte of the account index block.
pub index_block_offset: u64,
/// The offset pointing to the first byte of the owners block.
pub owners_block_offset: u64,
// account range
/// The smallest account address in this file.
pub min_account_address: Pubkey,
/// The largest account address in this file.
pub max_account_address: Pubkey,
/// A hash that represents a tiered accounts file for consistency check.
pub hash: Hash,
// The below fields belong to footer tail.
// The sum of their sizes should match FOOTER_TAIL_SIZE.
/// The size of the footer including the magic number.
pub footer_size: u64,
/// The format version of the tiered accounts file.
pub format_version: u64,
// This field is persisted in the storage but not in this struct.
// The number should match FOOTER_MAGIC_NUMBER.
// pub magic_number: u64,
}

And the enums:

#[repr(u16)]
#[derive(
Clone,
Copy,
Debug,
Default,
Eq,
Hash,
PartialEq,
num_enum::IntoPrimitive,
num_enum::TryFromPrimitive,
)]
pub enum AccountMetaFormat {
#[default]
Hot = 0,
// Temporarily comment out to avoid unimplemented!() block
// Cold = 1,
}
#[repr(u16)]
#[derive(
Clone,
Copy,
Debug,
Default,
Eq,
Hash,
PartialEq,
num_enum::IntoPrimitive,
num_enum::TryFromPrimitive,
)]
pub enum AccountBlockFormat {
#[default]
AlignedRaw = 0,
Lz4 = 1,
}
#[repr(u16)]
#[derive(
Clone,
Copy,
Debug,
Default,
Eq,
Hash,
PartialEq,
num_enum::IntoPrimitive,
num_enum::TryFromPrimitive,
)]
pub enum OwnersBlockFormat {
#[default]
LocalIndex = 0,
}

Proposed Solution

We probably need to read and write the enums as u16s, and then perform sanitation on those values to ensure they're valid values for the enums.

Note: Tiered storage is not live/in-use yet, so we don't have any attack vectors for a malicious actor.

@brooksprumo
Copy link
Contributor Author

cc/ @yhchiang-sol

@brooksprumo brooksprumo changed the title Serialization/Deserialization has undefined behavior/needs sanitization tiered storage: Serialization/Deserialization has undefined behavior/needs sanitization Nov 16, 2023
@brooksprumo brooksprumo changed the title tiered storage: Serialization/Deserialization has undefined behavior/needs sanitization tiered storage: Footer deserialization has undefined behavior/needs sanitization Dec 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
1 participant