Add a fuzz mode to stress unaligned wasm addresses #3516

Merged · 1 commit · Nov 15, 2021

147 changes: 114 additions & 33 deletions crates/fuzzing/src/generators.rs
@@ -12,29 +12,10 @@ pub mod api;

pub mod table_ops;

use anyhow::Result;
use arbitrary::{Arbitrary, Unstructured};

/// A description of configuration options that we should do differential
/// testing between.
#[derive(Arbitrary, Clone, Debug, PartialEq, Eq, Hash)]
pub struct DifferentialConfig {
opt_level: OptLevel,
force_jump_veneers: bool,
}

impl DifferentialConfig {
/// Convert this differential fuzzing config into a `wasmtime::Config`.
pub fn to_wasmtime_config(&self) -> anyhow::Result<wasmtime::Config> {
let mut config = crate::fuzz_default_config(wasmtime::Strategy::Cranelift)?;
config.cranelift_opt_level(self.opt_level.to_wasmtime());
if self.force_jump_veneers {
unsafe {
config.cranelift_flag_set("wasmtime_linkopt_force_jump_veneer", "true")?;
}
}
Ok(config)
}
}
use std::sync::Arc;
use wasmtime::{LinearMemory, MemoryCreator, MemoryType};

#[derive(Arbitrary, Clone, Debug, PartialEq, Eq, Hash)]
enum OptLevel {
@@ -54,40 +35,140 @@ impl OptLevel {
}

/// Implementation of generating a `wasmtime::Config` arbitrarily
#[derive(Arbitrary, Debug)]
#[derive(Arbitrary, Debug, Eq, Hash, PartialEq)]
pub struct Config {
opt_level: OptLevel,
debug_info: bool,
canonicalize_nans: bool,
interruptable: bool,
#[allow(missing_docs)]
pub consume_fuel: bool,
memory_config: MemoryConfig,
force_jump_veneers: bool,
}

// Note that we use 32-bit values here to avoid blowing the 64-bit address
// space by requesting ungodly-large sizes/guards.
static_memory_maximum_size: Option<u32>,
static_memory_guard_size: Option<u32>,
dynamic_memory_guard_size: Option<u32>,
guard_before_linear_memory: bool,
#[derive(Arbitrary, Debug, Eq, Hash, PartialEq)]
enum MemoryConfig {
/// Configuration for linear memories which correspond to normal
/// configuration settings in `wasmtime` itself. This will tweak various
/// parameters about static/dynamic memories.
///
/// Note that we use 32-bit values here to avoid blowing the 64-bit address
/// space by requesting ungodly-large sizes/guards.
Normal {
static_memory_maximum_size: Option<u32>,
static_memory_guard_size: Option<u32>,
dynamic_memory_guard_size: Option<u32>,
guard_before_linear_memory: bool,
},

/// Configuration to force use of a linear memory that's unaligned at its
/// base address to force all wasm addresses to be unaligned at the hardware
/// level, even if the wasm itself correctly aligns everything internally.
CustomUnaligned,
Comment on lines +65 to +68

Member:
There is an interesting question here: do we need to always pessimize (simd) loads/stores to work with unaligned memory?

Ideally, we would have a fast path and a slow path, emit the fast path when the Wasm says it is correctly aligned, and the slow path otherwise. The fast path would then have a fallback (probably via signal handlers) if the load/store wasn't actually aligned, like the Wasm said it would be.

However, if the memory itself is unaligned, then the "aligned" loads/stores in Wasm are unaligned at the native level, meaning we would always hit the fast path's fallback, which is presumably even worse than just using slow path loads/stores.

So allowing unaligned memories potentially invalidates our speculative fast paths for loads/stores. I guess that is fine, because no one should ever give us unaligned memories in practice?

Member:
This is a good question! In practice, at least on modern x86 cores, unaligned SSE loads have the same performance as aligned ones (zero penalty on Haswell and later, IIRC, which is the 2013 core); and on earlier cores, IIRC, it split into two uops (or two issues of one uop, I don't remember), so at worst it costs "one extra ALU instruction". An alignment check is at least that (AND rAddr, 1/3/7/15) and a conditional branch; the branch doesn't enter the BTB if never taken, but still costs fetch bandwidth. So I think it's always better to use unaligned loads/stores (movups rather than movaps) unconditionally.

I'm not sure about aarch64; maybe @akirilov-arm or @sparker-arm can say more? I suspect at least on some smaller cores it might be an interesting question.
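
A rough Rust sketch (not part of this change) of the trade-off being discussed: guarding every access with an alignment check adds a test plus a branch to a slow path, while the unconditional movups-style approach simply issues an unaligned load with no branch at all. The function names below are invented for illustration.

use std::ptr;

/// The "check then branch" shape: fast path for aligned addresses, byte-wise
/// slow path otherwise.
unsafe fn load_i32_checked(addr: *const u8) -> i32 {
    if addr as usize & 3 == 0 {
        // Fast path: the address is 4-byte aligned at the hardware level.
        *(addr as *const i32)
    } else {
        // Slow path: reassemble the value byte by byte.
        let mut bytes = [0u8; 4];
        ptr::copy_nonoverlapping(addr, bytes.as_mut_ptr(), 4);
        i32::from_le_bytes(bytes)
    }
}

/// The unconditional shape: always issue an unaligned load and let the
/// hardware handle any misalignment.
unsafe fn load_i32_unaligned(addr: *const u8) -> i32 {
    ptr::read_unaligned(addr as *const i32)
}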

Member:
Alex mentioned that he saw crashes on the aarch64 machine with this change, so I assume this is a case where we can't rely on the modern x86 behavior.

Member:
(I'll let him provide more details)

Member Author:
Personally I've always been under the impression that unaligned things are "just as fast" nowadays and the overhead and angst of correctly implementing "fixup the operation in a signal handler" is quite far from being worth it.

To clarify though, I haven't seen any new actual crashes with this fuzzer. I ran locally for a bit with simd enabled on both x64 and aarch64, but found no interesting crashes (just bugs in my own new fuzz code).

My main goal with this sort of fuzzer is to weed out codegen issues like #2943 with more regularity.

Member:
I just read a bit more about this and it seems that aarch64 does indeed support unaligned vector loads as well, architecturally (and tested just now with a little asm on my RPi4 doing a LDR qN, [xN] with a pointer ending in ...1). I think for simplicity I'd prefer to avoid the two-path solution here, as (on thinking through implications to codegen a bit more) a CFG diamond at every load/store would (i) significantly increase code size, (ii) slow down any CFG-sensitive analyses, e.g. liveness and branch splitting in regalloc, and (iii) most likely add measurable runtime overhead. We can definitely think about this more if we have to support an architecture that forces us into this, IMHO...
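
For reference, a small sketch (mine, not from this thread) of the same experiment expressed with Rust's NEON intrinsics instead of hand-written asm: a 128-bit load from a deliberately odd pointer, which is architecturally allowed for normal memory on aarch64.

#[cfg(target_arch = "aarch64")]
fn unaligned_vector_load_demo() {
    use std::arch::aarch64::{vld1q_u8, vst1q_u8};

    let src = [0xab_u8; 32];
    let mut dst = [0_u8; 16];
    unsafe {
        // Offset the pointer by one byte so the 16-byte load is misaligned.
        let v = vld1q_u8(src.as_ptr().add(1));
        vst1q_u8(dst.as_mut_ptr(), v);
    }
    assert_eq!(dst, [0xab_u8; 16]);
}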

Member:
I was imagining we wouldn’t actually emit code to do checks (so no cfg diamond), just catch unaligned access faults in signal handlers and then jump to slow path stubs from there, but it seems I misunderstood Alex and we don’t actually even support any architectures where this is an issue, so I think we can continue ignoring it for now.

Contributor:
As @cfallin said, support for unaligned data accesses (not instruction fetches) to normal memory is required by the 64-bit Arm architecture and can be toggled at runtime by privileged software (e.g. the operating system kernel), but in practice all environments we target enable it; in fact, some functionality in the system libraries like memcpy() may rely on it. Vector operations are not an exception, though we could use the structure load LD1, for instance, which requires alignment on the element size (not the vector size). As for performance, it depends on the particular implementation, but usually there is no penalty unless the access crosses a coarser-grained boundary such as 16 or 64 bytes.

There is one major exception to this rule - atomic and exclusive operations. Would this fuzz mode be applied to code that uses the Wasm threads proposal (which I think is the only case that would result in the generation of those instructions)?

Member Author:
Ah yeah my assumption was that we would disable this mode of fuzzing once the threads proposal was implemented. I agree that I don't think we can get away with misaligning the host memory once atomics come into play.

Member:
(Or we could just make sure that we never combine threads and unaligned memories in the fuzzer, rather than disabling it entirely.)
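
A hypothetical sketch of that suggestion (none of this is in the PR, and the threads_enabled field is invented for illustration): the fuzz Config could downgrade the memory mode whenever the threads proposal is requested, rather than disabling the unaligned mode entirely.

impl Config {
    /// Hypothetical helper: never pair the threads proposal with an unaligned
    /// host memory, since atomic accesses do require natural alignment.
    fn avoid_unaligned_memory_with_threads(&mut self) {
        // `self.threads_enabled` is an assumed field, not part of this PR.
        if self.threads_enabled && matches!(self.memory_config, MemoryConfig::CustomUnaligned) {
            self.memory_config = MemoryConfig::Normal {
                static_memory_maximum_size: None,
                static_memory_guard_size: None,
                dynamic_memory_guard_size: None,
                guard_before_linear_memory: true,
            };
        }
    }
}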

}

impl Config {
/// Converts this to a `wasmtime::Config` object
pub fn to_wasmtime(&self) -> wasmtime::Config {
let mut cfg = crate::fuzz_default_config(wasmtime::Strategy::Auto).unwrap();
cfg.debug_info(self.debug_info)
.static_memory_maximum_size(self.static_memory_maximum_size.unwrap_or(0).into())
.static_memory_guard_size(self.static_memory_guard_size.unwrap_or(0).into())
.dynamic_memory_guard_size(self.dynamic_memory_guard_size.unwrap_or(0).into())
.guard_before_linear_memory(self.guard_before_linear_memory)
.cranelift_nan_canonicalization(self.canonicalize_nans)
.cranelift_opt_level(self.opt_level.to_wasmtime())
.interruptable(self.interruptable)
.consume_fuel(self.consume_fuel);

if self.force_jump_veneers {
unsafe {
cfg.cranelift_flag_set("wasmtime_linkopt_force_jump_veneer", "true")
.unwrap();
}
}

match &self.memory_config {
MemoryConfig::Normal {
static_memory_maximum_size,
static_memory_guard_size,
dynamic_memory_guard_size,
guard_before_linear_memory,
} => {
cfg.static_memory_maximum_size(static_memory_maximum_size.unwrap_or(0).into())
.static_memory_guard_size(static_memory_guard_size.unwrap_or(0).into())
.dynamic_memory_guard_size(dynamic_memory_guard_size.unwrap_or(0).into())
.guard_before_linear_memory(*guard_before_linear_memory);
}
MemoryConfig::CustomUnaligned => {
cfg.with_host_memory(Arc::new(UnalignedMemoryCreator))
.static_memory_maximum_size(0)
.dynamic_memory_guard_size(0)
.static_memory_guard_size(0)
.guard_before_linear_memory(false);
}
}
return cfg;
}
}

struct UnalignedMemoryCreator;

unsafe impl MemoryCreator for UnalignedMemoryCreator {
fn new_memory(
&self,
_ty: MemoryType,
minimum: usize,
maximum: Option<usize>,
reserved_size_in_bytes: Option<usize>,
guard_size_in_bytes: usize,
) -> Result<Box<dyn LinearMemory>, String> {
assert_eq!(guard_size_in_bytes, 0);
assert!(reserved_size_in_bytes.is_none() || reserved_size_in_bytes == Some(0));
Ok(Box::new(UnalignedMemory {
src: vec![0; minimum + 1],
maximum,
}))
}
}

/// A custom "linear memory allocator" for wasm which only works with the
/// "dynamic" mode of configuration where wasm always does explicit bounds
/// checks.
///
/// This memory attempts to always use unaligned host addresses for the base
/// address of wasm linear memory. This means that all JIT loads/stores should
/// be unaligned at the hardware level, which is a "big hammer" way of testing
/// that all our JIT code works with unaligned addresses, since alignment is
/// not required for correctness in wasm itself.
struct UnalignedMemory {
/// This memory is always one byte larger than the actual size of linear
/// memory.
src: Vec<u8>,
maximum: Option<usize>,
}

unsafe impl LinearMemory for UnalignedMemory {
fn byte_size(&self) -> usize {
// Chop off the extra byte reserved for the true byte size of this
// linear memory.
self.src.len() - 1
}

fn maximum_byte_size(&self) -> Option<usize> {
self.maximum
}

fn grow_to(&mut self, new_size: usize) -> Result<()> {
// Make sure to allocate an extra byte for our "unalignment"
self.src.resize(new_size + 1, 0);
Ok(())
}

fn as_ptr(&self) -> *mut u8 {
// Return our allocated memory, offset by one, so that the base address
// of memory is always unaligned.
self.src[1..].as_ptr() as *mut _
}
}

include!(concat!(env!("OUT_DIR"), "/spectests.rs"));

/// A spec test from the upstream wast testsuite, arbitrarily chosen from the
38 changes: 13 additions & 25 deletions crates/fuzzing/src/oracles.rs
@@ -238,7 +238,7 @@ pub fn compile(wasm: &[u8], strategy: Strategy) {
/// we call the exported functions for all of our different configs.
pub fn differential_execution(
module: &crate::generators::GeneratedModule,
configs: &[crate::generators::DifferentialConfig],
configs: &[crate::generators::Config],
) {
use std::collections::{HashMap, HashSet};

@@ -252,18 +252,13 @@ pub fn differential_execution(
return;
}

let configs: Vec<_> = match configs.iter().map(|c| c.to_wasmtime_config()).collect() {
Ok(cs) => cs,
// If the config is trying to use something that was turned off at
// compile time just continue to the next fuzz input.
Err(_) => return,
};

let configs: Vec<_> = configs.iter().map(|c| (c.to_wasmtime(), c)).collect();
let mut export_func_results: HashMap<String, Result<Box<[Val]>, Trap>> = Default::default();
let wasm = module.module.to_bytes();
log_wasm(&wasm);

for mut config in configs {
for (mut config, fuzz_config) in configs {
log::debug!("fuzz config: {:?}", fuzz_config);
// Disable module linking since it isn't enabled by default for
// `GeneratedModule` but is enabled by default for our fuzz config.
// Since module linking is currently a breaking change this is required
@@ -272,6 +267,9 @@

let engine = Engine::new(&config).unwrap();
let mut store = create_store(&engine);
if fuzz_config.consume_fuel {
store.add_fuel(u64::max_value()).unwrap();
}

let module = Module::new(&engine, &wasm).unwrap();

@@ -293,10 +291,7 @@
})
.collect::<Vec<_>>();
for (name, f) in exports {
// Always call the hang limit initializer first, so that we don't
// infinite loop when calling another export.
init_hang_limit(&mut store, instance);

log::debug!("invoke export {:?}", name);
let ty = f.ty(&store);
let params = dummy::dummy_values(ty.params());
let mut results = vec![Val::I32(0); ty.results().len()];
@@ -312,17 +307,6 @@
}
}

fn init_hang_limit<T>(store: &mut Store<T>, instance: Instance) {
match instance.get_export(&mut *store, "hangLimitInitializer") {
None => return,
Some(Extern::Func(f)) => {
f.call(store, &[], &mut [])
.expect("initializing the hang limit should not fail");
}
Some(_) => panic!("unexpected hangLimitInitializer export"),
}
}

fn assert_same_export_func_result(
lhs: &Result<Box<[Val]>, Trap>,
rhs: &Result<Box<[Val]>, Trap>,
@@ -337,7 +321,11 @@
};

match (lhs, rhs) {
(Err(_), Err(_)) => {}
(Err(a), Err(b)) => {
if a.trap_code() != b.trap_code() {
fail();
}
}
(Ok(lhs), Ok(rhs)) => {
if lhs.len() != rhs.len() {
fail();
4 changes: 2 additions & 2 deletions fuzz/fuzz_targets/differential.rs
@@ -4,8 +4,8 @@ use libfuzzer_sys::fuzz_target;
use wasmtime_fuzzing::{generators, oracles};

fuzz_target!(|data: (
generators::DifferentialConfig,
generators::DifferentialConfig,
generators::Config,
generators::Config,
generators::GeneratedModule,
)| {
let (lhs, rhs, mut wasm) = data;