Introduce verification of integer address type widths. #10209

iwanders · 2025-02-08T18:53:25Z

This should ensure that the verifier correctly detects situations where an incorrectly sized integer is passed as an address type. This ensures that issues like these are caught at the verifier level instead of by asserts in the machine instruction generation.

Filing as a draft PR and making some comments throughout the diff for discussion.

Edit, I recommend starting review with the cranelift/filetests/filetests/verifier/pointer_width_32.clif file.

…dealliance#10118) This adds a check to load that confirms the pointer width is as expected according to the target.

Also clarify width of integer address type and unit tests that check the new verifier rule.

Makes unit test pass with the verifier checking the load address size.

Also adds unit tests to ensure problematic cases are detected.

iwanders · 2025-02-08T18:54:48Z

cranelift/codegen/meta/src/gen_inst.rs

@@ -1233,6 +1233,9 @@ fn gen_builder(

        There is also a method per instruction format. These methods all
        return an `Inst`.
+
+        When an integer address type is specified, this integer size is
+        required to be equal to the platform's pointer width.


This was stated in this comment, I think it's good to put this in the automatically created documentation as well?

Yes, I agree; let's say "when an address is specified to a load or store is specified" to (i) avoid using an otherwise-undefined/ambiguous concept "integer address type" (do we have float address types? is it the address of an integer in memory?); and (ii) make it clear that this has to do with memory accesses.

I dropped one is specified from your suggestion and made it into:

When an address to a load or store is specified, its integer size is required to be equal to the platform's pointer width.

the 'its integer size' still feels a bit off, but hopefully it conveys that the address is represented as an integer?

Happy to adjust further to ensure we get it right!

iwanders · 2025-02-08T18:55:49Z

cranelift/codegen/src/verifier/mod.rs

@@ -657,11 +657,61 @@ impl<'a> Verifier<'a> {
                }
            }


This match statement is currently not ordered alphabetically, so I kept all the changes at the bottom.

iwanders · 2025-02-08T18:57:06Z

cranelift/codegen/src/verifier/mod.rs

+            } => {
+                if opcode.can_store() {
+                    self.verify_is_address(inst, p, errors)?;
+                }


I was a bit surprised that there's an atomic_store in the instruction builder, but it seems to just go through the normal Store instruction.

Atomic stores are separate opcodes so I think this is fine?

Yes, I think so 👍

iwanders · 2025-02-08T19:04:18Z

cranelift/codegen/src/verifier/mod.rs

+            }
+            Store {
+                opcode,
+                args: [_, p],


I initially assumed p was the first value of (undocumented) Store was the pointer, just like the other operations. Only after I checked store did I realise it uses the second argument as the pointer. I wonder if the capitalised flavours should be hidden from the documentation?

That's somewhat surprising, but I think this is probably outside the scope of this PR -- ideally we'd unify the arg order, not try to selectively hide documentation.

Hmm, so my confusion was that store and Store exist, but only store has useful docs and parameter names. Clicking Source for store shows:

/// Store ``x`` to memory at ``p + Offset``. /// /// This is a polymorphic instruction that can store any value type with a /// memory representation. /// /// Inputs: /// /// - MemFlags: Memory operation flags /// - x: Value to be stored /// - p: An integer address type /// - Offset: Byte offset from base address #[allow(non_snake_case)] fn store<T1: Into<ir::MemFlags>, T2: Into<ir::immediates::Offset32>>(self, MemFlags: T1, x: ir::Value, p: ir::Value, Offset: T2) -> Inst { let MemFlags = MemFlags.into(); let Offset = Offset.into(); let ctrl_typevar = self.data_flow_graph().value_type(x); self.Store(Opcode::Store, ctrl_typevar, MemFlags, Offset, x, p).0 }

Which calls into the uppercase Store, I needed to look at this generated code to confirm that the second argument (arg1) of Store is the address, it's not the end of the world, because the unit tests just failed the other way around, and most people that need to know the order are likely spelunking in the code anyway?

I see now that the lowercase store does determine the ctrl_typevar, so store and Store do different things, I glossed over that initially, maybe Store should just get more extensive documentation at some point, or link to store for it. Anyways, definitely beyond the scope of this PR, but hopefully this clarifies my train of thought as an outsider :)

Yeah, the distinction is that Store names an instruction format, and store-the-opcode is one of the instructions that uses Store-the-format. Perhaps we could clean this up later, thanks.

iwanders · 2025-02-08T19:08:48Z

cranelift/filetests/filetests/runtests/fdemote.clif

@@ -73,21 +73,3 @@ block0(v0: f64):
 ; run: %fdemote_is_nan(-sNaN:0x1) == 1
 ; run: %fdemote_is_nan(+sNaN:0x4000000000001) == 1
 ; run: %fdemote_is_nan(-sNaN:0x4000000000001) == 1
-


Most of the changes from the unit tests are very mechanical. I kept as many test cases in the existing file, and split out 32 and 64 bit-specific unit tests to fdemote_32.clif and fdemote_64.clif respectively.

Most often the change is the number in the stack_addr.<size> function I think... this could definitely do with a bit of scrutiny as I'm pretty inexperienced with these very low level constructs.

iwanders · 2025-02-08T19:09:09Z

cranelift/filetests/filetests/runtests/fdemote.clif

-    v4 = load.f64 v3
-    v5 = fdemote.f32 v4
-    return v5
-}


The v1 input argument to this block seems to be unused, I did make that platform specific.

iwanders · 2025-02-08T19:13:51Z

cranelift/filetests/filetests/verifier/pointer_width_32.clif

@@ -0,0 +1,61 @@
+test verifier


I introduced this file, and its 64 bit counterpart to check the verifier is now able to detect these bad cases. Most are adapted from functions in the existing clif files.

I recommend opening this file and its counterpart in two tabs and switching between them, they're identical save for 64 and 32 swapped.

I hope this covers all the cases we'd like to check for.

iwanders · 2025-02-12T22:34:59Z

I've moved this out of draft to ensure it gets a reviewer, I couldn't figure out how to assign @cfallin to review it while it was a draft :)

cfallin · 2025-02-13T00:03:51Z

@iwanders it's on my queue, thanks! I'm attending the Wasm standards group in-person meeting this week so I'll likely get to review this next week or end of this week.

iwanders · 2025-02-13T00:18:52Z

Ah perfect @cfallin, thanks, I just wanted to make sure it wasn't in limbo. I have some travel coming up at the end of next week myself, so may go quiet for a bit myself as well.

cfallin

Thanks -- I think this is good incremental progress and I think we should land it, given my minor nits below are addressed!

I have a few thoughts after seeing the diff (which you diligently worked out in the shape we requested, so these are just thoughts on the shape of the problem, not your work) that could lead to some followup cleanups:

I wonder if there's a better way to handle stackslot addresses to avoid the test duplication. You had mentioned a separate pointer type before, which we've resisted so far in order to avoid a lot of complexity and duplication. Perhaps we can have a surface-level syntax sugar in the textual CLIF parser, though: stack_addr.ptr rather than stack_addr.{i32,i64}. What do you think?
Perhaps we should have a helper on InstructionData that is something like fn addr(&self) -> Option<Value>, returning the address operand, if any, of the instruction; so we can centralize that and don't have to enumerate the cases in the verifier.

cfallin · 2025-02-13T18:06:10Z

cranelift/codegen/meta/src/gen_inst.rs

@@ -1233,6 +1233,9 @@ fn gen_builder(

        There is also a method per instruction format. These methods all
        return an `Inst`.
+
+        When an integer address type is specified, this integer size is
+        required to be equal to the platform's pointer width.


Yes, I agree; let's say "when an address is specified to a load or store is specified" to (i) avoid using an otherwise-undefined/ambiguous concept "integer address type" (do we have float address types? is it the address of an integer in memory?); and (ii) make it clear that this has to do with memory accesses.

cfallin · 2025-02-13T18:08:26Z

cranelift/codegen/src/verifier/mod.rs

-                flags,
-                arg,
+            LoadNoOffset { opcode, flags, arg } => {
+                if opcode == Opcode::Bitcast {


Let's keep the original nested match here -- it's cleaner than ad-hoc ifs inside the match arm. Something like

LoadNoOffset { opcode: Opcode::Bitcast, flags, arg } => { /* original */ } LoadNoOffset { opcode, flags, arg } if opcode.can_load() => { /* verify address */ }

and likewise below: pull the ifs that guard with can_load()/can_store() into match guards.

I really like this feedback, this is something I'll adopt in my own code!

cfallin · 2025-02-13T18:09:35Z

cranelift/codegen/src/verifier/mod.rs

+            }
+            Store {
+                opcode,
+                args: [_, p],


That's somewhat surprising, but I think this is probably outside the scope of this PR -- ideally we'd unify the arg order, not try to selectively hide documentation.

cfallin · 2025-02-13T18:10:30Z

cranelift/codegen/src/verifier/mod.rs

+            } => {
+                if opcode.can_store() {
+                    self.verify_is_address(inst, p, errors)?;
+                }


Atomic stores are separate opcodes so I think this is fine?

cfallin · 2025-02-13T18:11:24Z

cranelift/codegen/src/verifier/mod.rs

+    ) -> VerifierStepResult {
+        if let Some(isa) = self.isa {
+            let pointer_width = isa.triple().pointer_width()?;
+            let value_type = &self.func.dfg.value_type(v);


No need to borrow here -- it's a Type which wraps a u16 internally (ie, it's tiny and Copy).

👍 removed, definitely not necessary if it's Copy and fits in a register.

iwanders · 2025-02-14T01:44:54Z

that could lead to some followup cleanups:

I currently don't have time to pursue these but here's my two cents :)

I wonder if there's a better way to handle stackslot addresses to avoid the test duplication. You had mentioned a separate pointer type before, which we've resisted so far in order to avoid a lot of complexity and duplication. Perhaps we can have a surface-level syntax sugar in the textual CLIF parser, though: stack_addr.ptr rather than stack_addr.{i32,i64}. What do you think?

I guess one could just instantiate both flavours for each function, and store them in separate fields, or in an enum type in the functions field that's currently in TestFile? But then it may also make sense to split the isa_spec field into the two flavours and group the respective isa's with the functions that were instantiated for them? It does make the struct more complex though.

Another option, which is not great because it splits files, but it's worth considering because it may allow more reuse. If one clif file can refer to another file, it's easy to do the syntactic sugar approach, for something like fdemote_32.clif:

test interpret
test run
target pulley32
target pulley32be
include fdemote.clif

For; fdemote_64.clif:

test interpret
test run
target pulley64
include fdemote.clif

The included file fdemote.cliff contains entries that use stack_addr.ptr.

When the reader reads files, it keeps track of which isa's it has seen, as long as it has only seen 32 bit isa's stack_addr.ptr would be i32, if it has only seen 64 bit isa's the value of stack_addr.ptr would be i64, if both isa's are seen and stack_addr.ptr is used, an error is raised. This also allows for reuse of functions defined in one file by other files, if that would be convenient (I don't know if any of the current tests would benefit from that).

The current situation of splitting out some files is not that problematic though, it is simple and straightforward. I wouldn't be surprised if I'm in the first person to run into this thing as I used that example from the documentation AND used a debug build to hit that assert. Personally I'd lean towards not doing anything on this front until either the split files become a burden or other reasons arise to pursue a change, at least, I don't see a solution that reduces complexity.

Perhaps we should have a helper on InstructionData that is something like fn addr(&self) -> Option, returning the address operand, if any, of the instruction; so we can centralize that and don't have to enumerate the cases in the verifier.

The Option<Value> is a bit of a limiting interface though, because it implies that each function has zero or one address operand, the stack_switch has three, so then you'd end up with having to make a decision which of the arguments is the address, for most it may work though, or change it to a Vec? Thinking about the generalization here, this would allow InstructionData to 'opt in' to an address verifier check, I wonder if there would be a way to generalise this further and allow the instructions to specify which checks should be ran on its arguments... Not sure if this would just 'move code' instead of 'reduce code' though, I don't know enough about cranelift to answer that :)

cfallin

Updates look good, thanks!

cfallin · 2025-02-15T02:21:23Z

cranelift/codegen/src/verifier/mod.rs

+            }
+            Store {
+                opcode,
+                args: [_, p],


Yeah, the distinction is that Store names an instruction format, and store-the-opcode is one of the instructions that uses Store-the-format. Perhaps we could clean this up later, thanks.

cfallin · 2025-02-15T02:24:37Z

[suggestions about CLIF and textual inclusion]

Hmm, interesting; I think all things considered we probably don't want to build that much infrastructure. I'll keep kicking around the idea of a pointer-specific type that resolves early and maybe we can iterate on it later if the test duplication increases significantly.

it implies that each function has zero or one address operand, the stack_switch has three

Ah, right! That's a relatively recent addition; I had forgotten about that. Anyway, not a critical refactor either way, so fine to leave it how it is for now I think.

iwanders added 3 commits February 8, 2025 11:24

cranelift/codegen/verifier: Add verification of pointer width (byteco…

76679be

…dealliance#10118) This adds a check to load that confirms the pointer width is as expected according to the target.

cranelift/verifier: Add load pointer width verification.

3c1c066

Also clarify width of integer address type and unit tests that check the new verifier rule.

cranelift/filetests: Split out 64 and 32 bit loads.

85d1399

Makes unit test pass with the verifier checking the load address size.

iwanders force-pushed the issue-10118-improve-verifier-for-load-store branch from c2a9612 to f391249 Compare February 8, 2025 19:11

cranelift/verifier: Add address verification to more instructions.

76e6222

Also adds unit tests to ensure problematic cases are detected.

iwanders force-pushed the issue-10118-improve-verifier-for-load-store branch from f391249 to 76e6222 Compare February 8, 2025 19:14

iwanders commented Feb 8, 2025

View reviewed changes

github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:meta Everything related to the meta-language. labels Feb 8, 2025

iwanders marked this pull request as ready for review February 12, 2025 22:34

iwanders requested a review from a team as a code owner February 12, 2025 22:34

iwanders requested review from abrown and removed request for a team February 12, 2025 22:34

abrown requested review from cfallin and removed request for abrown February 13, 2025 18:02

cfallin approved these changes Feb 13, 2025

View reviewed changes

cranelift/verifier: Change wording, restructure arms.

4291fad

cfallin approved these changes Feb 15, 2025

View reviewed changes

cfallin added this pull request to the merge queue Feb 15, 2025

Merged via the queue into bytecodealliance:main with commit 9afc64b Feb 15, 2025
69 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce verification of integer address type widths. #10209

Introduce verification of integer address type widths. #10209

iwanders commented Feb 8, 2025 •

edited

Loading

iwanders Feb 8, 2025

cfallin Feb 13, 2025

iwanders Feb 14, 2025

iwanders Feb 8, 2025

iwanders Feb 8, 2025

cfallin Feb 13, 2025

iwanders Feb 14, 2025

iwanders Feb 8, 2025

cfallin Feb 13, 2025

iwanders Feb 14, 2025

cfallin Feb 15, 2025

iwanders Feb 8, 2025

iwanders Feb 8, 2025

iwanders Feb 8, 2025

iwanders commented Feb 12, 2025

cfallin commented Feb 13, 2025

iwanders commented Feb 13, 2025

cfallin left a comment

cfallin Feb 13, 2025

cfallin Feb 13, 2025

iwanders Feb 14, 2025

cfallin Feb 13, 2025

cfallin Feb 13, 2025

cfallin Feb 13, 2025

iwanders Feb 14, 2025

iwanders commented Feb 14, 2025

cfallin left a comment

cfallin Feb 15, 2025

cfallin commented Feb 15, 2025

Introduce verification of integer address type widths. #10209

Introduce verification of integer address type widths. #10209

Conversation

iwanders commented Feb 8, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

iwanders commented Feb 12, 2025

cfallin commented Feb 13, 2025

iwanders commented Feb 13, 2025

cfallin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

iwanders commented Feb 14, 2025

cfallin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cfallin commented Feb 15, 2025

iwanders commented Feb 8, 2025 •

edited

Loading