Introduce verification of integer address type widths. #10209
…dealliance#10118) This adds a check to load that confirms the pointer width is as expected according to the target.
Also clarify the width of the integer address type and add unit tests that check the new verifier rule.
Makes unit test pass with the verifier checking the load address size.
c2a9612
to
f391249
Compare
Also adds unit tests to ensure problematic cases are detected.
f391249
to
76e6222
Compare
@@ -1233,6 +1233,9 @@ fn gen_builder(

There is also a method per instruction format. These methods all
return an `Inst`.

When an integer address type is specified, this integer size is
required to be equal to the platform's pointer width.
This was stated in this comment, I think it's good to put this in the automatically created documentation as well?
Yes, I agree; let's say "when an address is specified to a load or store is specified" to (i) avoid using an otherwise-undefined/ambiguous concept "integer address type" (do we have float address types? is it the address of an integer in memory?); and (ii) make it clear that this has to do with memory accesses.
I dropped one `is specified` from your suggestion and made it into:

When an address to a load or store is specified, its integer
size is required to be equal to the platform's pointer width.

The 'its integer size' still feels a bit off, but hopefully it conveys that the address is represented as an integer?
Happy to adjust further to ensure we get it right!
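To make the documented rule concrete, here is a minimal sketch of the check it describes. It does not use Cranelift's real types: `Ty`, `int_bits`, and `check_address_type` are hypothetical stand-ins, and the error messages are illustrative only.

```rust
/// Hypothetical stand-in for an IR value type; not Cranelift's `Type`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Ty {
    I8,
    I16,
    I32,
    I64,
    F64,
}

impl Ty {
    /// Width in bits if this is an integer type, `None` otherwise.
    fn int_bits(self) -> Option<u32> {
        match self {
            Ty::I8 => Some(8),
            Ty::I16 => Some(16),
            Ty::I32 => Some(32),
            Ty::I64 => Some(64),
            Ty::F64 => None,
        }
    }
}

/// Accept an address only if it is an integer whose width (in bits)
/// equals the target's pointer width.
fn check_address_type(addr_ty: Ty, pointer_bits: u32) -> Result<(), String> {
    match addr_ty.int_bits() {
        Some(bits) if bits == pointer_bits => Ok(()),
        Some(bits) => Err(format!(
            "address is {} bits wide, but the target pointer width is {}",
            bits, pointer_bits
        )),
        None => Err(format!("address must be an integer, got {:?}", addr_ty)),
    }
}
```

Under these assumptions an `I32` address passes on a 32-bit target and is rejected on a 64-bit one, which is the situation the new verifier rule is meant to catch.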
@@ -657,11 +657,61 @@ impl<'a> Verifier<'a> {
    }
}
This match statement is currently not ordered alphabetically, so I kept all the changes at the bottom.
} => {
    if opcode.can_store() {
        self.verify_is_address(inst, p, errors)?;
    }
I was a bit surprised that there's an `atomic_store` in the instruction builder, but it seems to just go through the normal `Store` instruction.
Atomic stores are separate opcodes so I think this is fine?
Yes, I think so 👍
}
Store {
    opcode,
    args: [_, p],
That's somewhat surprising, but I think this is probably outside the scope of this PR -- ideally we'd unify the arg order, not try to selectively hide documentation.
Hmm, so my confusion was that `store` and `Store` exist, but only `store` has useful docs and parameter names. Clicking `Source` for `store` shows:
/// Store ``x`` to memory at ``p + Offset``.
///
/// This is a polymorphic instruction that can store any value type with a
/// memory representation.
///
/// Inputs:
///
/// - MemFlags: Memory operation flags
/// - x: Value to be stored
/// - p: An integer address type
/// - Offset: Byte offset from base address
#[allow(non_snake_case)]
fn store<T1: Into<ir::MemFlags>, T2: Into<ir::immediates::Offset32>>(self, MemFlags: T1, x: ir::Value, p: ir::Value, Offset: T2) -> Inst {
    let MemFlags = MemFlags.into();
    let Offset = Offset.into();
    let ctrl_typevar = self.data_flow_graph().value_type(x);
    self.Store(Opcode::Store, ctrl_typevar, MemFlags, Offset, x, p).0
}
Which calls into the uppercase `Store`. I needed to look at this generated code to confirm that the second argument (`arg1`) of `Store` is the address. It's not the end of the world, because the unit tests just failed the other way around, and most people that need to know the order are likely spelunking in the code anyway?

I see now that the lowercase `store` does determine the `ctrl_typevar`, so `store` and `Store` do different things; I glossed over that initially. Maybe `Store` should just get more extensive documentation at some point, or link to `store` for it. Anyways, definitely beyond the scope of this PR, but hopefully this clarifies my train of thought as an outsider :)
Yeah, the distinction is that `Store` names an instruction format, and `store`-the-opcode is one of the instructions that uses `Store`-the-format. Perhaps we could clean this up later, thanks.
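The format/opcode split can be illustrated with a toy model. This is only a sketch: the `Opcode`, `Value`, and `InstructionData` types below are simplified stand-ins rather than Cranelift's definitions, `build_store_format` is a hypothetical name for the format-level constructor, and `istore8` is included only as a second opcode that shares the format.

```rust
/// Toy opcodes; both are assumed to share the `Store` format.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Opcode {
    Store,
    Istore8,
}

/// Toy SSA value handle.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Value(u32);

/// One variant per instruction *format* (only one shown here).
#[derive(Debug, PartialEq, Eq)]
enum InstructionData {
    Store {
        opcode: Opcode,
        args: [Value; 2], // [value to store, address]
        offset: i32,
    },
}

/// Format-level constructor: the caller chooses the opcode.
fn build_store_format(opcode: Opcode, offset: i32, x: Value, p: Value) -> InstructionData {
    InstructionData::Store {
        opcode,
        args: [x, p],
        offset,
    }
}

/// Opcode-level constructors: each fixes the opcode and reuses the format.
fn store(x: Value, p: Value, offset: i32) -> InstructionData {
    build_store_format(Opcode::Store, offset, x, p)
}

fn istore8(x: Value, p: Value, offset: i32) -> InstructionData {
    build_store_format(Opcode::Istore8, offset, x, p)
}
```

In this model the address is always `args[1]`, matching the `args: [_, p]` pattern in the verifier diff, regardless of which opcode produced the instruction.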
@@ -73,21 +73,3 @@ block0(v0: f64):
; run: %fdemote_is_nan(-sNaN:0x1) == 1
; run: %fdemote_is_nan(+sNaN:0x4000000000001) == 1
; run: %fdemote_is_nan(-sNaN:0x4000000000001) == 1
Most of the changes from the unit tests are very mechanical. I kept as many test cases in the existing file, and split out 32 and 64 bit-specific unit tests to `fdemote_32.clif` and `fdemote_64.clif` respectively.
Most often the change is the number in the `stack_addr.<size>` function I think... this could definitely do with a bit of scrutiny as I'm pretty inexperienced with these very low level constructs.
    v4 = load.f64 v3
    v5 = fdemote.f32 v4
    return v5
}
The `v1` input argument to this block seems to be unused; I did make that platform specific.
@@ -0,0 +1,61 @@
test verifier
I introduced this file, and its 64 bit counterpart to check the verifier is now able to detect these bad cases. Most are adapted from functions in the existing clif files.
I recommend opening this file and its counterpart in two tabs and switching between them, they're identical save for 64 and 32 swapped.
I hope this covers all the cases we'd like to check for.
I've moved this out of draft to ensure it gets a reviewer, I couldn't figure out how to assign @cfallin to review it while it was a draft :)
@iwanders it's on my queue, thanks! I'm attending the Wasm standards group in-person meeting this week so I'll likely get to review this next week or end of this week.
Ah perfect @cfallin, thanks, I just wanted to make sure it wasn't in limbo. I have some travel coming up at the end of next week myself, so may go quiet for a bit myself as well.
Thanks -- I think this is good incremental progress and I think we should land it, given my minor nits below are addressed!
I have a few thoughts after seeing the diff (which you diligently worked out in the shape we requested, so these are just thoughts on the shape of the problem, not your work) that could lead to some followup cleanups:
- I wonder if there's a better way to handle stackslot addresses to avoid the test duplication. You had mentioned a separate pointer type before, which we've resisted so far in order to avoid a lot of complexity and duplication. Perhaps we can have a surface-level syntax sugar in the textual CLIF parser, though: `stack_addr.ptr` rather than `stack_addr.{i32,i64}`. What do you think?
- Perhaps we should have a helper on `InstructionData` that is something like `fn addr(&self) -> Option<Value>`, returning the address operand, if any, of the instruction; so we can centralize that and don't have to enumerate the cases in the verifier.
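A rough sketch of what such a helper could look like, using a deliberately tiny model: the `InstructionData` enum below has only three made-up variants and is not Cranelift's real type, so treat the variant names and fields as assumptions.

```rust
/// Toy SSA value handle.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Value(u32);

/// Simplified instruction formats, for illustration only.
enum InstructionData {
    Load { arg: Value },
    Store { args: [Value; 2] }, // [value to store, address]
    Binary { args: [Value; 2] },
}

impl InstructionData {
    /// Return the address operand of a memory access, if the
    /// instruction has one.
    fn addr(&self) -> Option<Value> {
        match self {
            InstructionData::Load { arg } => Some(*arg),
            InstructionData::Store { args: [_, p] } => Some(*p),
            InstructionData::Binary { .. } => None,
        }
    }
}
```

With a helper like this, a verifier pass would only need `if let Some(p) = inst_data.addr() { ... }` instead of one match arm per format, at the cost of keeping the helper in sync when new memory formats are added.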
flags,
arg,
LoadNoOffset { opcode, flags, arg } => {
    if opcode == Opcode::Bitcast {
Let's keep the original nested match here -- it's cleaner than ad-hoc ifs inside the match arm. Something like

LoadNoOffset { opcode: Opcode::Bitcast, flags, arg } => { /* original */ }
LoadNoOffset { opcode, flags, arg } if opcode.can_load() => { /* verify address */ }

and likewise below: pull the ifs that guard with `can_load()`/`can_store()` into match guards.
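For readers less familiar with match guards, here is a small self-contained sketch of the suggested shape. The `Opcode`, `Inst`, and `Check` types and the `classify` function are toy stand-ins invented for this example; only the pattern-plus-guard structure mirrors the suggestion.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Opcode {
    Bitcast,
    Load,
    Nop,
}

impl Opcode {
    /// Toy version of `can_load()`: only `Load` reads memory here.
    fn can_load(self) -> bool {
        self == Opcode::Load
    }
}

enum Inst {
    LoadNoOffset { opcode: Opcode, arg: u32 },
    Other,
}

/// What the (toy) verifier decides to do for an instruction.
#[derive(Debug, PartialEq, Eq)]
enum Check {
    BitcastRules,
    Address(u32),
    Nothing,
}

fn classify(inst: &Inst) -> Check {
    match *inst {
        // A specific opcode is matched directly in the pattern.
        Inst::LoadNoOffset { opcode: Opcode::Bitcast, .. } => Check::BitcastRules,
        // The guard replaces an `if` nested inside the arm body.
        Inst::LoadNoOffset { opcode, arg } if opcode.can_load() => Check::Address(arg),
        // Everything else, including non-loading opcodes in this format.
        _ => Check::Nothing,
    }
}
```

Arms are tried top to bottom, so the `Bitcast` arm takes precedence and any instruction whose guard evaluates to false falls through to the catch-all.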
I really like this feedback, this is something I'll adopt in my own code!
) -> VerifierStepResult {
    if let Some(isa) = self.isa {
        let pointer_width = isa.triple().pointer_width()?;
        let value_type = &self.func.dfg.value_type(v);
No need to borrow here -- it's a `Type` which wraps a u16 internally (ie, it's tiny and `Copy`).
👍 removed, definitely not necessary if it's Copy and fits in a register.
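As a small illustration of why the borrow is unnecessary, consider a hypothetical newtype shaped like the description above (a `u16` wrapper that derives `Copy`). The `Type` struct, `raw`, and `value_type` here are simplified assumptions, not Cranelift's implementation.

```rust
/// Hypothetical two-byte type handle, modelled on the description above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Type(u16);

impl Type {
    /// Taking `self` by value is cheap: the whole struct is two bytes.
    fn raw(self) -> u16 {
        self.0
    }
}

/// Returns the type by value; callers get their own copy.
fn value_type(types: &[Type], index: usize) -> Type {
    types[index]
}
```

Because `Type` is `Copy`, `let ty = value_type(&types, 0);` can be followed by any number of by-value uses of `ty` without moving it, so writing `&value_type(...)` only adds an indirection.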
I currently don't have time to pursue these but here's my two cents :)
I guess one could just instantiate both flavours for each function, and store them in separate fields, or in an enum type in the

Another option, which is not great because it splits files, but it's worth considering because it may allow more reuse. If one clif file can refer to another file, it's easy to do the syntactic sugar approach, for something like

For;

The included file

When the reader reads files, it keeps track of which isa's it has seen, as long as it has only seen 32 bit isa's

The current situation of splitting out some files is not that problematic though, it is simple and straightforward. I wouldn't be surprised if I'm the first person to run into this thing, as I used that example from the documentation AND used a debug build to hit that assert. Personally I'd lean towards not doing anything on this front until either the split files become a burden or other reasons arise to pursue a change; at least, I don't see a solution that reduces complexity.
The |
Updates look good, thanks!
Hmm, interesting; I think all things considered we probably don't want to build that much infrastructure. I'll keep kicking around the idea of a pointer-specific type that resolves early and maybe we can iterate on it later if the test duplication increases significantly.
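One way to picture a pointer type that "resolves early" is as a textual substitution done before any other processing. The sketch below is purely hypothetical: `ptr` is not existing CLIF syntax, and `resolve_type_token` is an invented helper that only shows the idea.

```rust
/// Resolve a textual type token against the target's pointer width.
/// The hypothetical `ptr` token becomes a concrete integer type name;
/// every other token is passed through unchanged.
fn resolve_type_token(token: &str, pointer_bits: u32) -> Option<String> {
    match token {
        "ptr" => match pointer_bits {
            16 | 32 | 64 => Some(format!("i{}", pointer_bits)),
            // Unknown pointer widths cannot be resolved.
            _ => None,
        },
        other => Some(other.to_string()),
    }
}
```

Under this assumption a test written once with `stack_addr.ptr` would read as `stack_addr.i32` on a 32-bit target and `stack_addr.i64` on a 64-bit one, which is the duplication the split test files currently carry by hand.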
Ah, right! That's a relatively recent addition; I had forgotten about that. Anyway, not a critical refactor either way, so fine to leave it how it is for now I think.
Resolves #10118, fyi @cfallin.
This should ensure that the verifier correctly detects situations where an incorrectly sized integer is passed as an address type. This ensures that issues like these are caught at the verifier level instead of by asserts in the machine instruction generation.
Filing as a draft PR and making some comments throughout the diff for discussion.
Edit: I recommend starting review with the `cranelift/filetests/filetests/verifier/pointer_width_32.clif` file.