Add a fuzz mode to stress unaligned wasm addresses #3516
Merged · alexcrichton merged 1 commit into bytecodealliance:main from alexcrichton:fuzz-unaligned on Nov 15, 2021
There is an interesting question here: do we need to always pessimize (simd) loads/stores to work with unaligned memory?
Ideally, we would have a fast path and a slow path, emit the fast path when the Wasm says it is correctly aligned, and the slow path otherwise. The fast path would then have a fallback (probably via signal handlers) if the load/store wasn't actually aligned, like the Wasm said it would be.
However, if the memory itself is unaligned, then the "aligned" loads/stores in Wasm are unaligned at the native level, meaning we would always hit the fast path's fallback, which is presumably even worse than just using slow path loads/stores.
So allowing unaligned memories potentially invalidates our speculative fast paths for loads/stores. I guess that is fine, because no one should ever give us unaligned memories in practice?
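As a hedged sketch of that two-path idea (all names here are hypothetical, not Cranelift's actual lowering API; alignments are in bytes for readability):

```rust
// Hypothetical sketch of the speculative lowering policy described above;
// none of these names come from Cranelift's real code.
enum LoweredAccess {
    /// Plain native load/store. If the Wasm alignment hint was a lie, a
    /// signal handler would catch the fault and branch to a slow-path stub.
    FastWithFallback,
    /// Alignment-agnostic sequence (e.g. byte-by-byte) for accesses the
    /// Wasm module itself declares as less than naturally aligned.
    Slow,
}

fn lower_access(declared_align: u32, access_size: u32) -> LoweredAccess {
    if declared_align >= access_size {
        // The module claims natural alignment: speculate on the fast path.
        LoweredAccess::FastWithFallback
    } else {
        LoweredAccess::Slow
    }
}

fn main() {
    // A v128 load declared 16-byte aligned speculates; a 1-byte-aligned
    // one goes straight to the slow path.
    assert!(matches!(lower_access(16, 16), LoweredAccess::FastWithFallback));
    assert!(matches!(lower_access(1, 16), LoweredAccess::Slow));
}
```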
This is a good question! In practice, at least on modern x86 cores, unaligned SSE loads have the same performance as aligned ones (zero penalty on Haswell and later, IIRC, which is the 2013 core); on earlier cores, IIRC, the load split into two uops (or two issues of one uop, I don't remember), so at worst it costs "one extra ALU instruction". An alignment check costs at least that (an `AND rAddr, 1/3/7/15`) plus a conditional branch; the branch doesn't enter the BTB if it is never taken, but it still costs fetch bandwidth. So I think it's always better to use unaligned loads/stores (`movups` rather than `movaps`) unconditionally.

I'm not sure about aarch64; maybe @akirilov-arm or @sparker-arm can say more? I suspect that at least on some smaller cores it might be an interesting question.
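For concreteness, a sketch of the two strategies in Rust using the standard SSE2 intrinsics (illustrative only; this is not code from the PR):

```rust
#[cfg(target_arch = "x86_64")]
mod x86 {
    use std::arch::x86_64::{__m128i, _mm_load_si128, _mm_loadu_si128};

    /// Unconditional unaligned load: a single `movups`/`movdqu`, which on
    /// Haswell and later costs the same as the aligned form.
    pub unsafe fn load_v128(p: *const u8) -> __m128i {
        _mm_loadu_si128(p as *const __m128i)
    }

    /// The checked alternative: an AND plus a branch on every access,
    /// already at least as expensive as just issuing the unaligned load.
    pub unsafe fn load_v128_checked(p: *const u8) -> __m128i {
        if (p as usize) & 15 == 0 {
            _mm_load_si128(p as *const __m128i) // `movaps`/`movdqa`
        } else {
            _mm_loadu_si128(p as *const __m128i) // `movups`/`movdqu`
        }
    }
}

/// 16-byte-aligned buffer so the +1 offset below is guaranteed misaligned.
#[repr(align(16))]
struct Aligned([u8; 32]);

#[cfg(target_arch = "x86_64")]
fn main() {
    let buf = Aligned([0u8; 32]);
    // A deliberately misaligned pointer is fine for both helpers.
    let v = unsafe { x86::load_v128(buf.0.as_ptr().add(1)) };
    let w = unsafe { x86::load_v128_checked(buf.0.as_ptr().add(1)) };
    println!("{v:?} {w:?}");
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```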
Alex mentioned that he saw crashes on the aarch64 machine with this change, so I assume this is a case where we can't rely on the modern x86 behavior.
(I'll let him provide more details)
Personally I've always been under the impression that unaligned things are "just as fast" nowadays and the overhead and angst of correctly implementing "fixup the operation in a signal handler" is quite far from being worth it.
To clarify though, I haven't seen any new actual crashes with this fuzzer. I ran it locally for a bit with simd enabled on both x64 and aarch64, but found no interesting crashes (just bugs in my own new fuzz code).
My main goal with this sort of fuzzer is to weed out codegen issues like #2943 with more regularity.
I just read a bit more about this and it seems that aarch64 does indeed support unaligned vector loads as well, architecturally (and tested just now with a little asm on my RPi4 doing an `LDR qN, [xN]` with a pointer ending in `...1`). I think for simplicity I'd prefer to avoid the two-path solution here, as (on thinking through the implications for codegen a bit more) a CFG diamond at every load/store would (i) significantly increase code size, (ii) slow down any CFG-sensitive analyses, e.g. liveness and branch splitting in regalloc, and (iii) most likely add measurable runtime overhead. We can definitely think about this more if we have to support an architecture that forces us into this, IMHO...
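A rough Rust analogue of that experiment, assuming an AArch64 target with NEON (this sketch is illustrative, not part of the PR):

```rust
/// 16-byte-aligned buffer so the +1 offset below yields a pointer that is
/// guaranteed to end in ...1, mirroring the experiment above.
#[repr(align(16))]
struct Aligned([u8; 32]);

#[cfg(target_arch = "aarch64")]
fn main() {
    use std::arch::aarch64::vld1q_u8;
    let buf = Aligned([0xAB; 32]);
    let p = unsafe { buf.0.as_ptr().add(1) };
    // `vld1q_u8` requires only element (1-byte) alignment, so this access
    // is architecturally valid; it typically lowers to a plain `LDR qN`.
    let v = unsafe { vld1q_u8(p) };
    println!("{v:?}");
}

#[cfg(not(target_arch = "aarch64"))]
fn main() {
    println!("run on an aarch64 host to reproduce");
}
```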
I was imagining we wouldn't actually emit code to do checks (so no CFG diamond), just catch unaligned-access faults in signal handlers and then jump to slow-path stubs from there. But it seems I misunderstood Alex, and we don't actually support any architectures where this is an issue, so I think we can continue ignoring it for now.
As @cfallin said, support for unaligned data accesses (not instruction fetches) to normal memory is required by the 64-bit Arm architecture and can be toggled at runtime by privileged software (e.g. the operating system kernel), but in practice all environments we target enable it; in fact, some functionality in the system libraries, like `memcpy()`, may rely on it. Vector operations are not an exception, though we could use the structure load `LD1`, for instance, which requires alignment on the element size (not the vector size). As for performance, it depends on the particular implementation, but usually there is no penalty unless the access crosses a coarser-grained boundary such as 16 or 64 bytes.

There is one major exception to this rule - atomic and exclusive operations. Would this fuzz mode be applied to code that uses the Wasm threads proposal (which I think is the only case that would result in the generation of those instructions)?
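To illustrate the exception: Rust's atomic types are naturally aligned by construction precisely because the underlying instructions demand it (a sketch, not code from this PR):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

fn main() {
    // Rust never lets an AtomicU64 sit at a misaligned address: its
    // alignment equals its size, because AArch64's exclusive and
    // acquire/release instructions (LDXR, LDAR, ...) fault on unaligned
    // addresses regardless of the kernel's alignment-check setting.
    assert_eq!(std::mem::align_of::<AtomicU64>(), 8);

    let a = AtomicU64::new(0);
    a.fetch_add(1, Ordering::SeqCst);
    assert_eq!(a.load(Ordering::SeqCst), 1);
}
```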
Ah yeah my assumption was that we would disable this mode of fuzzing once the threads proposal was implemented. I agree that I don't think we can get away with misaligning the host memory once atomics come into play.
(Or we could just make sure that we never combine threads and unaligned memories in the fuzzer, rather than disabling it entirely.)
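A hypothetical sketch of what that constraint could look like in a fuzz config built on the `arbitrary` crate (the type and field names here are invented for illustration and are not wasmtime's actual fuzz-generator API):

```rust
use arbitrary::{Arbitrary, Result, Unstructured};

// Hypothetical fuzz configuration; field names are illustrative only.
#[derive(Debug)]
struct FuzzConfig {
    threads_enabled: bool,
    force_unaligned_host_memory: bool,
}

impl<'a> Arbitrary<'a> for FuzzConfig {
    fn arbitrary(u: &mut Unstructured<'a>) -> Result<Self> {
        let threads_enabled: bool = u.arbitrary()?;
        // Never combine shared-memory threads with a misaligned host
        // memory: atomic instructions require natural alignment natively.
        let force_unaligned_host_memory = !threads_enabled && u.arbitrary()?;
        Ok(FuzzConfig {
            threads_enabled,
            force_unaligned_host_memory,
        })
    }
}
```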