Emit `trunc nuw` for unchecked shifts and `to_immediate_scalar` #137058
r? @Noratrieb
rustbot has assigned @Noratrieb.
Some changes occurred in compiler/rustc_codegen_gcc
    let trunc = self.trunc(val, dest_ty);
    if llvm_util::get_version() >= (19, 0, 0) {
        unsafe {
            if llvm::LLVMIsATruncInst(trunc).is_some() {
Checking LLVMIsAInstruction would be fine as well, don't really need to export the extra API.
@nikic The assertions in #137058 (comment) are why I was using `LLVMIsATruncInst` here. Am I doing something wrong, and `LLVMIsAInstruction` should work, or should I go back to checking `LLVMIsATruncInst`?
Okay, that probably means that you have some cases with a trunc to the same type. In that case, you will end up trying to set the flag on an unrelated instruction (producing a crash with LLVMIsAInstruction -- with LLVMIsATruncInst it might end up setting the flag on an unrelated trunc).
You'll want to check for the no-op trunc case and bail out early.
Thanks, you were 100% correct -- extract field was calling `to_immediate_scalar` on things where that had already been called, resulting in no-op `i1`→`i1` truncates.
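To make the hazard discussed in this thread concrete, here is a minimal toy model in Rust. It is only a sketch: `Value`, `trunc`, and `trunc_nuw` are hypothetical stand-ins and not the real rustc or LLVM API. It assumes a builder that folds a same-width truncation into a no-op and returns its input unchanged, which is the situation described above.

```rust
/// Toy stand-in for an IR value (hypothetical; not rustc's or LLVM's types).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    /// Some unrelated instruction producing an integer `bits` wide.
    Other { bits: u32 },
    /// A truncation to `bits` width, with an optional `nuw` flag.
    Trunc { bits: u32, nuw: bool },
}

impl Value {
    fn bits(&self) -> u32 {
        match self {
            Value::Other { bits } | Value::Trunc { bits, .. } => *bits,
        }
    }
}

/// Models a folding builder: truncating to the same width emits nothing
/// and hands back the input unchanged instead of a fresh trunc.
fn trunc(val: &Value, dest_bits: u32) -> Value {
    if val.bits() == dest_bits {
        val.clone()
    } else {
        Value::Trunc { bits: dest_bits, nuw: false }
    }
}

/// Hypothetical helper: bail out early on the no-op case, and only then
/// flag the trunc that was actually created.
fn trunc_nuw(val: &Value, dest_bits: u32) -> Value {
    if val.bits() == dest_bits {
        // Nothing was emitted, so there is no instruction to flag.
        return val.clone();
    }
    match trunc(val, dest_bits) {
        Value::Trunc { bits, .. } => Value::Trunc { bits, nuw: true },
        other => other,
    }
}
```

Without the early return, a caller that blindly flags whatever `trunc` hands back would touch the pre-existing value in the no-op case, either an unrelated instruction or an earlier trunc that was never meant to carry `nuw`.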
@bors try @rust-timer queue
Emit `trunc nuw` for unchecked shifts and `to_immediate_scalar`

- For shifts this shrinks the IR by no longer needing an `assume` while still providing the UB information
- Having this on the `i8`→`i1` truncations will hopefully help with some places that have to load `i8`s or pass those in LLVM structs without range information
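As a rough illustration of what the `nuw` flag conveys, the sketch below models it in plain Rust, with `None` standing in for LLVM's poison value. Per the LLVM language reference, `trunc nuw` yields poison if any truncated bit is non-zero. All function names here are hypothetical, and the shift model assumes the shift amount arrives as a `u32` and is narrowed to the width of the shifted value.

```rust
/// Models `trunc nuw i8 %x to i1`: poison (`None`) if any dropped bit is set.
fn trunc_nuw_i8_to_i1(x: u8) -> Option<bool> {
    if x > 1 {
        None
    } else {
        Some(x == 1)
    }
}

/// Models `trunc nuw i32 %n to i8`: poison (`None`) if `n` does not fit in 8 bits.
fn trunc_nuw_i32_to_i8(n: u32) -> Option<u8> {
    u8::try_from(n).ok()
}

/// Models an unchecked left shift of a `u8` by a `u32` amount.
/// Every valid amount (0..8) fits in 8 bits, so the `nuw` truncation never
/// poisons a valid input, and no separate `assume` is needed to carry that fact.
fn unchecked_shl_u8_model(x: u8, n: u32) -> Option<u8> {
    let amount = trunc_nuw_i32_to_i8(n)?;
    // `checked_shl` returns `None` when the amount is >= the bit width.
    x.checked_shl(u32::from(amount))
}
```

In this model a well-formed `bool` byte (0 or 1) always truncates cleanly, while any other byte value is flagged, which is the kind of range information the second bullet refers to.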
☀️ Try build successful - checks-actions
Finished benchmarking commit (70adb00): comparison URL.
Overall result: no relevant changes - no action needed
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
@bors rollup=never
Instruction count: This benchmark run did not return any relevant results for this metric.
Max RSS (memory usage): Results (primary -2.3%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: Results (secondary -7.4%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This benchmark run did not return any relevant results for this metric.
Bootstrap: 788.977s -> 789.501s (0.07%)
Emit `trunc nuw` for unchecked shifts and `to_immediate_scalar`
try-job: x86_64-gnu-llvm-19-1
try-job: x86_64-gnu-llvm-19-2
try-job: x86_64-gnu-llvm-19-3
☀️ Try build successful - checks-actions
9d8c8b0 to a00aac1
a00aac1 to 9ac28af
9ac28af to a8277f2
a8277f2 to 2253451
💔 Test failed - checks-actions
0fb9186 to cc5ef80
…ar` on things which are already immediates
That means it stops trying to truncate things that are already `i1`s.
cc5ef80 to 6f9cfd6
Oh, fun, the ABI changed on me 😆 Rebased (https://github.com/rust-lang/rust/compare/0fb9186828763b475a11764ad34dded172ae6b90..cc5ef80bc63fbf7ac6a4dcff0ea107e06d5e0172) then updated the codegen test (https://github.com/rust-lang/rust/compare/cc5ef80bc63fbf7ac6a4dcff0ea107e06d5e0172..6f9cfd694d67ad24af6c7e2235a2da5d22918df0) accordingly. No other code changes, but waiting on CI.
@bors r=nikic
☀️ Test successful - checks-actions
Finished benchmarking commit (c62239a): comparison URL.
Overall result: ❌✅ regressions and improvements - please read the text below
Our benchmarks found a performance regression caused by this PR.
Next Steps:
@rustbot label: +perf-regression
Instruction count: This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage): Results (primary -1.1%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This benchmark run did not return any relevant results for this metric.
Binary size: Results (primary -0.0%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 772.938s -> 773.474s (0.07%)
@scottmcm all the regressions are in the coercions benchmark which I would expect to see stressed by changes like this one. Do you think this warrants further investigation? Usually small changes in stress tests don't necessarily lead to perf investigations.
@rylev I think I'm recovering the regressions to