Emit `trunc nuw` for unchecked shifts and `to_immediate_scalar` #137058

scottmcm · 2025-02-15T04:26:56Z

- For shifts this shrinks the IR by no longer needing an `assume` while still providing the UB information
- Having this on the `i8`→`i1` truncations will hopefully help with some places that have to load `i8`s or pass those in LLVM structs without range information
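
As a rough illustration of what the `nuw` ("no unsigned wrap") flag on a truncation asserts, here is a small Rust model. It is only a sketch: the function names are hypothetical and `None` stands in for LLVM's poison value. None of this is the code the PR adds to the compiler.

```rust
/// Models `trunc nuw i32 ... to i8`: the result is only meaningful when none
/// of the discarded high bits are set. `None` stands in for poison.
/// (Hypothetical helper for illustration only.)
fn trunc_nuw_u32_to_u8(x: u32) -> Option<u8> {
    // `try_from` succeeds exactly when the value fits in 8 bits,
    // i.e. when every truncated bit is zero.
    u8::try_from(x).ok()
}

/// Models `trunc nuw i8 ... to i1` for a `bool` held in a byte:
/// only the bit patterns 0 and 1 avoid discarding a set bit.
/// (Hypothetical helper for illustration only.)
fn trunc_nuw_u8_to_bool(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}
```

In this model a shift amount of `7` survives the truncation to `u8`, while `256` does not, which is the kind of range fact that previously had to be communicated through a separate `assume`.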

rustbot · 2025-02-15T04:27:05Z

rustbot has assigned @Noratrieb.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use `r?` to explicitly pick a reviewer

rustbot · 2025-02-15T04:27:06Z

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

nikic · 2025-02-15T08:54:19Z

compiler/rustc_codegen_llvm/src/builder.rs

+        let trunc = self.trunc(val, dest_ty);
+        if llvm_util::get_version() >= (19, 0, 0) {
+            unsafe {
+                if llvm::LLVMIsATruncInst(trunc).is_some() {


Checking `LLVMIsAInstruction` would be fine as well; there's no real need to export the extra API.

@nikic The assertions in #137058 (comment) are why I was using `LLVMIsATruncInst` here. Am I doing something wrong, such that `LLVMIsAInstruction` should work, or should I go back to checking `LLVMIsATruncInst`?

Okay, that probably means that you have some cases with a trunc to the same type. In that case, you will end up trying to set the flag on an unrelated instruction. With `LLVMIsAInstruction` that produces a crash; with `LLVMIsATruncInst` it might end up setting the flag on an unrelated trunc.

You'll want to check for the no-op trunc case and bail out early.

Thanks, you were 100% correct: extract field was calling `to_immediate_scalar` on things where that had already been called, resulting in no-op `i1`→`i1` truncates.
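
The early bail-out suggested above can be sketched with a toy IR model. Everything here (`IntTy`, `Value`, `trunc_nuw_or_noop`) is a hypothetical stand-in invented for illustration; it is not rustc's builder API or the LLVM C API, and it only assumes that a same-type truncation yields the input value rather than a new instruction.

```rust
/// Toy integer types (hypothetical, for illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum IntTy {
    I1,
    I8,
}

/// Toy values: either something that already existed, or a freshly
/// built truncation carrying its `nuw` flag.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Value {
    Existing { ty: IntTy },
    Trunc { to: IntTy, nuw: bool },
}

impl Value {
    fn ty(&self) -> IntTy {
        match *self {
            Value::Existing { ty } => ty,
            Value::Trunc { to, .. } => to,
        }
    }
}

/// Truncate `val` to `dest`, marking the new truncation as `nuw`.
fn trunc_nuw_or_noop(val: Value, dest: IntTy) -> Value {
    // No-op case: no truncation is built, so there is nothing to flag.
    // Returning early avoids touching whatever produced `val`.
    if val.ty() == dest {
        return val;
    }
    Value::Trunc { to: dest, nuw: true }
}
```

With this shape, an `i1` input requested as `i1` comes back untouched, and only a genuine `i8`→`i1` truncation receives the flag.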

tests/codegen/unchecked_shifts.rs

nikic · 2025-02-15T08:56:52Z

@bors try @rust-timer queue

bors · 2025-02-15T08:58:02Z

⌛ Trying commit 79891a0 with merge 70adb00...

bors · 2025-02-15T10:48:01Z

☀️ Try build successful - checks-actions
Build commit: 70adb00 (70adb00341b574a0ec80325546b6653c0e47c460)

rust-timer · 2025-02-15T12:03:52Z

Finished benchmarking commit (70adb00): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary -2.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.3%	[-2.3%, -2.3%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-2.3%	[-2.3%, -2.3%]	1

Cycles

Results (secondary -7.4%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-7.4%	[-7.9%, -6.9%]	6
All ❌✅ (primary)	-	-	0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 788.977s -> 789.501s (0.07%)
Artifact size: 347.33 MiB -> 347.37 MiB (0.01%)

scottmcm · 2025-02-15T18:28:34Z

`LLVMIsAInstruction` is asserting for me locally, so let's try the LLVM-19 jobs to see if it's just me or not:
@bors try

bors · 2025-02-15T18:29:44Z

⌛ Trying commit 6629a40 with merge 7f462bb...

Emit `trunc nuw` for unchecked shifts and `to_immediate_scalar`

- For shifts this shrinks the IR by no longer needing an `assume` while still providing the UB information
- Having this on the `i8`→`i1` truncations will hopefully help with some places that have to load `i8`s or pass those in LLVM structs without range information

try-job: x86_64-gnu-llvm-19-1
try-job: x86_64-gnu-llvm-19-2
try-job: x86_64-gnu-llvm-19-3

bors · 2025-02-15T20:30:03Z

☀️ Try build successful - checks-actions
Build commit: 7f462bb (7f462bb604e024e270c17d4929aadce7796695f4)

bors · 2025-02-19T09:31:43Z

📌 Commit 0fb9186 has been approved by nikic

It is now in the queue for this repository.

bors · 2025-02-19T16:54:18Z

⌛ Testing commit 0fb9186 with merge 2cfc21b...

bors · 2025-02-19T17:33:25Z

💔 Test failed - checks-actions

…ar` on things which are already immediates

That means it stops trying to truncate things that are already `i1`s.

scottmcm · 2025-02-19T20:04:40Z

Oh, fun, the ABI changed on me 😆

Rebased (https://github.com/rust-lang/rust/compare/0fb9186828763b475a11764ad34dded172ae6b90..cc5ef80bc63fbf7ac6a4dcff0ea107e06d5e0172) then updated the codegen test (https://github.com/rust-lang/rust/compare/cc5ef80bc63fbf7ac6a4dcff0ea107e06d5e0172..6f9cfd694d67ad24af6c7e2235a2da5d22918df0) accordingly.

No other code changes, but waiting on CI.

scottmcm · 2025-02-19T21:34:30Z

@bors r=nikic

bors · 2025-02-19T21:34:34Z

📌 Commit 6f9cfd6 has been approved by nikic

It is now in the queue for this repository.

bors · 2025-02-20T09:05:26Z

⌛ Testing commit 6f9cfd6 with merge c62239a...

bors · 2025-02-20T12:18:08Z

☀️ Test successful - checks-actions
Approved by: nikic
Pushing c62239a to master...

rust-timer · 2025-02-20T14:36:28Z

Finished benchmarking commit (c62239a): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Our benchmarks found a performance regression caused by this PR.
This might be an actual regression, but it can also be just noise.

Next Steps:

- If the regression was expected or you think it can be justified, please write a comment with sufficient written justification, and add @rustbot label: +perf-regression-triaged to it, to mark the regression as triaged.
- If you think that you know of a way to resolve the regression, try to create a new PR with a fix for the regression.
- If you do not understand the regression or you think that it is just noise, you can ask the @rust-lang/wg-compiler-performance working group for help (members of this group were already notified of this PR).

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.7%	[0.4%, 1.3%]	16
Improvements ✅ (primary)	-0.5%	[-0.8%, -0.4%]	4
Improvements ✅ (secondary)	-0.1%	[-0.1%, -0.1%]	1
All ❌✅ (primary)	-0.5%	[-0.8%, -0.4%]	4

Max RSS (memory usage)

Results (primary -1.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.7%	[1.7%, 1.7%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.5%	[-3.3%, -1.7%]	2
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.1%	[-3.3%, 1.7%]	3

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results (primary -0.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.0%	[-0.0%, -0.0%]	10
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.0%	[-0.0%, -0.0%]	10

Bootstrap: 772.938s -> 773.474s (0.07%)
Artifact size: 360.27 MiB -> 360.26 MiB (-0.00%)

rylev · 2025-02-25T20:16:04Z

@scottmcm all the regressions are in the coercions benchmark which I would expect to see stressed by changes like this one. Do you think this warrants further investigation? Usually small changes in stress tests don't necessarily lead to perf investigations.

scottmcm · 2025-02-26T03:05:02Z

@rylev I think I'm recovering the regressions to coercions in #137513 (comment)

rustbot assigned Noratrieb Feb 15, 2025

rustbot added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.) labels Feb 15, 2025

nikic reviewed Feb 15, 2025
