-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable MatchBranchSimplification #112001
Enable MatchBranchSimplification #112001
Conversation
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
⌛ Trying commit 4a46d791044568189a3087b08ef918f97b4f179d with merge 915cd328c0e78ffd700e8e29dca4916629ebdcef... |
This comment has been minimized.
This comment has been minimized.
☀️ Try build successful - checks-actions |
1 similar comment
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (915cd328c0e78ffd700e8e29dca4916629ebdcef): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 643.928s -> 643.82s (-0.02%) |
4a46d79
to
a770220
Compare
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
⌛ Trying commit a770220fd1b41a35ddfffa79b328730e4b030302 with merge 31cb4817bc7bb15715807f7109f203feacd9abce... |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (31cb4817bc7bb15715807f7109f203feacd9abce): comparison URL. Overall result: ❌ regressions - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 644.897s -> 644.188s (-0.11%) |
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
Re-reading the pass, I found a possible bug. We do not check that For instance: https://godbolt.org/z/dba95Y84P Perf reports a slight improvement. |
a770220
to
7e04c93
Compare
@bors r+ |
☀️ Test successful - checks-actions |
Finished benchmarking commit (f59d577): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDNext Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 644.831s -> 647.057s (0.35%) |
That's definitely not what I expected. I wouldn't have suggested we merge that if I got a perf report like that before merge. I'll try to look into this today? |
The codegen for The perf regression is only in icount, no other metrics look concerning. So I think this is probably fine to leave as-is but I'm going to keep investigating because it seems plausible that this could have moved our generated code a bit in the wrong direction. |
There are two call sites for |
…tmcm Enable ScalarReplacementOfAggregates in optimized builds Like MatchBranchSimplification, this pass is known to produce significant runtime improvements in Cranelift artifacts, and I believe based on the perf runs here that the primary effect of this pass is to empower MatchBranchSimplification. ScalarReplacementOfAggregates on its own has little effect on anything, but when this was rebased up to include rust-lang#112001 we started seeing significant and majority-positive results. Based on the fact that we see most of the regressions in debug builds (rust-lang#112002 (comment)) and some rather significant ones in cycles and wall time, I'm only enabling this in optimized builds at the moment.
Enable ScalarReplacementOfAggregates in optimized builds Like MatchBranchSimplification, this pass is known to produce significant runtime improvements in Cranelift artifacts, and I believe based on the perf runs here that the primary effect of this pass is to empower MatchBranchSimplification. ScalarReplacementOfAggregates on its own has little effect on anything, but when this was rebased up to include rust-lang/rust#112001 we started seeing significant and majority-positive results. Based on the fact that we see most of the regressions in debug builds (rust-lang/rust#112002 (comment)) and some rather significant ones in cycles and wall time, I'm only enabling this in optimized builds at the moment.
Enable ScalarReplacementOfAggregates in optimized builds Like MatchBranchSimplification, this pass is known to produce significant runtime improvements in Cranelift artifacts, and I believe based on the perf runs here that the primary effect of this pass is to empower MatchBranchSimplification. ScalarReplacementOfAggregates on its own has little effect on anything, but when this was rebased up to include rust-lang/rust#112001 we started seeing significant and majority-positive results. Based on the fact that we see most of the regressions in debug builds (rust-lang/rust#112002 (comment)) and some rather significant ones in cycles and wall time, I'm only enabling this in optimized builds at the moment.
Enable ScalarReplacementOfAggregates in optimized builds Like MatchBranchSimplification, this pass is known to produce significant runtime improvements in Cranelift artifacts, and I believe based on the perf runs here that the primary effect of this pass is to empower MatchBranchSimplification. ScalarReplacementOfAggregates on its own has little effect on anything, but when this was rebased up to include rust-lang/rust#112001 we started seeing significant and majority-positive results. Based on the fact that we see most of the regressions in debug builds (rust-lang/rust#112002 (comment)) and some rather significant ones in cycles and wall time, I'm only enabling this in optimized builds at the moment.
This pass is one of the small number of benefits from
-Zmir-opt-level=3
that has motivated rustc_codegen_cranelift to use it:rust/compiler/rustc_codegen_cranelift/build_system/build_sysroot.rs
Lines 244 to 246 in 19ed0aa
Cranelift's motivation for this is runtime performance improvements in debug builds. Lifting this pass all the way to
-Zmir-opt-level=1
seems to come without significant perf overhead, so that's what I'm suggesting here.