Make init mask lazy for fully initialized/uninitialized const allocations #109670
Avoid materializing bits in the InitMask bitset when a single value would be enough: when the mask represents a fully initialized or fully uninitialized const allocation.
@bors try @rust-timer queue |
⌛ Trying commit a2be7a1 with merge 89b1ad9f82f52b6fb2a2fe1ac2f139b2e8240846... |
Changing the layout of the InitMask changed the const allocations' hashes.
The try build completed a while ago. @rust-timer build 89b1ad9f82f52b6fb2a2fe1ac2f139b2e8240846 |
Finished benchmarking commit (89b1ad9f82f52b6fb2a2fe1ac2f139b2e8240846): comparison URL.
Overall result: ✅ improvements - no action needed
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
@bors rollup=never
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
|
Some changes occurred to the CTFE / Miri engine cc @rust-lang/miri |
Awesome! I've been meaning to write this optimization for a while. |
Just for kicks, how much improvement do you get on this kind of code:
fn main() {
    for _ in 0..4 {
        let a = [0u8; 1024 * 1024 * 1024];
        drop(&a[..]);
    }
}
EDIT: Not sure if it's this or the |
I remember the results from your example in #93215 I recently looked at
const ROWS: usize = 100000;
const COLS: usize = 10000;
static TWODARRAY: [[u128; COLS]; ROWS] = [[0; COLS]; ROWS];
fn main() {}
emitting metadata went from 17.7s to 16.1s (and from 78s to 55s with assertions turned on ^^ -- there are quite expensive ones in the init mask's bit search functions). |
Awesome! Thanks for digging into it and I'm looking forward to your mentioned follow up branches @bors r+ |
compiler/rustc_middle/src/mir/interpret/allocation/init_mask.rs
Move tests and limit the init mask's structures/fields visibility.
Yeah I think that's better, thanks! @bors r=oli-obk |
☀️ Test successful - checks-actions |
Finished benchmarking commit (8679208): comparison URL.
Overall result: ✅ improvements - no action needed
@rustbot label: -perf-regression
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
|
There are a few optimization opportunities in the InitMask and related const Allocations (e.g. by taking advantage of the fact that it's a bitset that represents initialization, which is often entirely initialized or uninitialized in a single call, or gradually built up, etc).
There's a few overwrites to the same state, multiple writes in a row to the same indices, the RLE scheme for memcpy doesn't always compress, etc.
Here, we start with:
... memcpys
This should be most visible on benchmarks and crates where const allocations dominate the runtime (like ctfe-stress-5 of course), but I was especially looking at the worst cases from #93215.
This first change allows the majority of set_range calls to stay with a lazy init mask when bootstrapping rustc (not that the init mask is a big part of the process in cpu time or memory usage).
r? @oli-obk
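For readers who haven't looked at the init mask before, here is a rough sketch of the lazy idea. It is not the code from this PR: the names `LazyMask`, `InitMaskSketch`, `is_lazy` and `is_range_init` are made up for illustration, and it stores one `bool` per byte where a real bitset would pack bits into words. The point is only that a `set_range` call keeps the mask lazy unless it changes the state of part of the allocation.

```rust
/// Hypothetical, simplified stand-in for an init mask (illustration only).
enum LazyMask {
    /// Every byte shares one state, so no bits are stored at all.
    Uniform { init: bool },
    /// One entry per byte; a real bitset would pack these into words.
    Materialized(Vec<bool>),
}

struct InitMaskSketch {
    len: usize,
    mask: LazyMask,
}

impl InitMaskSketch {
    /// Creates a mask for `len` bytes that are all in the same state.
    fn new(len: usize, init: bool) -> Self {
        Self { len, mask: LazyMask::Uniform { init } }
    }

    fn is_lazy(&self) -> bool {
        matches!(self.mask, LazyMask::Uniform { .. })
    }

    /// Marks the bytes in `start..end` as initialized or uninitialized.
    fn set_range(&mut self, start: usize, end: usize, new_state: bool) {
        assert!(start <= end && end <= self.len);
        if start == end {
            return;
        }
        // Copy the uniform state out first so `self.mask` can be replaced below.
        let uniform = match self.mask {
            LazyMask::Uniform { init } => Some(init),
            LazyMask::Materialized(_) => None,
        };
        match uniform {
            // Overwriting with the same state: stay lazy, nothing to do.
            Some(init) if init == new_state => {}
            // Overwriting the whole allocation: stay lazy, flip the state.
            Some(_) if start == 0 && end == self.len => {
                self.mask = LazyMask::Uniform { init: new_state };
            }
            // Partial change of state: only now are the bits materialized.
            Some(init) => {
                let mut bits = vec![init; self.len];
                bits[start..end].fill(new_state);
                self.mask = LazyMask::Materialized(bits);
            }
            None => {
                if let LazyMask::Materialized(bits) = &mut self.mask {
                    bits[start..end].fill(new_state);
                }
            }
        }
    }

    /// Returns whether every byte in `start..end` is initialized.
    fn is_range_init(&self, start: usize, end: usize) -> bool {
        assert!(start <= end && end <= self.len);
        match &self.mask {
            LazyMask::Uniform { init } => *init || start == end,
            LazyMask::Materialized(bits) => bits[start..end].iter().all(|&b| b),
        }
    }
}
```

In this sketch, fully initializing an allocation in one call, or rewriting part of an already initialized allocation as initialized, never allocates; only a partial write that changes state does.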
I have another in-progress branch where I'll switch the singular initialized/uninitialized value to a watermark, recording the point after which everything is uninitialized. That will take care of cases where full initialization is monotonic and done in multiple steps (e.g. an array of a type without padding), which should then allow the vast majority of const allocations' init masks to stay lazy during bootstrapping (though interestingly I've seen such gradual initialization in both left-to-right and right-to-left directions, and I don't think a single watermark can handle both).
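To make the watermark idea concrete, here is a small sketch under my own assumptions; it is not taken from that branch, and `WatermarkMask`, `try_init_range` and `is_fully_init` are hypothetical names. Bytes before the watermark count as initialized, bytes at or after it as uninitialized. Left-to-right initialization just advances the watermark, while a right-to-left write would leave a gap that a single watermark cannot describe.

```rust
/// Hypothetical watermark mask (illustration only): bytes in `0..watermark`
/// are initialized, bytes in `watermark..len` are uninitialized.
struct WatermarkMask {
    len: usize,
    watermark: usize,
}

impl WatermarkMask {
    /// Creates a mask for `len` bytes, all uninitialized.
    fn new_uninit(len: usize) -> Self {
        Self { len, watermark: 0 }
    }

    /// Tries to record that `start..end` is now initialized.
    /// Returns `false`, leaving the mask unchanged, when the write would
    /// leave an uninitialized gap below it; a caller would then have to
    /// fall back to a materialized bitset.
    fn try_init_range(&mut self, start: usize, end: usize) -> bool {
        assert!(start <= end && end <= self.len);
        if start == end {
            return true; // empty write, nothing changes
        }
        if start > self.watermark {
            return false; // gap between the watermark and `start`
        }
        // The write touches or overlaps the initialized prefix: extend it.
        self.watermark = self.watermark.max(end);
        true
    }

    fn is_fully_init(&self) -> bool {
        self.watermark == self.len
    }
}
```

With this shape, initializing `0..4`, then `4..8`, then `8..12` of a 12-byte allocation succeeds at every step, whereas starting with `8..12` is rejected.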