Make init mask lazy for fully initialized/uninitialized const allocations #109670

lqd · 2023-03-27T17:12:51Z

There are a few optimization opportunities in the InitMask and related const Allocations (e.g. by taking advantage of the fact that it's a bitset that represents initialization, which is often entirely initialized or uninitialized in a single call, or gradually built up, etc).

There's a few overwrites to the same state, multiple writes in a row to the same indices, the RLE scheme for memcpy doesn't always compress, etc.

Here, we start with:

avoiding materializing the bitset's blocks if the allocation is fully initialized/uninitialized
dealloc blocks when fully overwriting, including when participating in memcpys
take care of the fixme about allocating blocks of 0s before overwriting them to the expected value
expanding unit test coverage of the init mask

This should be most visible on benchmarks and crates where const allocations dominate the runtime (like ctfe-stress-5 of course), but I was especially looking at the worst cases from #93215.

This first change allows the majority of set_range calls to stay with a lazy init mask when bootstrapping rustc (not that the init mask is a big part of the process in cpu time or memory usage).

r? @oli-obk

I have another in-progress branch where I'll switch the singular initialized/uninitialized value to a watermark, recording the point after which everything is uninitialized. That will take care of cases where full initialization is monotonic and done in multiple steps (e.g. an array of a type without padding), which should then allow the vast majority of const allocations' init masks to stay lazy during bootstrapping (though interestingly I've seen such gradual initialization in both left-to-right and right-to-left directions, and I don't think a single watermark can handle both).

Avoid materializing bits in the InitMask bitset when a single value would be enough: when the mask represents a fully initialized or fully uninitialized const allocation.

lqd · 2023-03-27T17:19:47Z

@bors try @rust-timer queue

bors · 2023-03-27T17:19:56Z

⌛ Trying commit a2be7a1 with merge 89b1ad9f82f52b6fb2a2fe1ac2f139b2e8240846...

Changing the layout of the InitMask changed the const allocations' hashes.

lqd · 2023-03-27T20:59:36Z

The try build completed a while ago.

@rust-timer build 89b1ad9f82f52b6fb2a2fe1ac2f139b2e8240846

rust-timer · 2023-03-27T22:24:27Z

Finished benchmarking commit (89b1ad9f82f52b6fb2a2fe1ac2f139b2e8240846): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-5.3%	[-27.8%, -1.4%]	16
All ❌✅ (primary)	-	-	0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.7%	[0.4%, 2.6%]	14
Regressions ❌ (secondary)	3.7%	[2.1%, 4.9%]	5
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-5.7%	[-6.5%, -5.3%]	3
All ❌✅ (primary)	1.7%	[0.4%, 2.6%]	14

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.2%	[2.2%, 2.2%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-17.3%	[-17.5%, -17.2%]	2
All ❌✅ (primary)	-	-	0

tests/codegen/consts.rs

rustbot · 2023-03-27T23:14:09Z

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

saethlin · 2023-03-27T23:16:33Z

Awesome! I've been meaning to write this optimization for a while.

saethlin · 2023-03-27T23:18:08Z

Just for kicks, how much improvement do you get on this kind of code:

fn main() {
    for _ in 0..4 {
        let a = [0u8; 1024 * 1024 * 1024];
        drop(&a[..]);
    }
}

EDIT: Not sure if it's this or the Vec::with_capacity version, but one of those is a stress test for this code path

lqd · 2023-03-28T00:09:00Z

I remember the results from your example in #93215 I recently looked at

const ROWS: usize = 100000;
const COLS: usize = 10000;

static TWODARRAY: [[u128; COLS]; ROWS] = [[0; COLS]; ROWS];

fn main() {}

emitting metadata went from 17.7s to 16.1s (and from 78s to 55s with assertions turned on ^^ -- there are quite expensive ones in the init mask's bit search functions).

oli-obk · 2023-03-28T07:50:22Z

Awesome! Thanks for digging into it and I'm looking forward to your mentioned follow up branches

@bors r+

bors · 2023-03-28T07:50:24Z

📌 Commit 9f16a81 has been approved by oli-obk

It is now in the queue for this repository.

compiler/rustc_middle/src/mir/interpret/allocation/init_mask.rs

Move tests and limit the init mask's structures/fields visibility.

RalfJung · 2023-03-29T11:26:17Z

Yeah I think that's better, thanks!

@bors r=oli-obk

bors · 2023-03-29T11:26:21Z

📌 Commit a857ba2 has been approved by oli-obk

It is now in the queue for this repository.

bors · 2023-03-29T12:44:03Z

⌛ Testing commit a857ba2 with merge 8679208...

bors · 2023-03-29T16:20:29Z

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing 8679208 to master...

rust-timer · 2023-03-29T17:39:27Z

Finished benchmarking commit (8679208): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-5.1%	[-27.7%, -1.4%]	17
All ❌✅ (primary)	-	-	0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.8%	[3.6%, 4.0%]	2
Regressions ❌ (secondary)	4.2%	[3.3%, 5.2%]	2
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-4.0%	[-6.3%, -2.6%]	7
All ❌✅ (primary)	3.8%	[3.6%, 4.0%]	2

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-17.1%	[-17.4%, -16.9%]	2
All ❌✅ (primary)	-	-	0

lqd added 3 commits March 27, 2023 15:52

make InitMask lazy for fully init/uninit cases

3f80529

Avoid materializing bits in the InitMask bitset when a single value would be enough: when the mask represents a fully initialized or fully uninitialized const allocation.

add more InitMask test coverage

a696420

readability tweaks

a2be7a1

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 27, 2023

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 27, 2023

update codegen test expectations

9f16a81

Changing the layout of the InitMask changed the const allocations' hashes.

This comment has been minimized.

Sign in to view

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 27, 2023

rustbot assigned oli-obk Mar 27, 2023

lqd commented Mar 27, 2023

View reviewed changes

tests/codegen/consts.rs Show resolved Hide resolved

lqd marked this pull request as ready for review March 27, 2023 23:13

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 28, 2023

RalfJung reviewed Mar 28, 2023

View reviewed changes

compiler/rustc_middle/src/mir/interpret/allocation/init_mask.rs Outdated Show resolved Hide resolved

address review comments

a857ba2

Move tests and limit the init mask's structures/fields visibility.

bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 29, 2023

bors merged commit 8679208 into rust-lang:master Mar 29, 2023

rustbot added this to the 1.70.0 milestone Mar 29, 2023

lqd deleted the init-mask branch March 29, 2023 16:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make init mask lazy for fully initialized/uninitialized const allocations #109670

Make init mask lazy for fully initialized/uninitialized const allocations #109670

lqd commented Mar 27, 2023 •

edited

Loading

lqd commented Mar 27, 2023

This comment has been minimized.

bors commented Mar 27, 2023

lqd commented Mar 27, 2023

This comment has been minimized.

rust-timer commented Mar 27, 2023

rustbot commented Mar 27, 2023

saethlin commented Mar 27, 2023

saethlin commented Mar 27, 2023 •

edited

Loading

lqd commented Mar 28, 2023

oli-obk commented Mar 28, 2023

bors commented Mar 28, 2023

RalfJung commented Mar 29, 2023

bors commented Mar 29, 2023

bors commented Mar 29, 2023

bors commented Mar 29, 2023

rust-timer commented Mar 29, 2023

Make init mask lazy for fully initialized/uninitialized const allocations #109670

Make init mask lazy for fully initialized/uninitialized const allocations #109670

Conversation

lqd commented Mar 27, 2023 • edited Loading

lqd commented Mar 27, 2023

This comment has been minimized.

bors commented Mar 27, 2023

lqd commented Mar 27, 2023

This comment has been minimized.

rust-timer commented Mar 27, 2023

Overall result: ✅ improvements - no action needed

Instruction count

Max RSS (memory usage)

Cycles

rustbot commented Mar 27, 2023

saethlin commented Mar 27, 2023

saethlin commented Mar 27, 2023 • edited Loading

lqd commented Mar 28, 2023

oli-obk commented Mar 28, 2023

bors commented Mar 28, 2023

RalfJung commented Mar 29, 2023

bors commented Mar 29, 2023

bors commented Mar 29, 2023

bors commented Mar 29, 2023

rust-timer commented Mar 29, 2023

Overall result: ✅ improvements - no action needed

Instruction count

Max RSS (memory usage)

Cycles

lqd commented Mar 27, 2023 •

edited

Loading

saethlin commented Mar 27, 2023 •

edited

Loading