Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make init mask lazy for fully initialized/uninitialized const allocations #109670

Merged
merged 5 commits into from
Mar 29, 2023

Conversation

lqd
Copy link
Member

@lqd lqd commented Mar 27, 2023

There are a few optimization opportunities in the InitMask and related const Allocations (e.g. by taking advantage of the fact that it's a bitset that represents initialization, which is often entirely initialized or uninitialized in a single call, or gradually built up, etc).

There's a few overwrites to the same state, multiple writes in a row to the same indices, the RLE scheme for memcpy doesn't always compress, etc.

Here, we start with:

  • avoiding materializing the bitset's blocks if the allocation is fully initialized/uninitialized
  • dealloc blocks when fully overwriting, including when participating in memcpys
  • take care of the fixme about allocating blocks of 0s before overwriting them to the expected value
  • expanding unit test coverage of the init mask

This should be most visible on benchmarks and crates where const allocations dominate the runtime (like ctfe-stress-5 of course), but I was especially looking at the worst cases from #93215.

This first change allows the majority of set_range calls to stay with a lazy init mask when bootstrapping rustc (not that the init mask is a big part of the process in cpu time or memory usage).

r? @oli-obk

I have another in-progress branch where I'll switch the singular initialized/uninitialized value to a watermark, recording the point after which everything is uninitialized. That will take care of cases where full initialization is monotonic and done in multiple steps (e.g. an array of a type without padding), which should then allow the vast majority of const allocations' init masks to stay lazy during bootstrapping (though interestingly I've seen such gradual initialization in both left-to-right and right-to-left directions, and I don't think a single watermark can handle both).

lqd added 3 commits March 27, 2023 15:52
Avoid materializing bits in the InitMask bitset when a single value
would be enough: when the mask represents a fully initialized or fully
uninitialized const allocation.
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 27, 2023
@lqd
Copy link
Member Author

lqd commented Mar 27, 2023

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 27, 2023
@bors
Copy link
Contributor

bors commented Mar 27, 2023

⌛ Trying commit a2be7a1 with merge 89b1ad9f82f52b6fb2a2fe1ac2f139b2e8240846...

Changing the layout of the InitMask changed the const
allocations' hashes.
@lqd
Copy link
Member Author

lqd commented Mar 27, 2023

The try build completed a while ago.

@rust-timer build 89b1ad9f82f52b6fb2a2fe1ac2f139b2e8240846

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (89b1ad9f82f52b6fb2a2fe1ac2f139b2e8240846): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-5.3% [-27.8%, -1.4%] 16
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.7% [0.4%, 2.6%] 14
Regressions ❌
(secondary)
3.7% [2.1%, 4.9%] 5
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-5.7% [-6.5%, -5.3%] 3
All ❌✅ (primary) 1.7% [0.4%, 2.6%] 14

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.2% [2.2%, 2.2%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-17.3% [-17.5%, -17.2%] 2
All ❌✅ (primary) - - 0

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 27, 2023
@lqd lqd marked this pull request as ready for review March 27, 2023 23:13
@rustbot
Copy link
Collaborator

rustbot commented Mar 27, 2023

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

@saethlin
Copy link
Member

Awesome! I've been meaning to write this optimization for a while.

@saethlin
Copy link
Member

saethlin commented Mar 27, 2023

Just for kicks, how much improvement do you get on this kind of code:

fn main() {
    for _ in 0..4 {
        let a = [0u8; 1024 * 1024 * 1024];
        drop(&a[..]);
    }
}

EDIT: Not sure if it's this or the Vec::with_capacity version, but one of those is a stress test for this code path

@lqd
Copy link
Member Author

lqd commented Mar 28, 2023

I remember the results from your example in #93215 I recently looked at

const ROWS: usize = 100000;
const COLS: usize = 10000;

static TWODARRAY: [[u128; COLS]; ROWS] = [[0; COLS]; ROWS];

fn main() {}

emitting metadata went from 17.7s to 16.1s (and from 78s to 55s with assertions turned on ^^ -- there are quite expensive ones in the init mask's bit search functions).

@oli-obk
Copy link
Contributor

oli-obk commented Mar 28, 2023

Awesome! Thanks for digging into it and I'm looking forward to your mentioned follow up branches

@bors r+

@bors
Copy link
Contributor

bors commented Mar 28, 2023

📌 Commit 9f16a81 has been approved by oli-obk

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 28, 2023
Move tests and limit the init mask's structures/fields visibility.
@RalfJung
Copy link
Member

Yeah I think that's better, thanks!

@bors r=oli-obk

@bors
Copy link
Contributor

bors commented Mar 29, 2023

📌 Commit a857ba2 has been approved by oli-obk

It is now in the queue for this repository.

@bors
Copy link
Contributor

bors commented Mar 29, 2023

⌛ Testing commit a857ba2 with merge 8679208...

@bors
Copy link
Contributor

bors commented Mar 29, 2023

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing 8679208 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 29, 2023
@bors bors merged commit 8679208 into rust-lang:master Mar 29, 2023
@rustbot rustbot added this to the 1.70.0 milestone Mar 29, 2023
@lqd lqd deleted the init-mask branch March 29, 2023 16:48
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (8679208): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-5.1% [-27.7%, -1.4%] 17
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.8% [3.6%, 4.0%] 2
Regressions ❌
(secondary)
4.2% [3.3%, 5.2%] 2
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-4.0% [-6.3%, -2.6%] 7
All ❌✅ (primary) 3.8% [3.6%, 4.0%] 2

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-17.1% [-17.4%, -16.9%] 2
All ❌✅ (primary) - - 0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants