Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce Box::default stack copies in debug mode #136089

Merged
merged 2 commits into from
Feb 21, 2025

Conversation

jwong101
Copy link
Contributor

The Box::new(T::default()) implementation of Box::default only
had two stack copies in debug mode, compared to the current version,
which has four. By avoiding creating any MaybeUninit<T>'s and just writing
T directly to the Box pointer, the stack usage in debug mode remains
the same as the old version.

Another option would be to mark Box::write as #[inline(always)],
and change it's implementation to to avoid calling MaybeUninit::write
(which creates a MaybeUninit<T> on the stack) and to use ptr::write instead.

Fixes: #136043

The `Box::new(T::default())` implementation of `Box::default` only
had two stack copies in debug mode, compared to the current version,
which has four. By avoiding creating any `MaybeUninit<T>`'s and just writing
`T` directly to the `Box` pointer, the stack usage in debug mode remains
the same as the old version.
@rustbot
Copy link
Collaborator

rustbot commented Jan 26, 2025

r? @Amanieu

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 26, 2025
// extra stack copies of `T` in debug mode.
//
// See https://github.com/rust-lang/rust/issues/136043 for more context.
ptr::write(&raw mut *x as *mut T, T::default());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the stack frame of ptr::write also involve a stack allocation since it takes the argument by value? A more robust solution would be to use copy_nonoverlapping from the local in Box::default directly to the heap.

Copy link
Contributor Author

@jwong101 jwong101 Jan 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the stack frame of ptr::write also involve a stack allocation since it takes the argument by value?

From what I've observed, ptr::write doesn't actually create a new alloca, and essentially just calls memcpy directly on the argument(assuming the argument is passed is big enough and not something like a usize). Godbolt. Let me check how we lower the ptr::write intrinsic to make sure of this though.

A more robust solution would be to use copy_nonoverlapping from the local in Box::default directly to the heap.

Unfortunately, we need to also mem::forget1 the actual value after copying it, or create a ManuallyDrop<T>. The mem::forget copy doesn't appear to be eliminated in debug mode, and ManuallyDrop::<T>::default() call copies T::default to an additional alloca on its stack frame. I created two additional functions in the Godbolt link to showcase this codegen behavior.

Footnotes

  1. Technically we can skip this if needs_drop::<T>() is false.

@saethlin
Copy link
Member

Another option would be to mark Box::write as #[inline(always)]

I advise against this, and I'm glad the PR doesn't do it. We currently inline inline(always) calls even in unoptimized builds, but the attribute is explicitly a hint that can be completely ignored. This inlining in debug builds can have runaway compile time cost.

@jwong101
Copy link
Contributor Author

jwong101 commented Jan 28, 2025

I was looking into where we do the optimization of not creating an alloca in ptr::write's stack frame in debug mode, and I found something weird.

Given the following code:

const NUM: usize = 1000000;

// Thing is not `Copy`, but has no drop glue
pub struct Thing([u8; NUM]);

#[no_mangle]
pub fn src(move_thing: fn(Thing), capture_arg: fn(&Thing), maybe_use_y: fn(), y: Thing) {
    capture_arg(&y);
    move_thing(y);
    maybe_use_y();
}

There's no alloca created in debug mode:

define void @src(ptr %move_thing, ptr %capture_arg, ptr %maybe_use_y, ptr align 1 %y) unnamed_addr {
start:
  %maybe_use_y.dbg.spill = alloca [8 x i8], align 8
  %capture_arg.dbg.spill = alloca [8 x i8], align 8
  %move_thing.dbg.spill = alloca [8 x i8], align 8
  store ptr %move_thing, ptr %move_thing.dbg.spill, align 8
  store ptr %capture_arg, ptr %capture_arg.dbg.spill, align 8
  store ptr %maybe_use_y, ptr %maybe_use_y.dbg.spill, align 8
  call void %capture_arg(ptr align 1 %y)
  call void %move_thing(ptr align 1 %y)
  call void %maybe_use_y()
  ret void
}

However, there is an alloca in Release mode:

define void @src(ptr nocapture noundef nonnull readonly %move_thing, ptr nocapture noundef nonnull readonly %capture_arg, ptr nocapture noundef nonnull readonly %maybe_use_y, ptr noalias nocapture noundef readonly align 1 dereferenceable(1000000) %y) unnamed_addr {
start:
  %0 = alloca [1000000 x i8], align 1
  tail call void %capture_arg(ptr noalias noundef nonnull readonly align 1 dereferenceable(1000000) %y)
  call void @llvm.lifetime.start.p0(i64 1000000, ptr nonnull %0)
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 1 dereferenceable(1000000) %0, ptr noundef nonnull align 1 dereferenceable(1000000) %y, i64 1000000, i1 false)
  call void %move_thing(ptr noalias nocapture noundef nonnull align 1 dereferenceable(1000000) %0)
  call void @llvm.lifetime.end.p0(i64 1000000, ptr nonnull %0)
  tail call void %maybe_use_y()
  ret void
}

I haven't been keeping up with Rust's opsem, however, the last time I checked, the legality1 of this was unresolved as per rust-lang/unsafe-code-guidelines#416 and rust-lang/unsafe-code-guidelines#188.

cc @saethlin since you're already in this thread and I think this is in your area of expertise

Footnotes

  1. Personally, I think that moving out of a local/argument should invalidate all previous borrows, unless there's a bunch of unsafe code that this breaks.

@saethlin
Copy link
Member

I don't follow your reasoning. You're looking at LLVM IR and trying to reason about MIR semantics, and that is not going to go well. The relevant MIR optimization that changes how arguments are passed is GVN, which turns a Move operand into a Copy operand: https://godbolt.org/z/sEWz399Mj

I also don't think this discussion belongs on this PR, if you want to ask opsem questions you should go to the Zulip.

@safinaskar
Copy link
Contributor

@rustbot label A-box

@rustbot rustbot added the A-box Area: Our favorite opsem complication label Feb 6, 2025
@Amanieu
Copy link
Member

Amanieu commented Feb 18, 2025

@bors r+

@bors
Copy link
Contributor

bors commented Feb 18, 2025

📌 Commit 9700567 has been approved by Amanieu

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 18, 2025
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Feb 20, 2025
…sage, r=Amanieu

Reduce `Box::default` stack copies in debug mode

The `Box::new(T::default())` implementation of `Box::default` only
had two stack copies in debug mode, compared to the current version,
which has four. By avoiding creating any `MaybeUninit<T>`'s and just writing
`T` directly to the `Box` pointer, the stack usage in debug mode remains
the same as the old version.

Another option would be to mark `Box::write` as `#[inline(always)]`,
and change it's implementation to to avoid calling `MaybeUninit::write`
(which creates a `MaybeUninit<T>` on the stack) and to use `ptr::write` instead.

Fixes: rust-lang#136043
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 20, 2025
Rollup of 12 pull requests

Successful merges:

 - rust-lang#128080 (Specify scope in `out_of_scope_macro_calls` lint)
 - rust-lang#135354 ([Debuginfo] Add MSVC Synthetic and Summary providers to LLDB)
 - rust-lang#135630 (add more `s390x` target features)
 - rust-lang#136089 (Reduce `Box::default` stack copies in debug mode)
 - rust-lang#136148 (Optionally add type names to `TypeId`s.)
 - rust-lang#137192 (Remove obsolete Windows ThinLTO+TLS workaround)
 - rust-lang#137204 (Clarify MIR dialects and phases)
 - rust-lang#137299 (Simplify `Postorder` customization.)
 - rust-lang#137302 (Use a probe to avoid registering stray region obligations when re-checking drops in MIR typeck)
 - rust-lang#137305 (Tweaks in and around `rustc_middle`)
 - rust-lang#137313 (Some codegen_llvm cleanups)
 - rust-lang#137333 (Use `edition = "2024"` in the compiler (redux))

r? `@ghost`
`@rustbot` modify labels: rollup

try-job: test-various
try-job: x86_64-msvc-1
try-job: x86_64-msvc-2
try-job: i686-msvc-1
try-job: i686-msvc-2
try-job: i686-mingw-1
try-job: i686-mingw-2
try-job: i686-mingw-3
try-job: x86_64-gnu-nopt
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Feb 20, 2025
…sage, r=Amanieu

Reduce `Box::default` stack copies in debug mode

The `Box::new(T::default())` implementation of `Box::default` only
had two stack copies in debug mode, compared to the current version,
which has four. By avoiding creating any `MaybeUninit<T>`'s and just writing
`T` directly to the `Box` pointer, the stack usage in debug mode remains
the same as the old version.

Another option would be to mark `Box::write` as `#[inline(always)]`,
and change it's implementation to to avoid calling `MaybeUninit::write`
(which creates a `MaybeUninit<T>` on the stack) and to use `ptr::write` instead.

Fixes: rust-lang#136043
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 21, 2025
Rollup of 10 pull requests

Successful merges:

 - rust-lang#128080 (Specify scope in `out_of_scope_macro_calls` lint)
 - rust-lang#135630 (add more `s390x` target features)
 - rust-lang#136089 (Reduce `Box::default` stack copies in debug mode)
 - rust-lang#137192 (Remove obsolete Windows ThinLTO+TLS workaround)
 - rust-lang#137204 (Clarify MIR dialects and phases)
 - rust-lang#137299 (Simplify `Postorder` customization.)
 - rust-lang#137302 (Use a probe to avoid registering stray region obligations when re-checking drops in MIR typeck)
 - rust-lang#137305 (Tweaks in and around `rustc_middle`)
 - rust-lang#137313 (Some codegen_llvm cleanups)
 - rust-lang#137333 (Use `edition = "2024"` in the compiler (redux))

r? `@ghost`
`@rustbot` modify labels: rollup

try-job: aarch64-gnu
try-job: armhf-gnu
try-job: i686-mingw-1
try-job: i686-mingw-2
try-job: i686-mingw-3
try-job: test-various
try-job: x86_64-gnu-nopt
try-job: x86_64-msvc-1
try-job: x86_64-msvc-2
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 21, 2025
…kingjubilee

Rollup of 10 pull requests

Successful merges:

 - rust-lang#128080 (Specify scope in `out_of_scope_macro_calls` lint)
 - rust-lang#135630 (add more `s390x` target features)
 - rust-lang#136089 (Reduce `Box::default` stack copies in debug mode)
 - rust-lang#137192 (Remove obsolete Windows ThinLTO+TLS workaround)
 - rust-lang#137204 (Clarify MIR dialects and phases)
 - rust-lang#137299 (Simplify `Postorder` customization.)
 - rust-lang#137302 (Use a probe to avoid registering stray region obligations when re-checking drops in MIR typeck)
 - rust-lang#137305 (Tweaks in and around `rustc_middle`)
 - rust-lang#137313 (Some codegen_llvm cleanups)
 - rust-lang#137333 (Use `edition = "2024"` in the compiler (redux))

r? `@ghost`
`@rustbot` modify labels: rollup
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 21, 2025
…iaskrgr

Rollup of 8 pull requests

Successful merges:

 - rust-lang#128080 (Specify scope in `out_of_scope_macro_calls` lint)
 - rust-lang#135630 (add more `s390x` target features)
 - rust-lang#136089 (Reduce `Box::default` stack copies in debug mode)
 - rust-lang#137204 (Clarify MIR dialects and phases)
 - rust-lang#137299 (Simplify `Postorder` customization.)
 - rust-lang#137302 (Use a probe to avoid registering stray region obligations when re-checking drops in MIR typeck)
 - rust-lang#137305 (Tweaks in and around `rustc_middle`)
 - rust-lang#137313 (Some codegen_llvm cleanups)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 8d52aae into rust-lang:master Feb 21, 2025
6 checks passed
@rustbot rustbot added this to the 1.87.0 milestone Feb 21, 2025
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Feb 21, 2025
Rollup merge of rust-lang#136089 - jwong101:box-default-debug-stack-usage, r=Amanieu

Reduce `Box::default` stack copies in debug mode

The `Box::new(T::default())` implementation of `Box::default` only
had two stack copies in debug mode, compared to the current version,
which has four. By avoiding creating any `MaybeUninit<T>`'s and just writing
`T` directly to the `Box` pointer, the stack usage in debug mode remains
the same as the old version.

Another option would be to mark `Box::write` as `#[inline(always)]`,
and change it's implementation to to avoid calling `MaybeUninit::write`
(which creates a `MaybeUninit<T>` on the stack) and to use `ptr::write` instead.

Fixes: rust-lang#136043
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-box Area: Our favorite opsem complication S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash writing PNG from a C++ background thread on macOS Rust 1.84.0
6 participants