Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash writing PNG from a C++ background thread on macOS Rust 1.84.0 #136043

Closed
karip opened this issue Jan 25, 2025 · 14 comments · Fixed by #136089
Closed

Crash writing PNG from a C++ background thread on macOS Rust 1.84.0 #136043

karip opened this issue Jan 25, 2025 · 14 comments · Fixed by #136089
Labels
C-external-bug Category: issue that is caused by bugs in software beyond our control

Comments

@karip
Copy link

karip commented Jan 25, 2025

I'm writing PNG files from a C++ background thread. It used to work on Rust stable 1.83.0, but it crashes on Rust stable 1.84.0 and nightly. I'm using the png crate, but it or its dependencies shouldn't have any code, which would cause the crash.

The crash happens on macOS and Rust 1.84.0 (I tested on arm and intel macs). The program runs without issues on Ubuntu linux Rust 1.84.0. I didn't test Windows. There are no issues writing PNGs from the C++ main thread, but PNG writing crashes on the C++ background thread. If I write the same program in pure Rust without ffi and using Rust threads, then everything works without issues.

Could this be a regression in the Rust standard library? Or is the Rust function called from the C++ background thread in a way that is incompatible with Rust 1.84.0?

Code

Here's a minimal example project as a zip file so that you can see the code and try it yourself: test-cthread-png.zip

I expected to see this happen: Two PNG files should be written to the current folder: 'out-png-1.png' and 'out-png-2.png'

Instead, this happened: Only one PNG file is written: 'out-png-1.png' (on the main thread) and then the program crashes:

Rust PNG writing to 'out-png-1.png'
Waiting thread to finish..
Rust PNG writing to 'out-png-2.png'
zsh: bus error  ./main

Version it worked on

It most recently worked on: Rust 1.83

Version with regression

rustc --version --verbose:

rustc 1.84.0 (9fc6b4312 2025-01-07)
binary: rustc
commit-hash: 9fc6b43126469e3858e2fe86cafb4f0fd5068869
commit-date: 2025-01-07
host: aarch64-apple-darwin
release: 1.84.0
LLVM version: 19.1.5

LLVM Stack Trace

Stack trace of the example project

Process 62208 launched: '/Users/temp/rust/test-cthread-png/main' (arm64)
Rust PNG writing to 'out-png-1.png'
Waiting thread to finish..
Rust PNG writing to 'out-png-2.png'
Process 62208 stopped
* thread #2, stop reason = EXC_BAD_ACCESS (code=2, address=0x16fe036b0)
    frame #0: 0x000000010005a334 main`alloc::boxed::Box$LT$core..mem..maybe_uninit..MaybeUninit$LT$T$GT$$C$A$GT$::write::hf0470a3673d75fd5(boxed=0x0000000000000000, value=<unavailable>) at boxed.rs:1008
   1005	    /// ```
   1006	    #[unstable(feature = "box_uninit_write", issue = "129397")]
   1007	    #[inline]
-> 1008	    pub fn write(mut boxed: Self, value: T) -> Box<T, A> {
   1009	        unsafe {
   1010	            (*boxed).write(value);
   1011	            boxed.assume_init()
Target 0: (main) stopped.

(lldb) thread backtrace
* thread #2, stop reason = EXC_BAD_ACCESS (code=2, address=0x16fe036b0)
  * frame #0: 0x000000010005a334 main`alloc::boxed::Box$LT$core..mem..maybe_uninit..MaybeUninit$LT$T$GT$$C$A$GT$::write::hf0470a3673d75fd5(boxed=0x0000000000000000, value=<unavailable>) at boxed.rs:1008
    frame #1: 0x000000010005a540 main`_$LT$alloc..boxed..Box$LT$T$GT$$u20$as$u20$core..default..Default$GT$::default::h770abf79bd18dfcf at boxed.rs:1733:9
    frame #2: 0x0000000100045818 main`miniz_oxide::deflate::core::DictOxide::new::hf9b523780f33e231(flags=4112) at core.rs:1156:16
    frame #3: 0x000000010004116c main`_$LT$miniz_oxide..deflate..core..CompressorOxide$u20$as$u20$core..default..Default$GT$::default::h98fdbfee01692241 at core.rs:436:19
    frame #4: 0x000000010003dea8 main`_$LT$alloc..boxed..Box$LT$T$GT$$u20$as$u20$core..default..Default$GT$::default::h89b44e0f0b402adf at boxed.rs:1733:39
    frame #5: 0x000000010003ecc8 main`_$LT$flate2..ffi..rust..Deflate$u20$as$u20$flate2..ffi..DeflateBackend$GT$::make::hee247c1da637a407(level=(__0 = 6), zlib_header=true, _window_bits='\x0f') at rust.rs:131:47
    frame #6: 0x000000010003fc8c main`flate2::mem::Compress::new::h1c4d10d7643a257f(level=(__0 = 6), zlib_header=true) at mem.rs:197:20
    frame #7: 0x000000010002d1dc main`flate2::zlib::write::ZlibEncoder$LT$W$GT$::new::h9b8a7495719a8ece(w=Vec<u8, alloc::alloc::Global> @ 0x000000016fe863a8, level=(__0 = 6)) at write.rs:43:40
    frame #8: 0x000000010000c398 main`png::encoder::Writer$LT$W$GT$::write_image_data::h1a6cf982c65de46c(self=0x000000016fe86b98, data=(data_ptr = "", length = 307200)) at encoder.rs:742:32
    frame #9: 0x000000010000b99c main`rspng_write(filenumber='\x02') at lib.rs:20:8
    frame #10: 0x00000001000022fc main`MyThread::run(this=0x000000016fdff32b) at main.cpp:9:13
    frame #11: 0x0000000100002e54 main`decltype(*std::declval<MyThread*>().*std::declval<void (MyThread::*)()>()()) std::__1::__invoke[abi:ne180100]<void (MyThread::*)(), MyThread*, void>(__f=0x0000600002b940c8, __a0=0x0000600002b940d8) at invoke.h:312:25
    frame #12: 0x0000000100002d84 main`void std::__1::__thread_execute[abi:ne180100]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (MyThread::*)(), MyThread*, 2ul>(__t=size=3, (null)=__tuple_indices<2UL> @ 0x000000016fe86f7f) at thread.h:199:3
    frame #13: 0x0000000100002680 main`void* std::__1::__thread_proxy[abi:ne180100]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (MyThread::*)(), MyThread*>>(__vp=0x0000600002b940c0) at thread.h:208:3
    frame #14: 0x0000000188b272e4 libsystem_pthread.dylib`_pthread_start + 136
(lldb)

@karip karip added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Jan 25, 2025
@rustbot rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jan 25, 2025
@theemathas
Copy link
Contributor

I can reproduce this on my computer. The crash is happening inside Box::default(), which calls Box::write(Box::new_uninit(), T::default()). For some reason, it appears that Box::write is getting a null pointer Box. Notably, this occurs when the rust code is built in debug mode.

@purplesyringa
Copy link
Contributor

If you look at the disassembly, you'll see that the failure is in a stack probe that tries to allocate 0x50000 bytes (about 320 KiB) on the stack. Apparently threads on macOS (perhaps only those created by the C++ runtime?) have less memory than the main threads, so the allocation fails and the program crashes.

Compiling under --release or with more stack should fix the problem, although this is arguably a miniz_oxide bug.

Usually you'd see an error like thread '{name}' has overflowed its stack in this case, but that only applies when Rust starts the program, not during FFI (because Rust cannot safely configure a SIGBUS handler in this case).

@theemathas
Copy link
Contributor

theemathas commented Jan 25, 2025

#131460 seems suspicious. (Although it presumably makes this use less stack space in release builds.)

@purplesyringa
Copy link
Contributor

purplesyringa commented Jan 25, 2025

It seems like that PR increased the number of locals of type T w/o optimizations, which is problematic with large objects. We might want to consider in-place Default (much like we currently have in-place Clone) elsewhere as a feature request, but I think this is currently better solved in miniz_oxide by initializing the data on the heap manually instead of relying on Default.

@rustbot label -I-prioritize -needs-triage -regression-untriaged -C-bug +C-external-bug

@rustbot rustbot added C-external-bug Category: issue that is caused by bugs in software beyond our control and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. regression-untriaged Untriaged performance or correctness regression. C-bug Category: This is a bug. labels Jan 25, 2025
@theemathas
Copy link
Contributor

cc @jwong101 (author of the PR I linked.)

@theemathas
Copy link
Contributor

theemathas commented Jan 25, 2025

Minimized (with some differences from the original code, in particular, I'm using a much larger allocation (the original miniz_oxide code created a 164098-byte struct on the heap)):

#[allow(dead_code)]
struct Thing([u8; 1000000]);
impl Default for Thing {
    fn default() -> Self {
        Thing([0; 1000000])
    }
}

fn main() {
    std::thread::Builder::new()
        .stack_size(2500000)
        .spawn(|| {
            let _x: Box<Thing> = Box::default();
        })
        .unwrap()
        .join()
        .unwrap();
}

On my computer (mac), in debug mode, this runs fine on version 1.83.0. It crashes with thread '<unknown>' has overflowed its stack in version 1.84.0.

This code also reproduces the issue on godbolt, which I believe is on linux.

It appears that version 1.84.0 puts three copies of the array onto the stack, for some reason.

@theemathas
Copy link
Contributor

The code in miniz_oxide first used Box::default in this manner at Frommi/miniz_oxide@33585a6, saying that it uses the "default trick" to avoid excessive stack copies. I don't know where this supposed trick is from.

@jwong101
Copy link
Contributor

It looks like most of the stack growth in debug mode is due to the fact that Box::new(x) is being directly inlined, so it just directly calls memcpy(after allocating). Whereas Box::write(uninit, x), has to create a stack frame(since this is debug mode), and copies x onto the stack again.

@jwong101
Copy link
Contributor

Manually inlining Box::write into Box::default seems to do the trick in reducing excessive stack usage in debug mode:

fn new_default<T: Default>() -> Box<T> {
    let mut x: Box<MaybeUninit<T>>= Box::new_uninit();
    unsafe {
        x.as_mut_ptr().cast::<T>().write(T::default());
        x.assume_init()
    }
}

@oyvindln
Copy link
Contributor

oyvindln commented Jan 26, 2025

The code in miniz_oxide first used Box::default in this manner at Frommi/miniz_oxide@33585a6, saying that it uses the "default trick" to avoid excessive stack copies. I don't know where this supposed trick is from.

It stems from back when rust didn't manage optimize Box::new() properly in release, while the implementation of Box::default() used the old box keyword internally (that never ended up getting stabilized and eventually got removed) so it managed to put the boxed object directly the extra stack usage and thus it was a trick to avoid the compiler reserving stack space the size of the object when creating a boxed object.

That hasn't been relevant for quite some time now though as the compiler is smart enough to optimized it properly in most cases now since and also as Box::default() just calls Box::new() now anyhow, so it's probably time to change it to just use Box::new() directly.

@theemathas
Copy link
Contributor

theemathas commented Jan 26, 2025

Box::default() just calls Box::new() now anyhow

This is not true, starting in 1.84.0 (changed in 131460). And I suspect that this change actually caused the regression.

@theemathas
Copy link
Contributor

theemathas commented Jan 26, 2025

The most foolproof fix in miniz_oxide would probably be to use something like zeroed_box from bytemuck to allocate the Box<HashBuffers>.

I think that, on the rust side though, having 3 copies of the value on the stack isn't exactly ideal.

@oyvindln
Copy link
Contributor

Box::default() just calls Box::new() now anyhow

This is not true, starting in 1.84.0 (changed in 131460). And I suspect that this change actually caused the regression.

Ah sorry, I' haven't fully kept up with all changes. At least using Box::new should make it act like before this change as I would rather avoid introducing a dependency like that if I can avoid it. Ideally there would be a safe way to directly heap allocate objects in standard safe rust by now that didn't result in excess stack usage in debug mode but I guess that's still a tricky issue to solve.

@jwong101
Copy link
Contributor

jwong101 commented Jan 26, 2025

We should be able to reduce the number of copies of T: Default in debug mode to just onetwo by directly inlining Box::write. This matches the original implementation of Box::default, while still preserving the release optimization.

However, there's still a couple of extra pointers on the stack in debug mode, but the overhead in terms of T: Default should be the same.

I'll make a PR later tonight once I get back home.

Edit:
I forgot about the alloca in the T::default function, since it wasn't inlined. So there's two alloca's of T in the updated Box::default implementation, however that still matches Box::new(T::default()).

workingjubilee added a commit to workingjubilee/rustc that referenced this issue Feb 20, 2025
…sage, r=Amanieu

Reduce `Box::default` stack copies in debug mode

The `Box::new(T::default())` implementation of `Box::default` only
had two stack copies in debug mode, compared to the current version,
which has four. By avoiding creating any `MaybeUninit<T>`'s and just writing
`T` directly to the `Box` pointer, the stack usage in debug mode remains
the same as the old version.

Another option would be to mark `Box::write` as `#[inline(always)]`,
and change it's implementation to to avoid calling `MaybeUninit::write`
(which creates a `MaybeUninit<T>` on the stack) and to use `ptr::write` instead.

Fixes: rust-lang#136043
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Feb 20, 2025
…sage, r=Amanieu

Reduce `Box::default` stack copies in debug mode

The `Box::new(T::default())` implementation of `Box::default` only
had two stack copies in debug mode, compared to the current version,
which has four. By avoiding creating any `MaybeUninit<T>`'s and just writing
`T` directly to the `Box` pointer, the stack usage in debug mode remains
the same as the old version.

Another option would be to mark `Box::write` as `#[inline(always)]`,
and change it's implementation to to avoid calling `MaybeUninit::write`
(which creates a `MaybeUninit<T>` on the stack) and to use `ptr::write` instead.

Fixes: rust-lang#136043
@bors bors closed this as completed in 8d52aae Feb 21, 2025
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Feb 21, 2025
Rollup merge of rust-lang#136089 - jwong101:box-default-debug-stack-usage, r=Amanieu

Reduce `Box::default` stack copies in debug mode

The `Box::new(T::default())` implementation of `Box::default` only
had two stack copies in debug mode, compared to the current version,
which has four. By avoiding creating any `MaybeUninit<T>`'s and just writing
`T` directly to the `Box` pointer, the stack usage in debug mode remains
the same as the old version.

Another option would be to mark `Box::write` as `#[inline(always)]`,
and change it's implementation to to avoid calling `MaybeUninit::write`
(which creates a `MaybeUninit<T>` on the stack) and to use `ptr::write` instead.

Fixes: rust-lang#136043
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-external-bug Category: issue that is caused by bugs in software beyond our control
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants