NVPTX: Debug section symbol naming in .ptx output #99248
Comments
@rustbot label +O-NVPTX |
Not sure if this is relevant, but libnvvm broke if there were multiple DICompileUnits or something. I had to use rustc's thin LTO patch function to get it to work: https://github.com/Rust-GPU/Rust-CUDA/blob/master/crates/rustc_codegen_nvvm/src/nvvm.rs#L167 |
I came across this change and was wondering whether it fixes the same problem this issue is tracking. I have not yet had a chance to investigate properly. llvm/llvm-project@020fa86 |
I could reproduce this issue with nightly toolchain 2024-04-22 (fb89862 2024-04-21), but not with nightly toolchain 2025-01-17 (6067b36 2025-01-17). Still, it seems unlikely to me that the LLVM commit you mentioned is responsible for the fix. |
I'm pretty sure that @vetleras reproduced this just a few weeks ago. Maybe he knows the right trick to provoke it? |
I discovered the issue on nightly-2024-12-14. However, I know we had this working back in October 2024 (on at least one nightly-2024-10-*), so I was surprised to read that you had reproduced it on nightly-2024-04-22. |
I think I know what happens and also why I could not reliably reproduce the issue with different compiler versions.
Due to some changes in core (probably this), the example no longer emits a vtable. However, the issue is still there. I was able to reproduce it with the following snippet on nightly 2025-01-17:
    #![feature(abi_ptx)]
    #![no_std]
    use core::any::Any;
    #[no_mangle]
    pub extern "ptx-kernel" fn foo() {
        let a = return_dyn_object();
    }
    pub fn return_dyn_object() -> &'static (dyn Any + Send) {
        &&4
    }
    #[panic_handler]
    fn panic(_info: &core::panic::PanicInfo) -> ! {
        loop {}
    }
This produces something like |
This indeed seems like good news! Great that you were able to pinpoint the exact changes that made this bug "transient". It looks like LLVM 20 will be in nightly about 5 weeks from now. With that timeline it might not be worth back-porting the change. If you have more time to spend on this issue, it would settle the matter to check out the LLVM 20 branch and verify once and for all that it fixes this problem. Let me know if you have the chance to confirm it. |
I just tested with the llvm-20 branch and can confirm the issue is no longer present. |
Unfortunately, when compiling for older target architectures (< sm_80), I still run into this issue. |
The problem occurs only if the emitted ptx version is < 7.0.
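To check which side of that threshold a given output falls on, one option is to read the `.version` directive near the top of the emitted .ptx file. The following is only a minimal sketch: the helper names `ptx_version` and `is_below_ptx_7` are hypothetical, and it assumes the directive sits on its own line in the form `.version major.minor` with no trailing comment.

```rust
/// Hypothetical helper: find the `.version major.minor` directive in PTX text.
/// Returns None if the directive is missing or not in the assumed form.
fn ptx_version(ptx: &str) -> Option<(u32, u32)> {
    for line in ptx.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix(".version") {
            // `rest` is expected to look like " 6.4"
            let mut parts = rest.trim().splitn(2, '.');
            let major = parts.next()?.trim().parse().ok()?;
            let minor = parts.next()?.trim().parse().ok()?;
            return Some((major, minor));
        }
    }
    None
}

/// Hypothetical helper: true when the module declares a PTX version below 7.0.
fn is_below_ptx_7(ptx: &str) -> bool {
    matches!(ptx_version(ptx), Some((major, _)) if major < 7)
}
```

Running `is_below_ptx_7` over the contents of each generated .ptx file would show which builds fall in the affected range without having to invoke ptxas.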
|
When the panic machinery is used in a ptx-kernel, it generates a symbol in the debug section whose name is produced differently from the name of the symbol it refers to.
Today, when compiling to PTX (--target nvptx64-nvidia-cuda), all debug info is stripped by ptx-linker here. This decision predates debug info support on NVPTX in LLVM, and it seems there are only a few obstacles left before Rust code can be debugged with cuda-gdb. Nonetheless, getting debug symbols into .ptx requires installing a ptx-linker with this LLVMStripModuleDebugInfo call commented out.
It is reproducible with the following code compiled with
    cargo +nightly-2022-06-18 rustc --target nvptx64-nvidia-cuda -Zbuild-std -- -C target-cpu=sm_86
Inside .section .debug_info I get a line looking like
    .b64 anon.e3d4032e2030354db324b88c41516303.47
I assume it refers to the line
    .global .align 8 .u64 anon_$_e3d4032e2030354db324b88c41516303_$_47[4] = {_ZN4core3ptr88drop_in_place$LT$core$$panic$$panic_info$$PanicInfo$$internal_constructor$$NoPayload$GT$17h951018dc744e645eE, 0, 1, _ZN36_$LT$T$u20$as$u20$core$$any$$Any$GT$7type_id17h215c4e91a81cf303E};
If I manually change this line to
    .b64 anon_$_e3d4032e2030354db324b88c41516303_$_47
ptxas stops complaining about this line (but still complains about a different NVPTX debug info issue).
If anyone has pointers on where I should start investigating how symbol mangling can differ between debug symbols and .global variables, I would be thankful.
Have you had to deal with this @RDambrosio016?
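The two spellings quoted in this report differ only in that each `.` in the debug-section name appears as `_$_` in the `.global` definition. A minimal sketch of that mapping follows. The function name `ptx_safe_name` is hypothetical, and the rule is inferred from this one pair of names. It resembles the renaming LLVM's NVPTX backend performs in its NVPTXAssignValidGlobalNames pass, but whether that pass is the source of this mismatch is an assumption, not something confirmed here.

```rust
/// Hypothetical helper: rewrite a symbol name into the spelling seen on the
/// `.global` line by replacing every '.' with "_$_".
/// The rule is inferred from the single example in this report.
fn ptx_safe_name(name: &str) -> String {
    name.replace('.', "_$_")
}
```

Applied to `anon.e3d4032e2030354db324b88c41516303.47`, this yields `anon_$_e3d4032e2030354db324b88c41516303_$_47`, which is the manual edit that made ptxas accept the line. That suggests the debug section is emitted with the name as it was before this renaming, while the definition uses the name after it.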