NVPTX: Debug section symbol naming in .ptx output #99248

Open
kjetilkjeka opened this issue Jul 14, 2022 · 11 comments
Labels
A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) C-bug Category: This is a bug. O-NVPTX Target: the NVPTX LLVM backend for running rust on GPUs, https://llvm.org/docs/NVPTXUsage.html T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@kjetilkjeka
Contributor

When the panic machinery is used in a ptx-kernel, the debug section references a symbol whose name is mangled differently from the definition it refers to.

Today when compiling to ptx (--target nvptx64-nvidia-cuda) all debug info is stripped by ptx-linker here. This decision predates debug info support on nvptx in llvm and it seems like there are only a few obstacles left to be able to debug rust code with cuda-gdb. Nonetheless, getting debug symbols into .ptx requires installing a ptx-linker with this LLVMStripModuleDebugInfo commented out.

It is reproducible with the following code, compiled with cargo +nightly-2022-06-18 rustc --target nvptx64-nvidia-cuda -Zbuild-std -- -C target-cpu=sm_86

#![no_std]
#![feature(abi_ptx)]

#[no_mangle]
pub extern "ptx-kernel" fn foo() {
    panic!("bar");
}

#[panic_handler]
fn panic(_panic_info: &core::panic::PanicInfo) -> ! {
    loop {}
}

Inside .section .debug_info I get a line that looks like .b64 anon.e3d4032e2030354db324b88c41516303.47. I assume it refers to the definition .global .align 8 .u64 anon_$_e3d4032e2030354db324b88c41516303_$_47[4] = {_ZN4core3ptr88drop_in_place$LT$core$$panic$$panic_info$$PanicInfo$$internal_constructor$$NoPayload$GT$17h951018dc744e645eE, 0, 1, _ZN36_$LT$T$u20$as$u20$core$$any$$Any$GT$7type_id17h215c4e91a81cf303E};

If I manually change this line to .b64 anon_$_e3d4032e2030354db324b88c41516303_$_47, ptxas stops complaining about this line (but still complains about a different nvptx debug info issue).
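The difference between the two spellings can be illustrated with a small sketch. It assumes, based only on the two names quoted above, that the PTX-valid name is obtained by replacing every `.` in the original name with `_$_`. The function name `ptx_safe_name` is hypothetical and is not part of rustc or LLVM.

```rust
/// Hypothetical helper (not a rustc or LLVM API): rewrites a symbol name
/// into the form seen in the `.global` definition, under the assumption
/// that every `.` in the name becomes `_$_`.
pub fn ptx_safe_name(name: &str) -> String {
    name.replace('.', "_$_")
}
```

Applied to the name from the debug section, this yields the name used by the `.global` definition, which is the manual edit described above.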

If anyone has pointers on where I should start investigating how the symbol mangling can differ between debug symbols and .global variables, I would be thankful.

Have you had to deal with this @RDambrosio016?

@kjetilkjeka kjetilkjeka added the C-bug Category: This is a bug. label Jul 14, 2022
@kjetilkjeka
Contributor Author

@rustbot label +O-NVPTX

@rustbot rustbot added the O-NVPTX Target: the NVPTX LLVM backend for running rust on GPUs, https://llvm.org/docs/NVPTXUsage.html label Jul 14, 2022
@RDambrosio016
Contributor

Not sure if this is relevant, but libnvvm broke if there were multiple DICompileUnits or something like that; I had to use rustc's thin LTO patch function to get it to work: https://github.com/Rust-GPU/Rust-CUDA/blob/master/crates/rustc_codegen_nvvm/src/nvvm.rs#L167

@workingjubilee workingjubilee added the A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) label Mar 11, 2023
@Noratrieb Noratrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023
@kjetilkjeka
Contributor Author

I came across this change and was wondering whether it fixes the same problem this issue is tracking. I have not yet had a chance to investigate properly. llvm/llvm-project@020fa86

@kulst

kulst commented Feb 3, 2025

I could reproduce this issue with nightly toolchain 2024-04-22 (fb89862 2024-04-21):

However, I could not reproduce it with nightly toolchain 2025-01-17 (6067b36 2025-01-17). ptxas does not complain and manually looking at the .ptx does not show the problematic line.
It seems a fix in between might have resolved the issue.

However, to me it seems unlikely that the llvm commit you mentioned is responsible for the fix.
The corresponding commit 020fa86 is not part of the rust llvm submodule.
I will try to track down further where the issue starts to disappear.

@kjetilkjeka
Contributor Author

I'm pretty sure that @vetleras reproduced this just some weeks ago. Maybe he knows the right trick to provoke it?

@vetleras

vetleras commented Feb 3, 2025

I discovered the issue on nightly-2024-12-14. However, I know we had this working back in October 2024 (on at least one nightly-2024-10-*), so I was surprised to read that you had reproduced it on nightly-2024-04-22.

@kulst

kulst commented Feb 3, 2025

I think I know what happens and also why I could not reliably reproduce the issue with different compiler versions.
Good news first:

  • Using llc compiled from the current llvm main branch probably fixes the issue.
  • Instead of something like .b64 anon.697ade959c590fc607ef0ca540fa2259.2, the line .b64 anon_$_697ade959c590fc607ef0ca540fa2259_$_2 is emitted in debug_info.
  • It is very likely that llvm commit 020fa86 is responsible for the fix, as suggested by @kjetilkjeka.

The panic example reproduced the issue on earlier toolchain versions as there was a vtable emitted by it:
.global .align 8 .u64 anon_$_e3d4032e2030354db324b88c41516303_$_47[4] = {_ZN4core3ptr88drop_in_place$LT$core$$panic$$panic_info$$PanicInfo$$internal_constructor$$NoPayload$GT$17h951018dc744e645eE, 0, 1, _ZN36_$LT$T$u20$as$u20$core$$any$$Any$GT$7type_id17h215c4e91a81cf303E};

Due to some changes in core (probably this), the example no longer emits a vtable. However, the issue is still there. I was able to reproduce it with the following snippet on nightly 2025-01-17:

#![feature(abi_ptx)]
#![no_std]

use core::any::Any;

#[no_mangle]
pub extern "ptx-kernel" fn foo() {
    let a = return_dyn_object();
}

pub fn return_dyn_object() -> &'static (dyn Any + Send) {
    &&4
}

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}

This produces something like @anon.697ade959c590fc607ef0ca540fa2259.2 = ... in llvm-ir. Before the mentioned fix in llvm, llc does not properly mangle such symbols in the debug info.
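To check a generated .ptx file for the problematic line without reading it by hand, something like the following sketch could be used. It is only a heuristic under stated assumptions: it treats any `.b64` operand that starts like an identifier and still contains a `.` as a symbol reference that was not mangled. The function name `suspicious_b64_operands` is made up for illustration.

```rust
/// Hypothetical heuristic: returns the operands of `.b64` lines that look
/// like symbol names but still contain a `.`, such as
/// `anon.697ade959c590fc607ef0ca540fa2259.2`.
pub fn suspicious_b64_operands(ptx: &str) -> Vec<&str> {
    ptx.lines()
        // Keep only lines of the form `.b64 <operand>`.
        .filter_map(|line| line.trim().strip_prefix(".b64 "))
        .map(str::trim)
        .filter(|op| {
            // Numeric operands start with a digit; symbol names do not.
            let starts_like_symbol = op
                .chars()
                .next()
                .map_or(false, |c| c.is_ascii_alphabetic() || c == '_');
            starts_like_symbol && op.contains('.')
        })
        .collect()
}
```

An empty result does not prove the output is valid PTX; it only means this particular pattern was not found.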

@kjetilkjeka
Contributor Author

This indeed seems like good news! Great that you were able to pinpoint the exact changes that made this bug "transient".

It looks like LLVM 20 will land in nightly about 5 weeks from now. With that timeline, back-porting the change might not be worth it. If you have more time to spend on this issue, it would be reassuring to check out the LLVM 20 branch and verify once and for all that it fixes this problem. Let me know if you get the chance to confirm it.

@kulst

kulst commented Feb 4, 2025

I just tested with the llvm-20 branch and can confirm the issue is no longer present.

@kulst

kulst commented Feb 4, 2025

Unfortunately, when compiling for older target archs (< sm_80), I still run into this issue.

@kulst

kulst commented Feb 5, 2025

The problem occurs only if the emitted ptx version is < 7.0.
The ptx version llc emits probably defaults to the minimum ptx version the target arch supports. It is specified here.
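Whether a given .ptx file falls into the affected range can be read off its `.version` directive. The following is a minimal sketch, assuming the file contains a line of the form `.version <major>.<minor>`; the function names are hypothetical.

```rust
/// Hypothetical helper: extracts `(major, minor)` from the first
/// `.version <major>.<minor>` line of a PTX file, if there is one.
pub fn ptx_version(ptx: &str) -> Option<(u32, u32)> {
    let rest = ptx
        .lines()
        .find_map(|line| line.trim().strip_prefix(".version"))?;
    let (major, minor) = rest.trim().split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}

/// True if the emitted PTX version is below 7.0, the range in which the
/// problem was observed in this thread.
pub fn is_below_ptx_7(ptx: &str) -> Option<bool> {
    // Tuples compare lexicographically: major first, then minor.
    ptx_version(ptx).map(|version| version < (7, 0))
}
```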
Turns out you can actually specify the ptx version that should be emitted by llc using target features. The following line emits ptx of version 8.5:

llc --mcpu sm_61 --mattr ptx85 kernels.optimized.o -o kernels.ptx

rustc already supports -C target-feature. llvm-bitcode-linker just does not use this option. I will prepare a pull request for that.
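As a rough illustration of what passing the PTX version through to llc could look like from a Rust tool, here is a sketch that builds the same llc invocation as the line above. This is not the actual llvm-bitcode-linker code; the function names `llc_args` and `llc_command` and their parameters are hypothetical, and it assumes an `llc` binary is on the PATH.

```rust
use std::process::Command;

/// Hypothetical helper: builds the llc argument list from the command
/// line shown above, with the target CPU and PTX feature as parameters.
pub fn llc_args(cpu: &str, ptx_feature: &str, input: &str, output: &str) -> Vec<String> {
    vec!["--mcpu", cpu, "--mattr", ptx_feature, input, "-o", output]
        .into_iter()
        .map(String::from)
        .collect()
}

/// Wraps the arguments in a `Command` for an `llc` binary found on PATH.
/// The command is only constructed here; the caller decides when to run it.
pub fn llc_command(cpu: &str, ptx_feature: &str, input: &str, output: &str) -> Command {
    let mut cmd = Command::new("llc");
    cmd.args(llc_args(cpu, ptx_feature, input, output));
    cmd
}
```

Calling llc_command("sm_61", "ptx85", "kernels.optimized.o", "kernels.ptx") and then .status() on the result would run the invocation quoted above.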
