-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NVPTX: Cannot select: = NVPTXISD::LoadParam<(load (s24), align 4)> #55764
Comments
I guess we should've promoted it up to the next larger power-of-2 sized integer. |
This seems reasonable, it should work fine for the rustc issue! Let me know if there's anything I can do to help out. Edit: Removed a concern that was based on a misunderstanding |
Where did s24 come from? Is that coalesced |
Rustc turns aggregates smaller than the register size into an integer that can be placed in a single register when passing into or returning from a function. This optimization is only made when using the native rust abi, not the c or ptx-kernel abi. I discovered the bug when working on an image of struct Rgb<T> {
r: T,
g: T,
b: T,
} This bitcast from A temporary workaround is to use the C ( |
After looking a bit more into this today I have a couple of questions regarding the following (redacted) Rust code that is generated into the following (redacted) llvm-ir. Rust#[repr(C)]
#[derive(Clone, Copy)]
pub struct Foo {
a: u8,
b: u8,
c: u8,
}
#[inline(never)]
fn device(v: Foo) -> Foo {
v
} LLVM-IR%Foo = type { i8, i8, i8 }
; main_rs::device
; Function Attrs: noinline nounwind
define internal i24 @_ZN7main_rs6device17h534b9a1ae91055fbE(i24 %0) unnamed_addr #1 {
start:
%1 = alloca %Foo, align 1
%2 = alloca i24, align 4
%v = alloca %Foo, align 1
store i24 %0, i24* %2, align 4
%3 = bitcast %Foo* %v to i8*
%4 = bitcast i24* %2 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %3, i8* align 4 %4, i64 3, i1 false)
%5 = bitcast %Foo* %1 to i8*
%6 = bitcast %Foo* %v to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %5, i8* align 1 %6, i64 3, i1 false)
%7 = bitcast %Foo* %1 to i24*
%8 = load i24, i24* %7, align 1
ret i24 %8
} Copying between
|
It's not at the top of my todo list and I don't know when I'll get to it, so feel free to poke at it.
It's not, which is the reason for the crash you've reported. Performance-wise it would probably be better not to use i24 and, if you do need to pack into integers, pack into power-of-2 sized ones that can be lowered into efficient code. I'm not quite sure what drives this hand-packing of aggregates on IR level. Is that part of the Rust ABI? Normally it's target-dependent. E.g. on x64 small aggregates are passed in registers, power-of-2 sized, at that. Something like that may make sense for the rust-generated code for NVPTX. |
I had the chance to look a bit more into this today and due to being unfamiliar with the LLVM structure I hope to get my perceived understanding confirmed and maybe also some pointers in the right direction. It seems to me like the Is the solution to the problem to add the The implementation for |
Legalizer does sound like the right place to handle this. A lot of things in LLVM grow 'organically' so the new features/mechanisms are not always automatically propagated to all back-ends. In general, small reasonable changes (as in both understandable and sensible for the issue at hand) are the way to go. As long as they move LLVM in the right direction (incremental steps are fine), have tests, and don't break anyone upstreaming them should be pretty straightforward. |
I'm struggling a bit with making progress. It seems like in the Do you have a pointer in the right direction of where the corresponding |
I don't really know. If the standard lowering machinery can't be used with odd-sized integers, I'd try checking what happens if the same code gets compiled by a back-end that can deal with it. Track what they do differently and that may provide enough of a hint on how to do it in NVPTX. It may be easier to consider changing the IR generator so it does not use the types NVPTX can't handle. |
Simpler reproducer:
LegalizeInfo is a GlobalISel concept that should not be relevant for NVPTX. From a cursory look, this seems to be a simple bug in LowerReturn:
I think this should check getTypeSizeInBits() rather than getAllocSizeInBits(). For Then an |
Thanks for looking into this @nikic I don't think it is entirely that simple. rustc promotes everything with size less than a pointer size to immediates. This means that the same must also be done for I just found about the GlobalISel and that it is not used for NVPTX and that it is instead the I believe I can learn something from looking at The status is more or less that I have managed to get the simple examples to compile correctly. But I am breaking tests and are trying to learn what I can from these. I'm slowly learning about how LLVM and NVPTX lowering works and I'm very happy for ideas and pointers like the one above. |
I made some progress today and believe I'm almost done! The type promotion seems to work and can be viewed here I've not yet ran the formating tools and friends (and will of course do that before submitting a patch) but if there are any problems to the general structure please let me know so I can fix them. The final problem is that
|
I submitted a patch on phabricator today. https://reviews.llvm.org/D129291 |
…hen passing Today llc will crash when attempting to use non-power-of-two integer types as function arguments or returns. This patch enables passing non standard integer values in functions by promoting them before store and truncating after load. The main motivation of implementing this change is that rust casts small structs (less than pointer size) into an integer of the same size. As an example, if a struct contains three u8 then it will be passed as an i24. This patch is a step towards enabling rust compilation to ptx while retaining the target independent optimizations. More context can be found in #55764 Differential Revision: https://reviews.llvm.org/D129291
Closing as the fix have been merged |
…hen passing Today llc will crash when attempting to use non-power-of-two integer types as function arguments or returns. This patch enables passing non standard integer values in functions by promoting them before store and truncating after load. The main motivation of implementing this change is that rust casts small structs (less than pointer size) into an integer of the same size. As an example, if a struct contains three u8 then it will be passed as an i24. This patch is a step towards enabling rust compilation to ptx while retaining the target independent optimizations. More context can be found in llvm/llvm-project#55764 Differential Revision: https://reviews.llvm.org/D129291
…hen passing Today llc will crash when attempting to use non-power-of-two integer types as function arguments or returns. This patch enables passing non standard integer values in functions by promoting them before store and truncating after load. The main motivation of implementing this change is that rust casts small structs (less than pointer size) into an integer of the same size. As an example, if a struct contains three u8 then it will be passed as an i24. This patch is a step towards enabling rust compilation to ptx while retaining the target independent optimizations. More context can be found in llvm/llvm-project#55764 Differential Revision: https://reviews.llvm.org/D129291
I'm currently working on improving nvptx support in rust. The rust compiler produces llvm-ir with struct return values bitcast into integer values. This is fine for most targets, but for the triple
nvptx64-nvidia-cuda
llc
cannot select for these types. A description going deeper into the rust side of things can be found here I will try to keep the rest of this issue focused only on the llvm side.The problem from the LLVM side of things is that well formed llvm-ir cannot be compiled into ptx. I'm not sure if it is coincidental that clang always uses the actual struct as the return type, or if it is because it is better to work around this bug than to fix it. I do think that it will be a ugly hack in rustc to special case this for the nvptx target, and is thus willing to put a little amount of work into helping out fixing this bug if it is a realistic option.
I do realize this is only tested on a rather old version of llvm. I looked at your issue tracker and do not believe it is fixed. I apologize if this is a duplicate or have already been fixed.
Simplified example code
I believe this is a LLVM backend bug, and therefore the most interesting source is perhaps a simplified .ll example. The following is generated with bugpoint:
Backtrace
When attempting to compile this with
llc-14
from the Ubuntu 22.04 apt repo, the following is returned:Meta
My exact version of
llc-14
is:The text was updated successfully, but these errors were encountered: