Compile bug: new mma fattn uses constructs that cannot be unrolled by LLVM, causing a tonne of warnings #11602
Comments
What do you mean, it cannot be unrolled? The loop itself has a fixed length, the |
It could be, but the compiler is too dumb: this construct with a break is not supported. |
How sure are you that the CUDA compiler can unroll this loop, by the way? No compiler can reasonably unroll for (int col0 = 0; col0 < ncols; col0 += block_size) when ncols is not constexpr, as is the case in softmax, which apparently didn't trigger a warning on CUDA either. |
With the softmax kernel it makes sense that the compiler would issue a warning for the case with ncols == 0, since in that case the size of the loop is actually unknown. I interpreted your comments in #11471 to mean that warnings are issued only for the template specialization with ncols == 0. In the new stream-k fixup I don't understand the warning at all. If you look at the stream-k fixup in |
Yes, that is correct. The problem is that in an earlier pass LLVM folds the loop-with-break into a loop with adjusted bounds based on the location of the break. When a later pass is then supposed to unroll the loop, its bounds are no longer constexpr if the condition of the branch containing the break is not constexpr. Yes, this is dumb, but that's how LLVM works at the moment. The case in softmax.cu works fine because the condition of the branch with the break is known at compile time (i.e. never taken) when ncols_template is not zero. In mmq.cuh I can't find a place in the code where you break out of a loop in a branch whose evaluation is not known at compile time, so it is fine there too. |
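To make the two loop shapes discussed above concrete, here is a minimal host-side C++ sketch. It is not the code from fattn-common.cuh: `kMaxCols`, `sum_with_break`, `sum_predicated`, and the `UNROLL_HINT` macro are hypothetical names chosen for illustration. The first function has a fixed trip count but breaks on a runtime condition, which is the shape the thread says gets folded into runtime bounds. The second keeps the trip count fixed and predicates the body instead, which is one possible way to keep the bounds known at compile time. Whether a given compiler actually warns on the first form depends on the target and optimization pipeline, so the warning may not reproduce on a host build.

```cpp
#if defined(__clang__) || defined(__CUDACC__)
#define UNROLL_HINT _Pragma("unroll")
#else
#define UNROLL_HINT // other compilers: no unroll hint in this sketch
#endif

// Hypothetical compile-time upper bound on the number of columns.
constexpr int kMaxCols = 8;

// Fixed trip count, but the early exit depends on a runtime value (n_valid).
inline float sum_with_break(const float * x, int n_valid) {
    float sum = 0.0f;
    UNROLL_HINT
    for (int j = 0; j < kMaxCols; ++j) {
        if (j >= n_valid) {
            break; // exit condition is only known at run time
        }
        sum += x[j];
    }
    return sum;
}

// Same result, but the loop always runs kMaxCols iterations and the
// runtime condition only guards the body.
inline float sum_predicated(const float * x, int n_valid) {
    float sum = 0.0f;
    UNROLL_HINT
    for (int j = 0; j < kMaxCols; ++j) {
        if (j < n_valid) {
            sum += x[j];
        }
    }
    return sum;
}
```

Both functions return the same value for any `n_valid`, because the break condition is monotone in `j`. The predicated form trades the early exit for a loop whose bounds never depend on runtime data.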
Git commit
90f9b88
Operating systems
Linux
GGML backends
CUDA, HIP
Problem description & steps to reproduce
This construct https://github.com/ggerganov/llama.cpp/blob/90f9b88afb6447d3929843a2aa98c0f11074762d/ggml/src/ggml-cuda/fattn-common.cuh#L553 cannot be unrolled by LLVM for GPU targets (i.e. amdgcn) when ne01 is unknown at compile time. At the moment this causes several hundred warnings (one set for each arch) when compiling for ROCm. Please silence this as is done for https://github.com/ggerganov/llama.cpp/blob/90f9b88afb6447d3929843a2aa98c0f11074762d/ggml/src/ggml-cuda/softmax.cu#L18
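As a rough illustration of what silencing such a warning can look like, the sketch below wraps a templated function in clang diagnostic pragmas that ignore the `-Wpass-failed` group. This assumes the failed-unroll warning is reported under that group, and it is not a copy of the linked softmax.cu lines. `row_sum`, `ncols_template`, `ncols_runtime`, and `UNROLL_LOOP` are hypothetical names used here for illustration only.

```cpp
#if defined(__clang__) || defined(__CUDACC__)
#define UNROLL_LOOP _Pragma("unroll")
#else
#define UNROLL_LOOP // other compilers: no unroll hint in this sketch
#endif

// Assumption: the "loop not unrolled" diagnostic belongs to -Wpass-failed.
// The unroll hint stays in place for specializations with a known bound;
// only the warning for the runtime-bound case is suppressed.
#ifdef __clang__
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wpass-failed"
#endif // __clang__

// ncols_template == 0 means "column count only known at run time".
template <int ncols_template>
float row_sum(const float * x, int ncols_runtime) {
    const int ncols = ncols_template == 0 ? ncols_runtime : ncols_template;
    float sum = 0.0f;
    UNROLL_LOOP
    for (int col = 0; col < ncols; ++col) {
        sum += x[col];
    }
    return sum;
}

#ifdef __clang__
#pragma clang diagnostic pop
#endif // __clang__
```

With `row_sum<4>` the loop bound is a compile-time constant and can be unrolled. With `row_sum<0>` the bound comes from `ncols_runtime`, so the hint cannot be honored and the pragma only hides the resulting diagnostic. The push/pop pair limits the suppression to this one function.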
First Bad Commit
No response
Compile command
Relevant log output