Compile bug: new mma fattn uses constructs that can not be unrolled by llvm casueing a tonne of warnings #11602

IMbackK · 2025-02-02T20:14:44Z

Git commit

90f9b88

Operating systems

Linux

GGML backends

CUDA, HIP

Problem description & steps to reproduce

this construct https://github.com/ggerganov/llama.cpp/blob/90f9b88afb6447d3929843a2aa98c0f11074762d/ggml/src/ggml-cuda/fattn-common.cuh#L553 can not be unrolled by llvm for gpu targets (ie amdgcn) when ne01 is unkown at compile time, at the moment this causes several hundred warnings (one set for each arch) when compiling for rocm, please silence this like done for https://github.com/ggerganov/llama.cpp/blob/90f9b88afb6447d3929843a2aa98c0f11074762d/ggml/src/ggml-cuda/softmax.cu#L18

First Bad Commit

No response

Compile command

any

Relevant log output

llama.cpp/ggml/src/ggml-cuda/template-instances/../fattn-common.cuh:523:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]

The text was updated successfully, but these errors were encountered:

JohannesGaessler · 2025-02-02T20:20:49Z

What do you mean, it cannot be unrolled? The loop itself has a fixed length, the break should just be a conditional GOTO to the end of the loop and I don't see why this condition couldn't be dynamic.

IMbackK · 2025-02-02T20:21:39Z

it could be, but the compiler is too dumb

this construct with break is not supported.

IMbackK · 2025-02-02T20:29:01Z

How sure are you that the cuda compiler can unroll this loop btw, since no compiler can reasonably unroll the for (int col0 = 0; col0 < ncols; col0 += block_size) when ncols is not constexpr as is the case in softmax, which apparently dident trigger a warning on cuda either.

JohannesGaessler · 2025-02-02T21:40:13Z

With the softmax kernel it makes sense that the compiler would issue a warning since for the case with ncols == 0 since in that case the size of the loop is actually unknown. I interpreted your comments in #11471 to mean that it is only the template specialization with ncols == 0 where warnings are issued.

In the new stream-k fixup I don't understand the warning at all. If you look at the stream-k fixup in mmq.cuh it is extremely similar and should also cause warning spam if those loops cannot actually be unrolled; in fact this is a very common pattern in the code that I've written. I tried compiling the new fixup kernel with and without #pragma unroll and got the same PTX code. But unfortunately that is an inconclusive result because the compiler can decide to unroll loops on its own. For now I think we should just suppress the warning, if the loop cannot be successfully unrolled that is at most a performance problem.

IMbackK · 2025-02-02T21:52:34Z

With the softmax kernel it makes sense that the compiler would issue a warning since for the case with ncols == 0 since in that case the size of the loop is actually unknown. I interpreted your comments in #11471 to mean that it is only the template specialization with ncols == 0 where warnings are issued.

Yes that is correct.

So the problem is, that in a previous pass llvm has folded the loop-with-break into a loop with adjusted bounds based on the location of the break. When in a later pass the loop is then supposed to be unrolled the bounds of the loop are no longer be constexpr when the condition for branch with the break is not constexpr. Yes this is dumb, but thats how llvm works at the moment.

the case in softmax.cu works fine beacuse the condition of the branch with the break is known at compile time (ie never) when ncols_template is not zero.

In mmq.cuh i cant find a location in the code where you break out of a loop in a branch whos evaluation is not known at compile time so is fine there too.

IMbackK added the bug-unconfirmed label Feb 2, 2025

IMbackK assigned JohannesGaessler Feb 2, 2025

JohannesGaessler mentioned this issue Feb 2, 2025

HIP: fix flash_attn_stream_k_fixup warning #11604

Merged

JohannesGaessler closed this as completed in #11604 Feb 2, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compile bug: new mma fattn uses constructs that can not be unrolled by llvm casueing a tonne of warnings #11602

Compile bug: new mma fattn uses constructs that can not be unrolled by llvm casueing a tonne of warnings #11602

IMbackK commented Feb 2, 2025

JohannesGaessler commented Feb 2, 2025 •

edited

Loading

IMbackK commented Feb 2, 2025 •

edited

Loading

IMbackK commented Feb 2, 2025

JohannesGaessler commented Feb 2, 2025

IMbackK commented Feb 2, 2025 •

edited

Loading

Compile bug: new mma fattn uses constructs that can not be unrolled by llvm casueing a tonne of warnings #11602

Compile bug: new mma fattn uses constructs that can not be unrolled by llvm casueing a tonne of warnings #11602

Comments

IMbackK commented Feb 2, 2025

Git commit

Operating systems

GGML backends

Problem description & steps to reproduce

First Bad Commit

Compile command

Relevant log output

JohannesGaessler commented Feb 2, 2025 • edited Loading

IMbackK commented Feb 2, 2025 • edited Loading

IMbackK commented Feb 2, 2025

JohannesGaessler commented Feb 2, 2025

IMbackK commented Feb 2, 2025 • edited Loading

JohannesGaessler commented Feb 2, 2025 •

edited

Loading

IMbackK commented Feb 2, 2025 •

edited

Loading

IMbackK commented Feb 2, 2025 •

edited

Loading