[naga msl-out] Unbounded loop workaround is not adequate #6528

jimblandy · 2024-11-12T17:42:52Z

I'll provide more details later, but the LOOP_IS_REACHABLE kludge has been found to be insufficient at avoiding UB and security problems. Basically, even with the kludge applied, the loop is still infinite, and still reached, so it's just a matter of being clever.

msl
hlsl

The text was updated successfully, but these errors were encountered:

jimblandy · 2024-11-13T17:01:59Z

Tint's latest iteration is here or here.

Suppose the user presents you with a loop like this, where i is a u32:

loop {
    if (i < 10) {
        a[i] = b[i];
        break;
    }
}

The Metal Shading Language compiler is allowed to assume that all loops terminate. (This is just another way of saying, "infinite loops are UB".)

For that to be the case, we must reach the break, so we know that i < 10 when we reach the if. Thus, the entire loop can be rewritten as:

a[i] = b[i];

Since i is a u32, the compiler infers that 0 <= i < 10 when the assignment is reached. This means that if a and b are arrays of length 10 or greater, the compiler knows that i is in bounds, and thus can discard bounds checks injected by Naga.

Again, the reason the existing LOOP_IS_REACHABLE kludge doesn't suffice is that it merely persuades MSL that, since there is a branch over the infinite loop, conditions sufficient to lead to that branch aren't necessarily true. But the infinite loop is still there, so MSL is still free to infer other things based on that.

jimblandy · 2024-11-13T17:09:28Z

The fix is to generate code like this:

while (true) {
    // body of loop
    {volatile bool VOLATILE_NAME = false; if (VOLATILE_NAME) break;}
}

This forces the MSL compiler to assume that the loop does terminate, thus preventing it from assuming that other conditions leading to other breaks must be true.

This will probably have even more of a performance impact than our current LOOP_IS_REACHABLE kludge, since it's inside the loop, not outside it.

teoxoy · 2024-11-13T18:01:21Z

This is interesting, so the fact that "The Metal Shading Language compiler is allowed to assume that all loops terminate." + the fact that "indexing out of bounds is UB" (presumably allowing the compiler to also assume it doesn't happen) allows for this sort of optimizations where the bounds checks we inject are thrown away.

I finally understand the situation. I didn't think that UB could mean "assume it never happens" but I guess that's valid. This is unfortunate as we have to cope with this sort of ramifications.

teoxoy · 2024-11-13T18:09:25Z

Thinking about this some more, I always thought of UB as strictly happening at runtime (ex: if you index out of bounds at runtime, here be dragons). If that were the case, the optimization happening here would have been a bug.

That's clearly not the case https://en.cppreference.com/w/cpp/language/ub.
I'm not sure how C++ developers deal with this.

ErichDonGubler · 2024-11-13T22:06:51Z

@teoxoy: It is with a certain dubious delight that I present this old classic for your reading: Undefined behavior can result in time travel

teoxoy · 2024-11-14T11:48:20Z

I wanted to see the effect of this for myself: https://shader-playground.timjones.io/b13aed3846180491656e076460d45bce

It looks like the bound check is still in the output of the Metal compiler, the loop is gone and the if became:

  %5 = icmp ult i32 %1, 10
  call void @llvm.assume(i1 %5)

@llvm.assume is documented as:

Allows the optimizer to assume that the provided condition is true. This information can then be used in simplifying other parts of the code.

And I'm guessing that the optimizer will see that i is at most 9 and will replace the bound check (metal::min(i, 15u - 1)) with i.

Adding

{volatile bool VOLATILE_NAME = false; if (VOLATILE_NAME) break;}

to the end of the loop or using a volatile bool as the loop condition fixes the issue; the current LOOP_IS_REACHABLE macro doesn't (the @llvm.assume is still in the output).

teoxoy · 2024-11-14T12:08:50Z

I was curious to see what DXC does: https://shader-playground.timjones.io/9eb4af3bb2082c18cbd216659849eea1

It seems like it removes the loop and the if altogether but makes no assumptions about the value of i. This will preserve the bounds checks but could be viewed as a bug from the perspective of a WGSL developer since code will get executed regardless of the value of i. The WGSL spec doesn't disallow infinite loops unless they are trivial.

It seems to me that all of these headaches would be solved if we'd add a generous upper bound (u32::MAX?u64::MAX?) on the number of loop iterations in the WGSL spec. It could be that by the time code reaches one of these upper bounds, the driver would lose the device anyway.

teoxoy · 2024-11-14T12:14:38Z

@magcius also mentioned in one of the chats that SPIR-V considers infinite loops UB but I couldn't find info on that. Jasper, do you have some links/references?

In MSL output, avoid undefined behavior due to unbounded loops by adding an unpredictable, never-actually-taken `break` to the bottom of each loop body, rather than adding an unpredictable, never-actually-taken branch over each loop. This will probably have more of a performance impact, because it affects each iteration of the loop, but unlike branching over the loop, which leaves infinite loops (and thus undefined behavior) in the output, this actually ensures that no loop presented to Metal is unbounded, so that there is no undefined behavior present that the optimizer could use to make unwelcome inferences. Fixes gfx-rs#6528.

jimblandy · 2024-11-14T19:47:00Z

It seems to me that all of these headaches would be solved if we'd add a generous upper bound (u32::MAX?u64::MAX?) on the number of loop iterations in the WGSL spec. It could be that by the time code reaches one of these upper bounds, the driver would lose the device anyway.

OMG THIS IS BRILLIANT

This may well be the Right Fix for this problem. My theory is that the biggest perf impact is not actually the dumb never-taken branch at runtime, but the fact that the volatile makes it impossible for the compiler to tell how many times the loop will execute, so it can't unroll loops, or things like that.

But notice that, before the compiler unrolls a loop, it must first know how many iterations it will take. That implies that if we transform every loop into:

var limit = 0xffffffffu;
while (limit-- > 0) {
    // loop body
}

then the compiler may well still be able to detect that the loop will only run (say) four times anyway, and thus it can still unroll it. In any case where the optimizer can determine that the loop will exit for other reasons, it will boil away limit entirely. For example, I would absolutely expect LLVM to be able to turn this:

var i = 0;
var limit = 0xffffffffu;
while (limit -- > 0) {
    if (i >= 4) {
        break;
    }
    f(a[i]);
}

into

f(a[0]);
f(a[1]);
f(a[2]);
f(a[3]);

and I bet these simple cases are the kind of thing that are actually hurting performance. With the volatile kludge, the compiler can't do a darned thing.

jimblandy · 2024-11-14T20:04:01Z

The spec doesn't give us permission to put bounds on the loops like that - but any driver is going to kill us before we execute for too long anyway. And we can put a ridiculous limit like 0xffff_ffff_ffff_ffff on there, that's fine: the plan is we're never going to actually reach it anyway. It simply ensures that we're not writing code with UB. WGSL doesn't have u64, but MSL does have uint64_t.

jimblandy · 2024-11-14T20:13:51Z

@magcius also mentioned in one of the chats that SPIR-V considers infinite loops UB but I couldn't find info on that. Jasper, do you have some links/references?

I don't remember @magcius saying this, I thought it was @DemiMarie. She tried to chase down the sources but we didn't get a reply.

jimblandy · 2024-11-14T20:14:31Z

Note that Tint doesn't have any similar workarounds in their SPIR-V backend. I would expect them to have mentioned this if it were needed. And I can't find anything in the SPIR-V or Vulkan specs that suggests that infinite loops are UB.

jimblandy · 2024-11-14T21:04:48Z

According to godbolt.org, GCC compiles this C++:

void f(int *a) {
    int limit = 0xffff;
    for (int i = 0; i < 4; i++) {
        a[i]++;
        if (--limit == 0)
            break;
    }
}

to this x86_64 assembly:

        movdqu  xmm1, XMMWORD PTR [rdi]
        mov     eax, 1
        movd    xmm0, eax
        pshufd  xmm0, xmm0, 0
        paddd   xmm0, xmm1
        movups  XMMWORD PTR [rdi], xmm0
        ret

It's unrolling the entire four-iteration loop into a non-looped series of SIMD instructions. Clang does something similar.

DemiMarie · 2024-11-14T23:54:22Z

@magcius also mentioned in one of the chats that SPIR-V considers infinite loops UB but I couldn't find info on that. Jasper, do you have some links/references?

I don't remember @magcius saying this, I thought it was @DemiMarie. She tried to chase down the sources but we didn't get a reply.

It was a discussion in Mesa, but I think the conclusion was that this is allowed in SPIR-V and so such optimizations would be Mesa bugs.

cwfitzgerald · 2024-11-16T00:43:44Z

I think we should either do u32::MAX, or we should do two counters of 32bit, as I'm worried about the performance implications of 64bit arithmetic, especially on gpus where it's 1/64th rate

cwfitzgerald · 2024-11-16T00:45:20Z

Also the limit check needs to be at the very beginning of the loop, before any user code, or else a while (true) { continue; } would still be infinite

DemiMarie · 2024-11-16T02:28:08Z

I think we should either do u32::MAX, or we should do two counters of 32bit, as I'm worried about the performance implications of 64bit arithmetic, especially on gpus where it's 1/64th rate

Which GPUs are these?

jimblandy · 2024-11-18T03:42:28Z

It's easy enough to chain them together:

var lower: u32 = 0xffff_ffff;
var higher: u32 = 0xffff_ffff;
...
higher -= select(1, 0, lower == 0);
lower -= 1;
if ((lower | higher) == 0) {
    break;
}

But are there really ever loops that run 2^32 iterations in a shader? I'm skeptical.

jimblandy · 2024-11-18T16:12:50Z

It seems like it removes the loop and the if altogether but makes no assumptions about the value of i. This will preserve the bounds checks but could be viewed as a bug from the perspective of a WGSL developer since code will get executed regardless of the value of i. The WGSL spec doesn't disallow infinite loops unless they are trivial.

This transformation does make me uncomfortable as well. I'm unhappy about these dumb checks creeping in everywhere, but given that the compiler is already doing surprising things, I just don't think we can anticipate what the consequences would be for other, more creatively constructed shaders.

So I think we may need to do this dumb loop bounds injection kludge for HLSL as well.

DemiMarie · 2024-11-18T18:04:01Z

Would it be reasonable to add this to SPIR-V?

jimblandy · 2024-11-18T22:07:53Z

I've been looking at conversation like this, and it does not leave me feeling confident that we can assume unbounded loops will be handled correctly in SPIR-V. I'd like to hear from someone more authoritative, and ideally learn where the spec actually weighs in on this. But we should be prepared.

teoxoy · 2024-11-20T08:55:04Z

I found the internal Khronos issue (https://gitlab.khronos.org/cross-api/memory-model/-/issues/145), something that came out of it is https://github.com/KhronosGroup/Vulkan-Docs/blame/a61256a946d2b48b9efdd3a18c3fd4c4c354b949/chapters/commonvalidity/draw_dispatch_common.adoc#L519-L520.

So it's not spelled out that infinite loops are UB but it can be inferred since all invocations must terminate.

teoxoy · 2024-11-20T08:59:54Z

Other than putting an upper limit on the number of loop iterations I don't know how else we would be able to avoid the issue for SPIR-V.

DemiMarie · 2024-11-20T10:11:54Z

Other than putting an upper limit on the number of loop iterations I don't know how else we would be able to avoid the issue for SPIR-V.

Could Naga use this to support preempting a shader? There are algorithms to transform a program in such a way that it is guaranteed to reach a safepoint in a bounded amount of time. Those are usually used by language runtimes to support garbage collection or lightweight thread preemption, but they could also be used to preempt a shader before the driver gives up and resets the GPU. Resetting the GPU can often have unwanted knock-on effects, such as crashing the host compositor, so it is definitely something to avoid if possible.

For native API users, this could also allow supporting long-running compute. Instead of terminating the shader, the code would save all shader state to a buffer, from which it could be restored later. This would allow long running compute jobs on hardware and drivers that do not natively support them.

teoxoy · 2024-11-20T10:52:33Z

I guess we could do that but it sounds non-trivial, it would be worth opening a new issue for it.

jimblandy added backend: metal Issues with Metal area: naga back-end Outputs of naga shader conversion naga Shader Translator labels Nov 12, 2024

jimblandy added this to WebGPU for Firefox Nov 12, 2024

github-project-automation bot moved this to Todo in WebGPU for Firefox Nov 12, 2024

jimblandy mentioned this issue Nov 14, 2024

[naga msl-out] Avoid UB by making all loops bounded. #6545

Merged

jimblandy mentioned this issue Nov 14, 2024

[naga msl-out] Use iteration limits to avoid unbounded loops #6546

Open

jimblandy closed this as completed in #6545 Nov 18, 2024

jimblandy closed this as completed in 0b82776 Nov 18, 2024

github-project-automation bot moved this from Todo to Done in WebGPU for Firefox Nov 18, 2024

teoxoy mentioned this issue Nov 20, 2024

[naga] Prove to downstream shader compilers that loops terminate #6572

Open

4 tasks

jimblandy mentioned this issue Nov 26, 2024

Shader preemption #6581

Closed

Imberflur mentioned this issue Jan 21, 2025

[naga msl-out hlsl-out] Improve workaround for infinite loops causing undefined behaviour #6929

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[naga msl-out] Unbounded loop workaround is not adequate #6528

[naga msl-out] Unbounded loop workaround is not adequate #6528

jimblandy commented Nov 12, 2024 •

edited

Loading

jimblandy commented Nov 13, 2024

jimblandy commented Nov 13, 2024 •

edited

Loading

teoxoy commented Nov 13, 2024

teoxoy commented Nov 13, 2024 •

edited

Loading

ErichDonGubler commented Nov 13, 2024

teoxoy commented Nov 14, 2024 •

edited

Loading

teoxoy commented Nov 14, 2024

teoxoy commented Nov 14, 2024

jimblandy commented Nov 14, 2024 •

edited

Loading

jimblandy commented Nov 14, 2024

jimblandy commented Nov 14, 2024

jimblandy commented Nov 14, 2024

jimblandy commented Nov 14, 2024 •

edited

Loading

DemiMarie commented Nov 14, 2024

cwfitzgerald commented Nov 16, 2024

cwfitzgerald commented Nov 16, 2024

DemiMarie commented Nov 16, 2024

jimblandy commented Nov 18, 2024

jimblandy commented Nov 18, 2024

DemiMarie commented Nov 18, 2024

jimblandy commented Nov 18, 2024

teoxoy commented Nov 20, 2024

teoxoy commented Nov 20, 2024

DemiMarie commented Nov 20, 2024

teoxoy commented Nov 20, 2024

[naga msl-out] Unbounded loop workaround is not adequate #6528

[naga msl-out] Unbounded loop workaround is not adequate #6528

Comments

jimblandy commented Nov 12, 2024 • edited Loading

jimblandy commented Nov 13, 2024

jimblandy commented Nov 13, 2024 • edited Loading

teoxoy commented Nov 13, 2024

teoxoy commented Nov 13, 2024 • edited Loading

ErichDonGubler commented Nov 13, 2024

teoxoy commented Nov 14, 2024 • edited Loading

teoxoy commented Nov 14, 2024

teoxoy commented Nov 14, 2024

jimblandy commented Nov 14, 2024 • edited Loading

jimblandy commented Nov 14, 2024

jimblandy commented Nov 14, 2024

jimblandy commented Nov 14, 2024

jimblandy commented Nov 14, 2024 • edited Loading

DemiMarie commented Nov 14, 2024

cwfitzgerald commented Nov 16, 2024

cwfitzgerald commented Nov 16, 2024

DemiMarie commented Nov 16, 2024

jimblandy commented Nov 18, 2024

jimblandy commented Nov 18, 2024

DemiMarie commented Nov 18, 2024

jimblandy commented Nov 18, 2024

teoxoy commented Nov 20, 2024

teoxoy commented Nov 20, 2024

DemiMarie commented Nov 20, 2024

teoxoy commented Nov 20, 2024

jimblandy commented Nov 12, 2024 •

edited

Loading

jimblandy commented Nov 13, 2024 •

edited

Loading

teoxoy commented Nov 13, 2024 •

edited

Loading

teoxoy commented Nov 14, 2024 •

edited

Loading

jimblandy commented Nov 14, 2024 •

edited

Loading

jimblandy commented Nov 14, 2024 •

edited

Loading