Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Signed integer overflow causes program to skip the epilogue and fall into another function #48943

Open
m13253 opened this issue Mar 16, 2021 · 28 comments · May be fixed by #109732
Open

Signed integer overflow causes program to skip the epilogue and fall into another function #48943

m13253 opened this issue Mar 16, 2021 · 28 comments · May be fixed by #109732
Labels
bugzilla Issues migrated from bugzilla c++ clang:codegen

Comments

@m13253
Copy link
Member

m13253 commented Mar 16, 2021

Bugzilla Link 49599
Version trunk
OS All
CC @chandlerc,@DMG862,@dwblaikie,@DougGregor,@emaste,@LebedevRI,@pogo59,@zygoloid,@oToToT

Extended Description

Comment:

Clang 13 simply does not generate any code for f1 after the undefined behavior point. So any call onto f1 will eventually ends up fell into f2.

Although the compiler can do anything with an undefined behavior, including simply crashing, infinite loop, playing some music, or nuke the earth without violating the C++ specification. I still hope this undefined behavior won't be that surprising.

This issue is not observed in C frontend, or Clang 12.

Godbolt link for your convenience: https://godbolt.org/z/r3nWrE

Source code:

#include <stdio.h>

void f1(void) {
    for(int i = 0; i >= 0; i++) {
        // Undefined behavior
    }
}

void f2(void) {
    puts("Formatting /dev/sda1...");
    // system("mkfs -t btrfs -f /dev/sda1");
}

// Prevents inlining
void (*volatile p1)(void) = f1;
void (*volatile p2)(void) = f2;

int main(void) {
    puts(__VERSION__);
    p1();
    return 0;
}

Output:

Clang 13.0.0 (https://github.com/llvm/llvm-project.git fcdf7f6224610a51dc2ff47f2f1e3377329b64a7)
Formatting /dev/sda1...
@LebedevRI
Copy link
Member

This is indeed UB, and -fsanitize=undefined will tell you about it:
https://godbolt.org/z/79q6s8
example.cpp:4:29: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior example.cpp:4:29 in

You probably want -fwrapv.

@dwblaikie
Copy link
Collaborator

FWIW, I have plans to at least put a ud2/trap at the end of functions that don't otherwise return or at least call (call might be non-returning, so wouldn't want to add an extra instruction if it won't ever be called - but does mean sometimes it could fallthrough to the following function).

But yeah, UB is UB, and this is within the realm of failures you can expect from LLVM's optimizations.

@m13253
Copy link
Member Author

m13253 commented Mar 16, 2021

FWIW, I have plans to at least put a ud2/trap at the end of functions that
don't otherwise return or at least call (call might be non-returning, so
wouldn't want to add an extra instruction if it won't ever be called - but
does mean sometimes it could fallthrough to the following function).

But yeah, UB is UB, and this is within the realm of failures you can expect
from LLVM's optimizations.

I would consider ud2 a reasonable solution.
It's free of performance cost, and it saves lives.

I understand the compiler can do anything with a UB.
But triggering a trap is always better than nuking the earth, isn't it?

By the way, GCC warns this UB by default (without -Wall or -Wextra) as long you don't give it -O0.
Is there any technical / performance reason to put this check into -fsanitize=undefined instead of making it default?

Anyhow, a ud2 is good enough.

@DMG862
Copy link
Mannequin

DMG862 mannequin commented Mar 16, 2021

-mllvm -trap-unreachable will add a trap.

Some targets (MachO, Webassembly, PS4) will do this by default.

@LebedevRI
Copy link
Member

-mllvm -trap-unreachable will add a trap.

Some targets (MachO, Webassembly, PS4) will do this by default.

Hm, that doesn't pessimize the IR at all: https://godbolt.org/z/q49rq6
Only the assembly: https://godbolt.org/z/sceK9T
Why isn't this the default then?

@pogo59
Copy link
Collaborator

pogo59 commented Mar 16, 2021

Why isn't this the default then?

It will increase code size a bit. If you have stack-protector turned on,
there is likely to be a call to the (noreturn) check-fail function at the
end of any function with a nontrivial amount of local storage, and any call
to a noreturn function would need a trap after it. For PS4, we encountered
it because the "return" address of the check-fail call could actually be
the entry point of the next function, and the traceback/unwinder code was
not subtracting 1 from the return address before symbolizing. It was easier
to make the compiler insert the trap than to fix the unwinder.

@m13253
Copy link
Member Author

m13253 commented Mar 16, 2021

According to Twitter user @​ScottWolchok, this behavior is not conforming to the standard.

(§5.10/1): "Two pointers of the same type compare equal if and only if they are both null, both point to the same function, or both represent the same address"

However, both f1 and f2 have the same address, yet they are different functions.

@zygoloid
Copy link
Mannequin

zygoloid mannequin commented Mar 16, 2021

Yes, irrespective of anything else, giving f1 and f2 the same address is a miscompile. I think that's a somewhat different issue than the one originally filed, as that only covers the case where f1 ends up being empty.

@dwblaikie
Copy link
Collaborator

According to Twitter user @​ScottWolchok, this behavior is not conforming
to the standard.

(§5.10/1): "Two pointers of the same type compare equal if and only if they
are both null, both point to the same function, or both represent the same
address"

However, both f1 and f2 have the same address, yet they are different
functions.

Yes, that particular aspect of this is one of the reasons I was interested in the issue when I came across it due to a debug info problem: https://lists.llvm.org/pipermail/llvm-dev/2020-July/thread.html#143722

Though technically we can fix the conformance issue while still having this kind of UB in some fairly accessible cases - LLVM has an attribute "addrsig" to denote which functions have significant addresses - so, constructors for instance (which can't have their address taken/compared) could still have the "run into another function" failure mode.

(& I expect we still have this failure mode in non-empty functions too, FWIW)

My hope was to not go as far as trap-on-unreachable, which can be pretty clearly pessimizing (eg: "void f1() { unreachable; } void f2() { if (x) { f1(); } }" will generate code, test the condition, etc, when the whole condition/block should be optimized away instead) to just the narrow case where we could add one extra byte of instructions and stop a function from falling off the end into another function where there's a good chance of that. (exactly what a "good chance" is is debatable - if it's after a call, do we do it because the call might return? what about if the call is never going to return (it might be attributed with noreturn, or it might just not return for this call site/these particular parameter values))

@dwblaikie
Copy link
Collaborator

-mllvm -trap-unreachable will add a trap.

Some targets (MachO, Webassembly, PS4) will do this by default.

Hm, that doesn't pessimize the IR at all: https://godbolt.org/z/q49rq6
Only the assembly: https://godbolt.org/z/sceK9T
Why isn't this the default then?

As you say, it pessimizes the assembly - how much pessimizing is worth the extra safety is unclear. I think there's some room for a bit more safety & I'm working on that - but my guess is we might still leave some cases where you could fall through.

@pogo59
Copy link
Collaborator

pogo59 commented Mar 16, 2021

Two things:

First, when I try this case locally, f1() ends up being a retq instead of
optimized away entirely. Using clang trunk from last night on linux,
and the same options (-O1) as the godbolt link.
That would be recognizing the loop as empty, no uses of the iteration var,
so the entire thing can be optimized away. No UB detected, apparently.
Not sure what the difference is.

Second, IIRC, "trap on unreachable" is an X86 target thing, and not an IR
thing. For PS4 that's all we care about, but if it turns into a more
generic issue then we'll need some other mechanism.

@dwblaikie
Copy link
Collaborator

My hope was to not go as far as trap-on-unreachable, which can be pretty
clearly pessimizing (eg: "void f1() { unreachable; } void f2() { if (x) {
f1(); } }" will generate code, test the condition, etc, when the whole
condition/block should be optimized away instead)

Hmm, nope, that was wrong - seems trap-on-unreachable doesn't pessimize the middle end (unreachable blocks can still be removed by SimplifyCFG, etc) - but is only tested in codegen/targets.

Perhaps the MachO behavior (trap on unreachable, but don't trap after a noreturn call) might be worth generalizing. I'll see about measuring the impact of that to see what it'd be like to use it by default. (note that the MachO behavior still leaves the possibility of falling off the end if you violate the noreturn contract and return from such a function)

@dwblaikie
Copy link
Collaborator

Second, IIRC, "trap on unreachable" is an X86 target thing, and not an IR
thing. For PS4 that's all we care about, but if it turns into a more
generic issue then we'll need some other mechanism.

Oh, thanks for pointing that out - good to know/think about.

@dwblaikie
Copy link
Collaborator

Second, IIRC, "trap on unreachable" is an X86 target thing, and not an IR
thing. For PS4 that's all we care about, but if it turns into a more
generic issue then we'll need some other mechanism.

Oh, thanks for pointing that out - good to know/think about.

Since Apple uses it for MachO I was thinking "oh, what do they do on their other targets" - and it seems they maybe do enable it for ARM and AArch64 (& it's also enabled for WebAssembly). And maybe even the new old M68K backend...

$ grep -ir trap.*unreachable llvm/lib
llvm/lib/CodeGen/SelectionDAG/FastISel.cpp: if (TM.Options.TrapUnreachable)
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp: if (!DAG.getTarget().Options.TrapUnreachable)
llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp: if (DAG.getTarget().Options.TrapUnreachable)
llvm/lib/CodeGen/LLVMTargetMachine.cpp:static cl::opt EnableTrapUnreachable("trap-unreachable",
llvm/lib/CodeGen/LLVMTargetMachine.cpp: cl::desc("Enable generating trap for unreachable"));
llvm/lib/CodeGen/LLVMTargetMachine.cpp: if (EnableTrapUnreachable)
llvm/lib/CodeGen/LLVMTargetMachine.cpp: this->Options.TrapUnreachable = true;
llvm/lib/Target/AArch64/AArch64TargetMachine.cpp: this->Options.TrapUnreachable = true;
llvm/lib/Target/AArch64/AArch64TargetMachine.cpp: this->Options.TrapUnreachable = true;
llvm/lib/Target/ARM/ARMTargetMachine.cpp: this->Options.TrapUnreachable = true;
llvm/lib/Target/X86/X86TargetMachine.cpp: // the calling function, and TrapUnreachable is an easy way to get that.
llvm/lib/Target/X86/X86TargetMachine.cpp: // FIXME/Check here - 'trap unreachable' doesn't catch all cases of empty
llvm/lib/Target/X86/X86TargetMachine.cpp: this->Options.TrapUnreachable = true;
llvm/lib/Target/WebAssembly/WebAssemblyInstrControl.td:defm UNREACHABLE : NRI<(outs), (ins), [(trap)], "unreachable", 0x00>;
llvm/lib/Target/WebAssembly/WebAssemblyInstrControl.td:// terminator, lowering debugtrap to UNREACHABLE can create an invalid
llvm/lib/Target/WebAssembly/WebAssemblyInstrControl.td:defm DEBUG_UNREACHABLE : NRI<(outs), (ins), [(debugtrap)], "unreachable", 0x00>;
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp: // Trap lowers to wasm unreachable
llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp: this->Options.TrapUnreachable = true;
llvm/lib/Target/M68k/M68kISelLowering.cpp: if (CLI.DoesNotReturn && !getTargetMachine().Options.TrapUnreachable) {
llvm/lib/Transforms/Utils/Local.cpp: // Don't insert a call to llvm.trap right before the unreachable.
llvm/lib/Transforms/Utils/Local.cpp: // Don't insert a call to llvm.trap right before the unreachable.

@pogo59
Copy link
Collaborator

pogo59 commented Mar 16, 2021

Ah, I was mis-remembering. It's a TM flag and we arranged to set it for PS4.
Looks like each target (that cared) would have to be updated.

@pogo59
Copy link
Collaborator

pogo59 commented Mar 16, 2021

First, when I try this case locally, f1() ends up being a retq instead of
optimized away entirely.

Pilot error. Sorry for the noise. If I do it right, f1() is empty.

@pogo59
Copy link
Collaborator

pogo59 commented Mar 16, 2021

If I add a puts() to f1(), like so:

void f1(void) {
puts("in f1\n");
for (int i = 0; i >= 0; i++) {
// ub
}
}

then the codegen becomes very sad:

00000000004004d0 <_Z2f1v>:
4004d0: 50 push %rax
4004d1: bf 94 05 40 00 mov $0x400594,%edi
4004d6: e8 f5 fe ff ff callq 4003d0 puts@plt
4004db: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

00000000004004e0 <_Z2f2v>:
4004e0: 50 push %rax
4004e1: bf 9b 05 40 00 mov $0x40059b,%edi
4004e6: e8 e5 fe ff ff callq 4003d0 puts@plt
4004eb: 58 pop %rax
4004ec: c3 retq
4004ed: 0f 1f 00 nopl (%rax)

Note f1() does a push with no pop, then falls into f2(), which means
the retq returns to.... whatever was in %rax.

This means UB is a potential safety/security problem, and we really should
do something about it.

@dwblaikie
Copy link
Collaborator

If I add a puts() to f1(), like so:

void f1(void) {
puts("in f1\n");
for (int i = 0; i >= 0; i++) {
// ub
}
}

then the codegen becomes very sad:

00000000004004d0 <_Z2f1v>:
4004d0: 50 push %rax
4004d1: bf 94 05 40 00 mov $0x400594,%edi
4004d6: e8 f5 fe ff ff callq 4003d0 puts@plt
4004db: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

00000000004004e0 <_Z2f2v>:
4004e0: 50 push %rax
4004e1: bf 9b 05 40 00 mov $0x40059b,%edi
4004e6: e8 e5 fe ff ff callq 4003d0 puts@plt
4004eb: 58 pop %rax
4004ec: c3 retq
4004ed: 0f 1f 00 nopl (%rax)

Note f1() does a push with no pop, then falls into f2(), which means
the retq returns to.... whatever was in %rax.

Yeah, falling into another function on UB has been the way this works (except for the targets that opt out with TrapUnreachable) for a while now.

This means UB is a potential safety/security problem, and we really should
do something about it.

All sorts of UB manifests in lots of security issues, right? Buffer overruns and the like (I guess this is a buffer overrun, of sorts).

So I'm not sure that's extra/new motivation.

My take on it is with regards to

  • C++ defined behavior (distinct functions shouldn't compare equal)
  • DWARF (zero length functions+gc-sections (without newer tombstone work enabled) break DWARFv4 location and range lists)
  • general programmer convenience (there's not much value in producing truly zero length functions - the cost of one extra instruction in that case doesn't seem high and improves usability - and maybe that extends to non-zero-length functions too (though we can see not all such cases with Apple's disabling of trap after noreturn call - which, if you invoke UB by violating the noreturn contract, you'll get the "Fall off the end into another function" behavior again)

@pogo59
Copy link
Collaborator

pogo59 commented Mar 16, 2021

All sorts of UB manifests in lots of security issues, right? Buffer
overruns and the like (I guess this is a buffer overrun, of sorts).

I'm just thinking this particular one is handing someone a gadget.
A buffer overrun is a user-code error, really, that is typically not
something the compiler can detect (or not easily); this here is something
the compiler is doing on purpose to the execution path.

The zero-length function is just a special case of this more general problem.

But probably not worth debating the fine points here.

@m13253
Copy link
Member Author

m13253 commented Mar 19, 2021

Two opinions of mine:

(1) Adding ud2 is not that expensive.

According to Rust: rust-lang/rust#45920

They deliberately add an ud2 for all unreachable places to prevent "falling through" into whatever code next in memory, and found the increase in code size "ranged from 0.0% to 0.1%."

(2) Instead of adding ud2, can we emulate an infinite loop as Clang 12 does?

From the engineering aspect, I would consider reducing the affected radius of a UB as small as possible, to help debugging.

Emulating an infinite loop even though the compiler knows it is a UB might help debugging -- much better than confusing the debugger with an unbalanced stack and a flying %rip register.

Again, since nobody answered yet, why this warning is not turned on by default as GCC does?

@dwblaikie
Copy link
Collaborator

Two opinions of mine:

(1) Adding ud2 is not that expensive.

According to Rust: rust-lang/rust#45920

They deliberately add an ud2 for all unreachable places to prevent "falling
through" into whatever code next in memory, and found the increase in code
size "ranged from 0.0% to 0.1%."

Yes, that's roughly the plan I have in mind.

(2) Instead of adding ud2, can we emulate an infinite loop as Clang 12 does?

From the engineering aspect, I would consider reducing the affected radius
of a UB as small as possible, to help debugging.

It's not generally that simple - deciding where/how to "recover" from UB would be pretty difficult.

The Clang-advised way to deal with this would be to compile with -fsanitize=undefined

https://godbolt.org/z/3aW69c

example.cpp:4:29: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior example.cpp:4:29 in

(not sure if there's a slight bug in the error message there (the "in" being at the end) or just a quirk of godbolt or the like)

Emulating an infinite loop even though the compiler knows it is a UB might
help debugging -- much better than confusing the debugger with an unbalanced
stack and a flying %rip register.

Yeah - the ud2's more likely/easier to explain which situations it applies to and which ones it doesn't, etc.

Again, since nobody answered yet, why this warning is not turned on by
default as GCC does?

Because Clang doesn't have such a warning - in part because we try pretty hard not to have warnings that are powered by the optimizers (because doing so makes the warnings unreliable - they don't appear in some build modes, or might appear/disappear based on spooky-action-at-a-distance when other code changes and tips optimizations in one way or another).

Dynamic tools like the sanitizers provide more reliable & explanatory failures for this sort of situation.

@chandlerc
Copy link
Member

Note that LLD already pads functions with int3 specifically to block falling through into functions (or attackers jumping into the padding and falling through):

https://godbolt.org/z/Y46YKfqTc

Gold uses NOPs, which I think should be fixed.

BFD doesn't pad the functions at all.

LLVM also doesn't pad the functions when you remove -ffunction-sections.

I'd suggest LLVM at least do something similar to LLD and insist on at least one byte of padding between functions and emitting that as int3 if not using -ffunction-sections. Or maybe -ffunction-sections should just be more of the default.

@inclyc
Copy link
Member

inclyc commented Feb 10, 2023

Recently we have received a lot of cases where infinite loop/unreachable code is deleted and fallthrough to the next function. Our consensus is the behavior here very strange and surprising, see comment from @AaronBallman #60637 (comment). However this may be legal thanks to UB.

There was a proposal to insert a "ud2" here: #48943 (comment). And @dwblaikie mentioned his minds here: #60588 (comment).

@shafik mentioned there is a paper intended to address this problem #60622 (comment).

@FrankHB
Copy link

FrankHB commented Feb 10, 2023

I'm neutral to optimize such cases if indeed UB (anyway you have -mllvm -trap-unreachable to force the detailed code generation behavior less surprising) so I can accept a WONTFIX in this case. But mixing UB and other cases in the same path is intolerable, so I don't treat #60622 the same. For the latter we'd wait the paper. Note ubsan also has the same problem specific to this point.

@inclyc
Copy link
Member

inclyc commented Feb 10, 2023

Thanks @pbo-linaro! The commit introduced loop deletion was: 6c31295, mentioned in #60652

@llvmbot
Copy link
Member

llvmbot commented Feb 10, 2023

@llvm/issue-subscribers-clang-codegen

@neldredge
Copy link

neldredge commented Feb 13, 2023

Just to reiterate a point raised in some of the linked bugs, one problem is that this can result in the UB function having the same address as another function, so that their function pointers compare equal, which is not allowed by the C/C++ standards. Such a program does not have UB as long as the offending function is never called, so the breakage of this bug isn't limited to UB programs.

Code like that might even exist. If you are processing function pointers and you need multiple sentinel values (so that NULL isn't sufficient), you could define a dummy function and use its address as a sentinel. And since the function will never be called, you could put any old garbage code inside it.

@shafik
Copy link
Collaborator

shafik commented Dec 18, 2023

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla c++ clang:codegen
Projects
None yet
Development

Successfully merging a pull request may close this issue.