[X86] Enhance kCFI type IDs with a 3-bit arity indicator. #117121
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-clang Author: Scott Constable (scottconstable) ChangesKernel Control Flow Integrity (kCFI) is a feature that hardens indirect calls by comparing a 32-bit hash of the function pointer's type against a hash of the target function's type. If the hashes do not match, the kernel may panic (or log the hash check failure, depending on the kernel's configuration). These hashes are computed at compile time by applying the xxHash64 algorithm to each mangled canonical function (or function pointer) type, then truncating the result to 32 bits. Like any hashing scheme, hash collisions are possible. For example, a commodity Linux kernel configured for Ubuntu 24.04 server has 141,617 total indirect call targets, with 10,903 unique function types. With a 32-bit kCFI hash, the expected number of collisions is 10,903-2^32+2^32*(1-1/(2^32))^10,903 = 0.01383765 (see https://courses.cs.duke.edu/cps102/spring09/Lectures/L-18.pdf for the formula). This number can balloon with the addition of drivers and kernel modules. This patch reduces both the expected number of collisions and the potential impact of a collision by augmenting the hash with an arity value that indicates how many parameters the function has at the ABI level. Specifically, the patch further truncates the kCFI hash down to 29 bits, then concatenates a 3-bit arity indicator as follows:
| Arity Indicator | Description |
|---|---|
| 0 | 0 parameters |
| 1 | 1 parameter in RDI |
| 2 | 2 parameters in RDI and RSI |
| 3 | 3 parameters in RDI, RSI, and RDX |
| 4 | 4 parameters in RDI, RSI, RDX, and RCX |
| 5 | 5 parameters in RDI, RSI, RDX, RCX, and R8 |
| 6 | 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 |
| 7 | At least one parameter may be passed on the stack |
This scheme enhances security in two ways. First, it prevents a j-arity function pointer from being used to call a k-arity function, unless j=k. The current 32-bit kCFI hash does not prevent, for example, a 2-arity fptr from calling a 3-arity target if the kCFI hashes collide. If this were to happen, then potentially malicious stale/dead data in RDX at the call site could suddenly become live as the third parameter at the call target. Second, this scheme reduces the expected number of hash collisions within each arity, compared against the expected number of collisions (0.01383765) for the 32-bit hashing scheme that includes all arities. The table below shows the expected number of collisions for each arity, given the number of unique indirect callable function types within that arity in the same Ubuntu 24.04 server kernel discussed above.
One additional benefit of this patch is that it can benefit other CFI approaches that build on kCFI, such as FineIBT. For example, this proposed enhancement to FineIBT must be able to infer (at kernel init time) which registers are live at an indirect call target: https://lkml.org/lkml/2024/9/27/982. If the arity bits are available in the kCFI type ID, then this information is trivial to infer. Full diff: https://github.com/llvm/llvm-project/pull/117121.diff 3 Files Affected:
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index b854eeb62a80ce..7cc6f120ec39a9 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -2183,7 +2183,8 @@ llvm::ConstantInt *CodeGenModule::CreateCrossDsoCfiTypeId(llvm::Metadata *MD) {
}
llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) {
- if (auto *FnType = T->getAs<FunctionProtoType>())
+ auto *FnType = T->getAs<FunctionProtoType>();
+ if (FnType)
T = getContext().getFunctionType(
FnType->getReturnType(), FnType->getParamTypes(),
FnType->getExtProtoInfo().withExceptionSpec(EST_None));
@@ -2196,8 +2197,32 @@ llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) {
if (getCodeGenOpts().SanitizeCfiICallNormalizeIntegers)
Out << ".normalized";
- return llvm::ConstantInt::get(Int32Ty,
- static_cast<uint32_t>(llvm::xxHash64(OutName)));
+ uint32_t OutHash = static_cast<uint32_t>(llvm::xxHash64(OutName));
+ const auto &Triple = getTarget().getTriple();
+ if (Triple.isX86() && Triple.isArch64Bit() && Triple.isOSLinux()) {
+ // Estimate the function's arity (i.e., the number of arguments) at the ABI
+ // level by counting the number of parameters that are likely to be passed
+ // as registers, such as pointers and 64-bit (or smaller) integers. The
+ // Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs.
+ // Additional parameters or parameters larger than 64 bits may be passed on
+ // the stack, in which case the arity is denoted as 7.
+ bool MayHaveStackArgs = FnType->getNumParams() > 6;
+
+ for (unsigned int i = 0; !MayHaveStackArgs && i < FnType->getNumParams();
+ ++i) {
+ const Type *PT = FnType->getParamType(i).getTypePtr();
+ if (!(PT->isPointerType() || (PT->isIntegralOrEnumerationType() &&
+ getContext().getTypeSize(PT) <= 64)))
+ MayHaveStackArgs = true;
+ }
+
+ // The 3-bit arity is concatenated with the lower 29 bits of the KCFI hash
+ // to form an enhanced KCFI type ID. This can prevent, for example, a
+ // 3-arity function's ID from ever colliding with a 2-arity function's ID.
+ OutHash = (OutHash << 3) | (MayHaveStackArgs ? 7 : FnType->getNumParams());
+ }
+
+ return llvm::ConstantInt::get(Int32Ty, OutHash);
}
void CodeGenModule::SetLLVMFunctionAttributes(GlobalDecl GD,
diff --git a/clang/test/CodeGen/kcfi-normalize.c b/clang/test/CodeGen/kcfi-normalize.c
index b9150e88f6ab5f..8b7445fc85e490 100644
--- a/clang/test/CodeGen/kcfi-normalize.c
+++ b/clang/test/CodeGen/kcfi-normalize.c
@@ -10,25 +10,31 @@
void foo(void (*fn)(int), int arg) {
// CHECK-LABEL: define{{.*}}foo
// CHECK-SAME: {{.*}}!kcfi_type ![[TYPE1:[0-9]+]]
- // CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 1162514891) ]
+ // KCFI ID = 0x2A548E59
+ // CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 710184537) ]
fn(arg);
}
void bar(void (*fn)(int, int), int arg1, int arg2) {
// CHECK-LABEL: define{{.*}}bar
// CHECK-SAME: {{.*}}!kcfi_type ![[TYPE2:[0-9]+]]
- // CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 448046469) ]
+ // KCFI ID = 0xD5A52C2A
+ // CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 -710595542) ]
fn(arg1, arg2);
}
void baz(void (*fn)(int, int, int), int arg1, int arg2, int arg3) {
// CHECK-LABEL: define{{.*}}baz
// CHECK-SAME: {{.*}}!kcfi_type ![[TYPE3:[0-9]+]]
- // CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef %3){{.*}}[ "kcfi"(i32 -2049681433) ]
+ // KCFI ID = 0x2EA2BF3B
+ // CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef %3){{.*}}[ "kcfi"(i32 782417723) ]
fn(arg1, arg2, arg3);
}
// CHECK: ![[#]] = !{i32 4, !"cfi-normalize-integers", i32 1}
-// CHECK: ![[TYPE1]] = !{i32 -1143117868}
-// CHECK: ![[TYPE2]] = !{i32 -460921415}
-// CHECK: ![[TYPE3]] = !{i32 -333839615}
+// KCFI ID = DEEB3EA2
+// CHECK: ![[TYPE1]] = !{i32 -555008350}
+// KCFI ID = 24372DCB
+// CHECK: ![[TYPE2]] = !{i32 607595979}
+// KCFI ID = 0x60D0180C
+// CHECK: ![[TYPE3]] = !{i32 1624250380}
diff --git a/clang/test/CodeGen/kcfi.c b/clang/test/CodeGen/kcfi.c
index 622843cedba50f..dc9e818a9f8cca 100644
--- a/clang/test/CodeGen/kcfi.c
+++ b/clang/test/CodeGen/kcfi.c
@@ -7,7 +7,6 @@
/// Must emit __kcfi_typeid symbols for address-taken function declarations
// CHECK: module asm ".weak __kcfi_typeid_[[F4:[a-zA-Z0-9_]+]]"
-// CHECK: module asm ".set __kcfi_typeid_[[F4]], [[#%d,HASH:]]"
/// Must not __kcfi_typeid symbols for non-address-taken declarations
// CHECK-NOT: module asm ".weak __kcfi_typeid_{{f6|_Z2f6v}}"
@@ -29,7 +28,7 @@ int __call(fn_t f) __attribute__((__no_sanitize__("kcfi"))) {
// CHECK: define dso_local{{.*}} i32 @{{call|_Z4callPFivE}}(ptr{{.*}} %f){{.*}}
int call(fn_t f) {
- // CHECK: call{{.*}} i32 %{{.}}(){{.*}} [ "kcfi"(i32 [[#HASH]]) ]
+ // CHECK: call{{.*}} i32 %{{.}}(){{.*}} [ "kcfi"(i32 [[#%d,HASH:]]) ]
return f();
}
@@ -48,6 +47,20 @@ static int f5(void) { return 2; }
// CHECK-DAG: declare !kcfi_type ![[#TYPE]]{{.*}} i32 @{{f6|_Z2f6v}}()
extern int f6(void);
+typedef struct {
+ int *p1;
+ int *p2[16];
+} s_t;
+
+// CHECK: define internal{{.*}} i32 @{{f7|_ZL2f7PFi3s_tEPS_}}(ptr{{.*}} %f, ptr{{.*}} %s){{.*}}
+static int f7(int (*f)(s_t), s_t *s) {
+ // CHECK: call{{.*}} i32 %{{.*}} [ "kcfi"(i32 [[#%d,HASH4:]]) ]
+ return f(*s) + 1;
+}
+
+// CHECK: define internal{{.*}} i32 @{{f8|_ZL2f83s_t}}(ptr{{.*}} %s){{.*}} !kcfi_type ![[#%d,TYPE4:]]
+static int f8(s_t s) { return 0; }
+
#ifndef __cplusplus
// C: define internal ptr @resolver1() #[[#]] !kcfi_type ![[#]] {
int ifunc1(int) __attribute__((ifunc("resolver1")));
@@ -59,12 +72,14 @@ long ifunc2(long) __attribute__((ifunc("resolver2")));
#endif
int test(void) {
+ s_t s;
return call(f1) +
__call((fn_t)f2) +
call(f3) +
call(f4) +
f5() +
- f6();
+ f6() +
+ f7(f8, &s);
}
#ifdef __cplusplus
@@ -85,3 +100,4 @@ void test_member_call(void) {
// CHECK-DAG: ![[#TYPE]] = !{i32 [[#HASH]]}
// CHECK-DAG: ![[#TYPE2]] = !{i32 [[#%d,HASH2:]]}
// MEMBER-DAG: ![[#TYPE3]] = !{i32 [[#HASH3]]}
+// CHECK-DAG: ![[#TYPE4]] = !{i32 [[#HASH4]]}
@phoebewang @lvwr Please review this PR.
clang/lib/CodeGen/CodeGenModule.cpp (outdated):

    static_cast<uint32_t>(llvm::xxHash64(OutName)));
    uint32_t OutHash = static_cast<uint32_t>(llvm::xxHash64(OutName));
    const auto &Triple = getTarget().getTriple();
    if (Triple.isX86() && Triple.isArch64Bit() && Triple.isOSLinux()) {
-> if (FnType && Triple.isX86() && Triple.isArch64Bit() && Triple.isOSLinux())
Just to make sure that FnType isn't a null pointer.
@sirmc Thank you for the suggestion. I looked at the code and as far as I can tell, CreateKCFITypeId() is only called here:
llvm-project/clang/lib/CodeGen/CodeGenModule.cpp, line 2840 in b89e774:
    Ctx, MDB.createConstant(CreateKCFITypeId(FD->getType()))));
so T will always be a function type. Regardless, I think your suggestion is reasonable, in case CreateKCFITypeId() is called elsewhere in the future, so I incorporated this additional check into the PR.
@phoebewang and @lvwr I also noticed that there is this code in LLVM: llvm-project/llvm/lib/Transforms/Utils/ModuleUtils.cpp, lines 202 to 214 in 9ba6672, which also computes KCFI type IDs for -fsanitize=kcfi. When is this code triggered? And do you think it is necessary to additionally implement the arity enhancement in this code?
I'm not familiar with KCFI. I found it was added by @samitolvanen in e1c36bd. I think you should trigger it with the attached test case.
The collisions vary a lot with different numbers of function types. It looks smoother to me if we use 2 bits to distinguish 4 cases: 1, 2, 3, and 0 or others.
✅ With the latest revision this PR passed the C/C++ code formatter.
It looks to me like this code might be triggered in some LTO configurations, and/or when linking code compiled from multiple source languages with the expectation that the KCFI type IDs will be compatible. Is my understanding correct? The comment in the code says "Matches CodeGenModule::CreateKCFITypeId in Clang," which I interpret to mean that this code should produce identical KCFI type IDs for identical function types, which might be tricky if the target binary is compiled from different languages. I added some code there as well:
@@ -208,10 +209,34 @@ void llvm::setKCFIType(Module &M, Function &F, StringRef MangledType) {
  std::string Type = MangledType.str();
  if (M.getModuleFlag("cfi-normalize-integers"))
    Type += ".normalized";
  uint32_t OutHash = static_cast<uint32_t>(llvm::xxHash64(Type));
  auto T = Triple(Twine(M.getTargetTriple()));
The Triple line looks awkward, and I regret that I needed to include another header to enable this. Maybe there is a workable API within the existing headers, but I couldn't find one.
Thank you for the suggestion! This does look tidier than what I had written. I updated the PR.
    for (unsigned int i = 0; !MayHaveStackArgs && i < NumParams; ++i) {
      const llvm::Type *PT = F.getArg(i)->getType();
      if (!(PT->isPointerTy() || PT->getIntegerBitWidth() <= 64))
Is the if condition equivalent to what I wrote in CodeGenModule::CreateKCFITypeId with clang::Type? Specifically, is
// typeof(*PT) = clang::Type
if (!(PT->isPointerType() || (PT->isIntegralOrEnumerationType() &&
getContext().getTypeSize(PT) <= 64)))
equivalent to:
// typeof(*PT) = llvm::Type
if (!(PT->isPointerTy() || PT->getIntegerBitWidth() <= 64))
A front end like Clang has already solved this. I think we can simply check the number.
It appears that clang does not reserve stack for large arguments and instead this is done later by the LLVM X86 backend. For example:
struct S {
int *p1;
int *p2;
int array[8];
};
int foo(struct S s, struct S *sp) {
return *s.p1 + *s.p2 + *sp->p1 + *sp->p2;
}
Then when I compile to LLVM IR I see:
define dso_local i32 @foo(ptr noundef byval(%struct.S) align 8 %s, ptr noundef %sp) #0 {
Which suggests an arity of 2. But the X86 backend transforms foo to pass s on the stack, and then sp becomes the sole argument and is passed in rdi. Hence, by the chart in the PR description, this should be treated as an arity-7 function:
| Arity Indicator | Description |
|---|---|
| 0 | 0 parameters |
| 1 | 1 parameter in RDI |
| 2 | 2 parameters in RDI and RSI |
| 3 | 3 parameters in RDI, RSI, and RDX |
| 4 | 4 parameters in RDI, RSI, RDX, and RCX |
| 5 | 5 parameters in RDI, RSI, RDX, RCX, and R8 |
| 6 | 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 |
| 7 | At least one parameter may be passed on the stack |
This predicate:
// typeof(*PT) = llvm::Type
if (!(PT->isPointerTy() || PT->getIntegerBitWidth() <= 64))
MayHaveStackArgs = true;
should prevent s from being counted as a register argument and correctly set the arity field to 7.
Right, byval is an exception. You can use hasPassPointeeByValueCopyAttr() to check it.
Thank you for the suggestion. I looked at llvm::Argument::hasPassPointeeByValueCopyAttr(), but it looks like it is only available where a function is being defined. It does not appear to be available where a call is made through a function pointer. Therefore, I'm not sure that llvm::Argument::hasPassPointeeByValueCopyAttr() will be helpful, since KCFI requires the ID to be computed identically at both the call site and the call target.
Or, do you think I am overlooking something, and that there is a way to use llvm::Argument::hasPassPointeeByValueCopyAttr() or something similar at an indirect call site? As far as I can tell, the only information available at an indirect call site is the function pointer type, which does contain the number of arguments and their types, but does not appear to indicate whether an argument may be passed on the stack.
It is an argument attribute, which should be identical at the definition and the call; otherwise, we would have a mismatch issue. I assume a simple use like this should be OK.
Hi @phoebewang, KCFI only computes hashes for indirect calls, not direct ones. The example you cited uses CallBase::getCalledFunction(), whose documentation reads "Returns the function called, or null if this is an indirect function invocation or the function signature does not match the call signature."
Sorry, I don't understand the point about indirect calls. Here setKCFIType has an argument F, which will never be null. Why do we need another getCalledFunction()?
I re-ran the numbers with a 30-bit hash and 2-bit arity, and you are correct that the distribution of expected collisions is smoother:
However, a 2-bit arity would undermine what is arguably the more desirable property: preventing a j-arity function pointer from ever calling a k-arity target when j ≠ k. For example, if the 30-bit hash of a 0-arity function type collides with the 30-bit hash of a 4-arity function type, then both types would carry the same 2-bit indicator (the "0 or others" bucket), and the collision would go undetected. Therefore, I believe that the 29-bit hash with a 3-bit arity offers a more favorable security posture.
Looks like the latter, see 71c7313.
Although the default calling convention uses 6 registers, others like RegCall use more. Do you want to check the calling convention as well?
Actually, I think this code was introduced to address a compatibility issue with KASAN, which apparently must generate KCFI-enabled code without clang. I found this explanation at 3b14862 and ClangBuiltLinux/linux#1742. Regardless, it looks like
AFAIK the use case for KCFI is very narrow: the x86-64 Linux kernel. And I don't believe that the kernel uses (or even allows?) any calling convention other than the default. The kernel documentation also says that the eBPF calling convention "maps directly to ABIs used by the kernel on 64-bit architectures." But I admit I am not an expert on the Linux ABI, nor am I an expert on the full scope of KCFI use cases. Maybe @lvwr can weigh in?
While this overall looks like a nice improvement to me, changing the hashing scheme is going to break compatibility with Rust. I would suggest moving this change behind a command line flag, so the kernel can choose to enable it only if Rust support isn't enabled, or if we have a rustc that also supports this hashing scheme.
Yes, this is needed for LLVM-generated functions in general. KASAN is one use case, GCOV is another.
Sami, I'm guessing you're mainly talking about rustc and clang needing to use LLVM versions that agree on the kCFI implementation in use? Or are there additional things we need to do to make Rust support it, or some fundamental reason this approach can't work in Rust? Either way, it would be nice to have some way to set the hashing method being used, or at least a way to query which hashing strategy is used. We are going to want logic in Kconfig to ensure that rustc and clang use the same strategy, or at least a way to detect when they don't, so we can fail the build with a good error message. cc @maurer
Correct. Both compilers must use the same hashing scheme for cross-language indirect calls to work, so unconditionally changing how X86 type hashes are calculated in Clang breaks compatibility with current versions of rustc. Using compiler versions to figure out which scheme is being used is tedious, so we should have a better way to figure this out in Kconfig.
Exactly. I would prefer to have a command line option for this, perhaps something similar to -fsanitize-cfi-icall-experimental-normalize-integers, or even a flag that accepts the scheme to use as an argument. As long as we default to the original scheme to avoid breaking compatibility with other KCFI implementations.
Linux also supports KCFI on 32-bit ARM and RISC-V. However, this change seems to only affect X86.
My understanding is that the kernel respects the default calling convention for most things (if not all) and then builds on top of it, e.g. by defining indirect-call and FineIBT must-use registers (r11 and r10). There are also specifics for syscalls, like using RAX to pass the syscall number. With the above said, I'm unsure if there is any orthogonal-craziness-custom-thing going on for the handwritten assembly code, but I would assume not, given it is desirable to keep standards all around (maybe double-check that with PeterZ for assurance?). Finally, for breadth, the ABI defines the kernel code model for handling symbols and relocations. But I think this is not concerning here.
@samitolvanen Given this introduces benefits, I wonder if this shouldn't be the default, with a flag to disable the enhancement if Rust is lagging. I assume Rust will want to catch up with the benefits of this, and when that happens, we need to add one less flag, which is nice 😎 I'm unsure who is working on Rust support these days, but if it is @rcvalle, maybe he can speak to the likelihood of Rust eventually incorporating these changes. edit: looks like Rust is already tracking and willing to support it? Rust-for-Linux/linux#1132 (thanks @ojeda)
Rust already supports kCFI, and I see no reason it can't also support this scheme. We just need to be careful to introduce it in a good way that reduces the risk of mismatched hashing strategies.
You're welcome! Just in case: if the question is about the entry I added today, then that entry is intended to track this PR, the potential consequences/needs for the Rust support in the kernel, related work/links, etc., but it does not imply Rust is tracking it or that they are willing to support it already.
Flag guarding this feature seems like it would also be good for any existing C users - for example, if trying to build a kernel module intended to load against a kernel image built with an older compiler.
+1 to this. Adding support for this to the Rust compiler shouldn't be a problem, and @maurer or I could take a look at it. However, I wonder if the arity information should come from the high level information (i.e., from the function declarations/definitions) instead of from the lower level calling convention used/code generated (and possibly also affected by/after optimizations). This would ensure compatibility with any calling convention and any other possible differences that may come up later when using cross-language KCFI. (The KCFI encoding is based on high level type information from the function declarations/definitions.)
The implementation in this PR does use high-level type information (but now that I look back at what I had written, I think my original PR description was a bit misleading). Here is an extended version of the arity table, with an additional column to show what the implementation is intended to do:
Hence, this implementation uses high-level type information from clang. For example:

```c
// test.c
struct S {
  int *p1;
  int *p2;
};

int foo(struct S s) { // 1 parameter
  return *s.p1 + *s.p2;
}
```

But then the struct gets decomposed into two separate registers by the compiler:
Thus, if an arity-1 function type's hash collides with |
Sorry, what I meant was we should be looking only at the number of returns (with void being zero, and everything else being one) and parameters from the function declaration/definition, and shouldn't be looking at the parameters' type layout (i.e., size) or at how and which registers are used to pass them (i.e., we shouldn't mention or reference registers or assembly code at all in the implementation and documentation). The implementation should be completely ABI and architecture agnostic. Does that make sense? |
I think that the aspiration to be "completely ABI and architecture agnostic" is irreconcilable with the security objective to prevent a register that is dead at a call site from becoming live at a hash-collided call target. Your point about returns is interesting. AFAIK most x86 Linux kernels (including Ubuntu kernels) are being compiled with |
This is all too architecture-specific, and isn't the level at which LLVM CFI and KCFI were originally designed to work. The arity test should be at the language level, e.g.: void foo(void) is arity zero, and so on. If we tried to cover every possible corner case for every ABI or at the architecture level, we would end up with a different CFI implementation for every ABI and architecture (and possibly every combination of those). |
This PR is intended to improve security for X86 kernels, as the title suggests. I think that we can and should incorporate feedback from the other participants, several of whom have recommended that this new behavior should be explicitly enabled with a flag, rather than implicitly enabled by examining the target triple. So, for example, I could update the PR to introduce a new flag to gate the arity enhancement. Flag-based enabling would also help to address the concerns about compiler compatibility. For instance, if an x86 kernel is being built with both C and Rust code, then Kconfig could check whether the local clang and rustc compilers support the KCFI arity enhancement; if one or the other does not support KCFI arity, then Kconfig would fall back to the default 32-bit hash. @rcvalle I don't see any evidence that this approach will require a plethora of implementations for different arch/ABI combinations. As far as I am aware, the vast majority of x86-64 Linux kernel code (including eBPF code) uses the standard ABI. Even if some kernel code does not conform to the standard ABI, this should not break compatibility with the arity enhancement, because the arity enhancement's implementation derives the arity tag from the function type, not from the ABI. Hence, a valid function call that uses a non-standard ABI would still see a matching KCFI type ID at both the call site and the call target. The downside is that call sites/targets with a non-standard ABI might be tagged with an inaccurate arity, which could matter if the function type's 29-bit hash collides with another function type's 29-bit hash, and that other function's arity tag is the same but the two functions' actual arities differ. IMO this seems very unlikely. |
This is not a Rust concern, but re-reading the initial post, it looks like your own statistics suggest that consuming 3 bits for arity costs more than it buys you. As stated (didn't check your math, just going off what you said), prior to your change we expect 0.01383765 collisions in your sample environment. After your change, we expect the sum of your right-hand column in collisions, which comes out to 0.0266774, nearly double the rate of collisions we have with the basic implementation. In fact, I think that any scheme like this will always increase the number of overall collisions, given that the arity is implicitly hashed into the representation already. The main reason I could see to consider this is if, for some reason, a cross-arity collision is more dangerous than a same-arity collision in terms of exploitability, which I can't immediately argue, but perhaps you have something for this that was just assumed in the initial post? |
If you had checked my math, you would have noticed that the values in the right-hand column were not exactly correct! I found and fixed a bug in my Excel code and updated that chart, but the numbers didn't change by much.
Your analysis is correct that the total number of expected collisions would increase from 0.01375771 to 0.02660650 (with the updated statistics).
I think that a cross-arity collision is clearly more dangerous than a same-arity collision. This is what I had written in the PR description:
Including the arity in the hash doesn't just reduce the likelihood that a cross-arity collision will occur; it prevents one entirely. Now suppose that two function types' hashes collide (under the current 32-bit hash scheme) and that the hash function is uniform. The probability that the arities differ is equal to 1 minus the probability that the arities agree: 1 - ((32/10903)(31/10902) + (2492/10903)(2491/10902) + ... + (148/10903)*(147/10902)) = 0.75901 Thus, if there is a hash collision, there is roughly a 76% chance that it will be a cross-arity collision. |
I'm not sure I buy the argument that cross-arity functions are significantly more exploitable than same-arity mismatches. Restating your argument to make sure I've understood it correctly: When an attacker swaps in a function pointer of higher arity, a dead variable (or even a live one expected to be restored afterwards) may be used improperly by the function. The attacker is more likely to be able to control this than other parameters. (This last part is the piece that I don't see the justification for.) However, this argument applies to every argument consumed by a function - each is as likely to be attacker controlled as the contents of a dead register, and if the function pointer thinks it's getting a value of a different type (e.g. believes it's getting a pointer, gets an attacker-controlled flags value). This seems to be an argument that higher-arity functions are more likely to be exploitable because it's more likely that the attacker controls at least one of their arguments. Additionally, the increased likelihood of exploitability of a mismatch needs to be quite high to justify what you're describing. Let |
So if you want to claim this is to increase the strength of the scheme, you need to argue that an attacker is more than twice as likely to be able to exploit an arity mismatched function as a type mismatched one, which seems like a bit of a stretch - attackers often have direct control of intentionally passed arguments... |
Hi @maurer, I can honestly say that I appreciate your attention to detail. While your observation that "attackers often have direct control of intentionally passed arguments" is true in general, I do not see evidence of this among indirect calls in the Linux kernel. 77% of all indirect call parameters are pointers, and AFAIK user-mode code cannot exert arbitrary control over pointers within the kernel (especially given that x86 Linux kernels enable SMAP and SMEP when available on the CPU). That leaves the other 23% of indirect call parameters to consider, though I admit I don't have a concrete breakdown of what fraction of these might be attacker controllable. But let's assume for the sake of discussion that all of the non-pointer indirect call parameters are attacker controllable. And because we don't know which dead registers at a call site might be attacker-controllable, let's similarly assume for the sake of discussion that all of these dead registers are attacker-controllable. I built the table below to estimate a kind of exploitability score for the arity-enhanced scheme, i.e., the expected number of attacker-controlled parameters times the frequency of function types of that arity. For example, suppose that a hash collision between two function types occurs within the arity-1 partition. The expected number of attacker-controlled parameters at the first function is 0.23, and likewise for the second. So the expected total number of attacker-controlled parameters is 0.46; multiplying by the relative frequency of arity-1 function types (roughly 26%) gives a weighted exploitability of 0.105138035.
Total exploitability score for the arity enhancement: 1.146223975 A similar analysis can be applied to the existing 32-bit hash scheme. For example, suppose a 1-arity function collides with a 2-arity function. Then the expected number of attacker-controlled parameters that would be live at the mis-matched target would be 0.23 if the 2-arity function calls the 1-arity function, or 1.23 if the 1-arity function calls the 2-arity function (because the second parameter at the target would be a dead register at the caller). This data is captured in the table below, where the columns and rows refer to the respective arities of the colliding function types.
Assuming that a collision occurs, this table shows the probability density of the arities (note that the probabilities do sum to 1):
Then the weighted exploitability for each collision is:
Total exploitability score for the existing 32-bit hash: 2.201693943 So, if you accept my reasoning, then a collision in the 32-bit scheme is expected to be roughly twice as exploitable as a collision in the 29-bit scheme with 3 arity bits. On the other hand, the 29-bit scheme might produce twice as many collisions. |
I also do not want to lose sight of one of the other obvious advantages that was mentioned in the PR description:
To elaborate, we are concurrently working on a Linux kernel patch to enhance FineIBT (which is a KCFI-like solution that utilizes x86 Indirect Branch Tracking). The goal is to extend FineIBT to poison live argument registers if a hash check fails after a branch mis-prediction. This enhancement can help to mitigate a variety of Spectre attacks in the Linux kernel. |
I'm still trying to understand why a collision with a function of different arity is riskier than a collision with a function of the same arity, as there are so many factors that can account for it, such as:
And so on. Given that these depend on the context of the vulnerability being exploited, how can it be asserted that a collision with a function of different arity is riskier than a collision with a function of the same arity, since function arity doesn't imply or influence any of the above (except the fact that the callee may use fewer or more registers, and it's unknown whether the attacker controls any of them, how they're used by the callee, and, if any of them are used, whether it's in a way that gives the attacker a more powerful primitive than they already have)? |
I think Scott's second point may be the relevant one - this may be similar strength or slightly weaker, but having an indicator stating which registers are live is potentially needed to allow FineIBT to poison them during speculative execution. Given how much padding is used in the X86 kernel around function headers, have you considered just stealing another byte to encode the data for your arity stuff, and considering it to be a separate mitigation from KCFI tags? The rest of the CFI type ID scheme is arch-independent, but in order to know which registers are in use, you need arch dependent information, because you care about the calling convention, packing rules, etc. This is part of why Ramon thought your design choices were odd above - this isn't really a CFI enhancement or modification, this is another piece of information you need for speculation defenses that occur at a different abstraction level. |
If you are changing the hash scheme, please migrate from xxHash64 to xxh3. We want to remove the legacy xxHash64 from LLVM. |
Yes, and this is what our first prototype does. There is a brief writeup here: https://lkml.org/lkml/2024/9/27/982. When I started to refine the prototype into a PR, I noticed that stealing another byte in the padding region causes problems for either LLVM or for Linux. This is because LLVM emits function headers like this:
But when the Linux image is loaded, it moves the type information to the beginning of the padding, to allow the pre-function padding to be used for hot patching:
Thus, when new metadata is added to the padding region, it introduces new challenges for either LLVM or for Linux:
The discussions on LKML also led us to simplify the proposal, which originally had called for an 8-bit field to denote which registers are pointers, so that only pointers would be poisoned after a misprediction. The new proposal poisons all live registers and therefore requires a 3-bit field to record the arity (defined as the number of live registers). It occurred to me that arity might also be useful to harden the hash checks for KCFI, but that may require a consensus among KCFI stakeholders. One other alternative could be to use the 3-bit reg field in the MOV to encode the arity. For example:
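For illustration, the reg field of a `movl $imm32, %r32` instruction lives in the low 3 bits of its 0xB8+reg opcode byte, so a loader could recover an arity tag from the preamble MOV without spending any hash bits. Below is a hedged sketch of the decoding side of that alternative; the register-to-arity mapping shown here is hypothetical, not something defined by this PR:

```c
#include <stdint.h>

/* Sketch: recover a 3-bit arity tag from the register operand of a
 * `movl $TYPEID, %reg` KCFI preamble. In x86 encoding, `mov r32, imm32`
 * is opcode 0xB8 + reg, so the low 3 bits of the opcode byte name the
 * register; here we treat that register number as the arity tag. */
static int kcfi_arity_from_mov(const uint8_t insn[])
{
    if (insn[0] < 0xB8 || insn[0] > 0xBF)
        return -1;            /* not a mov $imm32, %r32 */
    return insn[0] & 0x7;     /* reg field reused as arity tag */
}
```

With this shape, the full 32-bit hash would be preserved in the immediate, and the arity would ride along in the opcode byte instead.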
I believe that this alternative implementation would also meet the requirements for the FineIBT register poisoning enhancement.
I agree with your interpretation. And thank you and @rcvalle again for the ongoing feedback and discussion. |
It seems from the discussion above that this change would require a new flag. Suppose, for instance, that a user builds a kernel with a version of Clang that uses xxh3, and a version of rustc that still uses xxHash64. Then the hashes could be incompatible and the kernel will crash. This PR is specific to X86-64 targets. Your suggestion seems like a general enhancement that should apply to all targets; I think it should be a separate PR. |
@rcvalle @maurer Do you have any feedback on the alternative proposal, #117121 (comment)? |
I haven't received a reply to the concerns I raised at #117121 (comment) and #117121 (comment), and they still remain for the alternative proposal. I'd recommend working on better defining the problem, and then working on a well-defined solution for that problem that doesn't affect or conflict with an existing well-defined solution to a different problem (i.e., KCFI). |
@phoebewang @sirmc @samitolvanen @Darksonn @lvwr @ojeda @maurer @rcvalle @MaskRay I have created #121070 to implement the alternate proposal summarized in this comment above: #117121 (comment). |
@rcvalle I think that #117121 (comment) isn't relevant for the alternative proposal in #121070 because that proposal does not attempt to prevent cross-arity collisions. Do you agree? Or have I misunderstood your concern? I'm not sure whether #117121 (comment) is relevant for the alternative proposal. Please feel free to discuss at #121070. |
Kernel Control Flow Integrity (kCFI) is a feature that hardens indirect calls by comparing a 32-bit hash of the function pointer's type against a hash of the target function's type. If the hashes do not match, the kernel may panic (or log the hash check failure, depending on the kernel's configuration). These hashes are computed at compile time by applying the xxHash64 algorithm to each mangled canonical function (or function pointer) type, then truncating the result to 32 bits.
Like any hashing scheme, hash collisions are possible. For example, a commodity Linux kernel configured for Ubuntu 24.04 server has 141,617 total indirect call targets, with 10,903 unique function types. With a 32-bit kCFI hash, the expected number of collisions is 10,903-2^32+2^32*(1-1/(2^32))^10,903 = 0.01375771 (see https://courses.cs.duke.edu/cps102/spring09/Lectures/L-18.pdf for the formula). This number can balloon with the addition of drivers and kernel modules.
This patch reduces both the expected number of collisions and the potential impact of a collision by augmenting the hash with an arity value that indicates how many parameters the function has at the ABI level. Specifically, the patch further truncates the kCFI hash down to 29 bits, then concatenates a 3-bit arity indicator as follows:
This scheme enhances security in two ways. First, it prevents a j-arity function pointer from being used to call a k-arity function, unless j=k. The current 32-bit kCFI hash does not prevent, for example, a 2-arity fptr from calling a 3-arity target if the kCFI hashes collide. If this were to happen, then potentially malicious stale/dead data in RDX at the call site could suddenly become live as the third parameter at the call target.
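To make the layout concrete, here is a minimal sketch of how such a tagged type ID could be assembled. The shift/mask arrangement and the saturation of arities of 7 or more into the top encoding are my assumptions for illustration; the authoritative encoding table is in the patch itself:

```c
#include <stdint.h>

/* Sketch: pack a 29-bit truncated xxHash with a 3-bit arity tag in the
 * low-order bits, matching the observation later in this thread that the
 * low 3 bits of a target's type ID hold its arity. Arities >= 7 are
 * assumed to saturate into the 0b111 encoding. */
static uint32_t kcfi_arity_type_id(uint64_t xxhash, uint32_t arity)
{
    uint32_t hash29 = (uint32_t)xxhash & 0x1FFFFFFFu; /* keep 29 bits  */
    uint32_t tag = arity < 7 ? arity : 7;             /* saturate at 7 */
    return (hash29 << 3) | tag;
}
```

A j-arity pointer and a k-arity target then disagree in the low 3 bits whenever j != k (below the saturation point), so the hash comparison fails regardless of the upper 29 bits.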
Second, this scheme reduces the expected number of hash collisions within each arity, compared against the expected number of collisions (0.01375771) for the 32-bit hashing scheme that includes all arities. The table below shows the expected number of collisions for each arity, given the number of unique indirect callable function types within that arity in the same Ubuntu 24.04 server kernel discussed above.
One additional benefit of this patch is that it can benefit other CFI approaches that build on kCFI, such as FineIBT. For example, this proposed enhancement to FineIBT must be able to infer (at kernel init time) which registers are live at an indirect call target: https://lkml.org/lkml/2024/9/27/982. If the arity bits are available in the kCFI type ID, then this information is trivial to infer.
EDIT: I tested this PR by recompiling and running a Linux 6.11 kernel with the following parameters:
cfi=off
cfi=kcfi
cfi=kcfi cfi=norand
cfi=fineibt
cfi=fineibt cfi=norand
With each of these configurations the kernel runs without crashing. Furthermore, the KCFI type IDs appear to be correct. For example, in the `cfi=kcfi cfi=norand` configuration, the CFI header for `dd_merged_requests` has its low-order 3 bits set to `0b011`, which matches the arity of that function (its arity is 3). And when I disassemble a call site of the same type, I can confirm that the call site uses the additive inverse of the enhanced KCFI type ID. Note that `0x8dfc864b + 0x720379b5 = 0` (mod 2^32).