[X86] Enhance kCFI type IDs with a 3-bit arity indicator. #117121

Open · wants to merge 2 commits into main
Conversation

@scottconstable (Contributor) commented Nov 21, 2024

Kernel Control Flow Integrity (kCFI) is a feature that hardens indirect calls by comparing a 32-bit hash of the function pointer's type against a hash of the target function's type. If the hashes do not match, the kernel may panic (or log the hash check failure, depending on the kernel's configuration). These hashes are computed at compile time by applying the xxHash64 algorithm to each mangled canonical function (or function pointer) type, then truncating the result to 32 bits.

Like any hashing scheme, hash collisions are possible. For example, a commodity Linux kernel configured for Ubuntu 24.04 server has 141,617 total indirect call targets, with 10,903 unique function types. With a 32-bit kCFI hash, the expected number of collisions is 10,903-2^32+2^32*(1-1/(2^32))^10,903 = 0.01375771 (see https://courses.cs.duke.edu/cps102/spring09/Lectures/L-18.pdf for the formula). This number can balloon with the addition of drivers and kernel modules.
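
For reference, the estimate can be reproduced directly. Below is a small C sketch (not part of the patch) that evaluates the formula for any hash width; it uses log1p/expm1 because a literal transcription of n - N + N*(1 - 1/N)^n cancels catastrophically at N = 2^32. Printed values may differ in the last digits from the figures quoted here, which were produced in a spreadsheet.

    #include <math.h>
    #include <stdio.h>

    /* Expected number of collisions when n distinct function types are
     * hashed uniformly into N = 2^bits buckets: n - N + N*(1 - 1/N)^n. */
    static double expected_collisions(double n, int bits) {
      double N = ldexp(1.0, bits);                       /* N = 2^bits */
      double distinct = N * -expm1(n * log1p(-1.0 / N)); /* N*(1-(1-1/N)^n) */
      return n - distinct;
    }

    int main(void) {
      /* All 10,903 unique function types in the full 32-bit hash space. */
      printf("32-bit, all types: %.8f\n", expected_collisions(10903, 32));
      /* One arity class alone (e.g., the 3,775 arity-2 types) in 29 bits. */
      printf("29-bit, arity 2:   %.8f\n", expected_collisions(3775, 29));
      return 0;
    }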

This patch reduces both the expected number of collisions and the potential impact of a collision by augmenting the hash with an arity value that indicates how many parameters the function has at the ABI level. Specifically, the patch further truncates the kCFI hash down to 29 bits, then concatenates a 3-bit arity indicator as follows:

| Arity Indicator | Description |
|---|---|
| 0 | 0 parameters |
| 1 | 1 parameter in RDI |
| 2 | 2 parameters in RDI and RSI |
| 3 | 3 parameters in RDI, RSI, and RDX |
| 4 | 4 parameters in RDI, RSI, RDX, and RCX |
| 5 | 5 parameters in RDI, RSI, RDX, RCX, and R8 |
| 6 | 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 |
| 7 | At least one parameter may be passed on the stack |
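
For illustration, here is a minimal C sketch (hypothetical helper names, not part of the patch) of how the enhanced type ID is composed from the truncated hash and the arity indicator, and how the indicator can later be recovered:

    #include <stdint.h>

    /* Shifting the 32-bit hash left by 3 keeps its low 29 bits and makes
     * room for the 3-bit arity indicator (7 = at least one parameter may
     * be passed on the stack). */
    static uint32_t make_enhanced_id(uint32_t hash32, unsigned arity) {
      return (hash32 << 3) | (arity & 7u);
    }

    /* Recover the arity indicator, e.g. at kernel init time. */
    static unsigned id_arity(uint32_t id) { return id & 7u; }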

This scheme enhances security in two ways. First, it prevents a j-arity function pointer from being used to call a k-arity function, unless j=k. The current 32-bit kCFI hash does not prevent, for example, a 2-arity fptr from calling a 3-arity target if the kCFI hashes collide. If this were to happen, then potentially malicious stale/dead data in RDX at the call site could suddenly become live as the third parameter at the call target.

Second, this scheme reduces the expected number of hash collisions within each arity, compared against the expected number of collisions (0.01375771) for the 32-bit hashing scheme that includes all arities. The table below shows the expected number of collisions for each arity, given the number of unique indirect callable function types within that arity in the same Ubuntu 24.04 server kernel discussed above.

| Arity | Unique Indirect Callable Function Types | Number of Expected Collisions |
|---|---|---|
| 0 | 32 | 0.00000072 |
| 1 | 2492 | 0.00576502 |
| 2 | 3775 | 0.01324385 |
| 3 | 2547 | 0.00602275 |
| 4 | 1169 | 0.00126398 |
| 5 | 519 | 0.00024700 |
| 6 | 221 | 0.00004387 |
| 7 | 148 | 0.00001931 |

This patch can also benefit other CFI approaches that build on kCFI, such as FineIBT. For example, this proposed enhancement to FineIBT must be able to infer (at kernel init time) which registers are live at an indirect call target: https://lkml.org/lkml/2024/9/27/982. If the arity bits are available in the kCFI type ID, then this information is trivial to infer: the arity is simply the low 3 bits of the ID.

EDIT: I tested this PR by recompiling and running a Linux 6.11 kernel with the following parameters:

  • cfi=off
  • cfi=kcfi
  • cfi=kcfi cfi=norand
  • cfi=fineibt
  • cfi=fineibt cfi=norand

With each of these configurations the kernel runs without crashing. Furthermore, the KCFI type IDs appear to be correct. For example, in the cfi=kcfi cfi=norand configuration, the CFI header for dd_merged_requests has its low-order 3 bits set to 0b011, matching that function's arity of 3:

(gdb) disass dd_merged_requests-16
Dump of assembler code for function __cfi_dd_merged_requests:
   0xffffffff818d4ca0 <+0>:     mov    $0x8dfc864b,%eax

And when I disassemble a call site of the same type, I can confirm that the call site uses the additive inverse of the enhanced KCFI type ID:

(gdb) disass elv_merge_requests
Dump of assembler code for function elv_merge_requests:
...
   0xffffffff8188c0ac <+44>:    mov    $0x720379b5,%r10d
   0xffffffff8188c0b2 <+50>:    add    -0xf(%r11),%r10d
   0xffffffff8188c0b6 <+54>:    je     0xffffffff8188c0ba <elv_merge_requests+58>
   0xffffffff8188c0b8 <+56>:    ud2
   0xffffffff8188c0ba <+58>:    call   *%r11

Note that 0x8dfc864b + 0x720379b5 = 0 (mod 2^32).
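
The same arithmetic can be sanity-checked in C (unsigned 32-bit addition wraps modulo 2^32, which is exactly what the add/je pair relies on):

    #include <assert.h>
    #include <stdint.h>

    int main(void) {
      uint32_t target_id = 0x8dfc864bu;    /* from __cfi_dd_merged_requests */
      uint32_t callsite_neg = 0x720379b5u; /* constant loaded into %r10d */
      assert((uint32_t)(target_id + callsite_neg) == 0); /* inverses mod 2^32 */
      assert((target_id & 7) == 3);        /* low-order bits 0b011 = arity 3 */
      return 0;
    }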

@llvmbot added the clang (Clang issues not falling into any other category) and clang:codegen labels on Nov 21, 2024
@llvmbot (Member) commented Nov 21, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-clang

Author: Scott Constable (scottconstable)

Full diff: https://github.com/llvm/llvm-project/pull/117121.diff

3 Files Affected:

  • (modified) clang/lib/CodeGen/CodeGenModule.cpp (+28-3)
  • (modified) clang/test/CodeGen/kcfi-normalize.c (+12-6)
  • (modified) clang/test/CodeGen/kcfi.c (+19-3)
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index b854eeb62a80ce..7cc6f120ec39a9 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -2183,7 +2183,8 @@ llvm::ConstantInt *CodeGenModule::CreateCrossDsoCfiTypeId(llvm::Metadata *MD) {
 }
 
 llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) {
-  if (auto *FnType = T->getAs<FunctionProtoType>())
+  auto *FnType = T->getAs<FunctionProtoType>();
+  if (FnType)
     T = getContext().getFunctionType(
         FnType->getReturnType(), FnType->getParamTypes(),
         FnType->getExtProtoInfo().withExceptionSpec(EST_None));
@@ -2196,8 +2197,32 @@ llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) {
   if (getCodeGenOpts().SanitizeCfiICallNormalizeIntegers)
     Out << ".normalized";
 
-  return llvm::ConstantInt::get(Int32Ty,
-                                static_cast<uint32_t>(llvm::xxHash64(OutName)));
+  uint32_t OutHash = static_cast<uint32_t>(llvm::xxHash64(OutName));
+  const auto &Triple = getTarget().getTriple();
+  if (Triple.isX86() && Triple.isArch64Bit() && Triple.isOSLinux()) {
+    // Estimate the function's arity (i.e., the number of arguments) at the ABI
+    // level by counting the number of parameters that are likely to be passed
+    // as registers, such as pointers and 64-bit (or smaller) integers. The
+    // Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs.
+    // Additional parameters or parameters larger than 64 bits may be passed on
+    // the stack, in which case the arity is denoted as 7.
+    bool MayHaveStackArgs = FnType->getNumParams() > 6;
+
+    for (unsigned int i = 0; !MayHaveStackArgs && i < FnType->getNumParams();
+         ++i) {
+      const Type *PT = FnType->getParamType(i).getTypePtr();
+      if (!(PT->isPointerType() || (PT->isIntegralOrEnumerationType() &&
+                                    getContext().getTypeSize(PT) <= 64)))
+        MayHaveStackArgs = true;
+    }
+
+    // The 3-bit arity is concatenated with the lower 29 bits of the KCFI hash
+    // to form an enhanced KCFI type ID. This can prevent, for example, a
+    // 3-arity function's ID from ever colliding with a 2-arity function's ID.
+    OutHash = (OutHash << 3) | (MayHaveStackArgs ? 7 : FnType->getNumParams());
+  }
+
+  return llvm::ConstantInt::get(Int32Ty, OutHash);
 }
 
 void CodeGenModule::SetLLVMFunctionAttributes(GlobalDecl GD,
diff --git a/clang/test/CodeGen/kcfi-normalize.c b/clang/test/CodeGen/kcfi-normalize.c
index b9150e88f6ab5f..8b7445fc85e490 100644
--- a/clang/test/CodeGen/kcfi-normalize.c
+++ b/clang/test/CodeGen/kcfi-normalize.c
@@ -10,25 +10,31 @@
 void foo(void (*fn)(int), int arg) {
     // CHECK-LABEL: define{{.*}}foo
     // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE1:[0-9]+]]
-    // CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 1162514891) ]
+    // KCFI ID = 0x2A548E59
+    // CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 710184537) ]
     fn(arg);
 }
 
 void bar(void (*fn)(int, int), int arg1, int arg2) {
     // CHECK-LABEL: define{{.*}}bar
     // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE2:[0-9]+]]
-    // CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 448046469) ]
+    // KCFI ID = 0xD5A52C2A
+    // CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 -710595542) ]
     fn(arg1, arg2);
 }
 
 void baz(void (*fn)(int, int, int), int arg1, int arg2, int arg3) {
     // CHECK-LABEL: define{{.*}}baz
     // CHECK-SAME: {{.*}}!kcfi_type ![[TYPE3:[0-9]+]]
-    // CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef %3){{.*}}[ "kcfi"(i32 -2049681433) ]
+    // KCFI ID = 0x2EA2BF3B
+    // CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef %3){{.*}}[ "kcfi"(i32 782417723) ]
     fn(arg1, arg2, arg3);
 }
 
 // CHECK: ![[#]] = !{i32 4, !"cfi-normalize-integers", i32 1}
-// CHECK: ![[TYPE1]] = !{i32 -1143117868}
-// CHECK: ![[TYPE2]] = !{i32 -460921415}
-// CHECK: ![[TYPE3]] = !{i32 -333839615}
+// KCFI ID = 0xDEEB3EA2
+// CHECK: ![[TYPE1]] = !{i32 -555008350}
+// KCFI ID = 0x24372DCB
+// CHECK: ![[TYPE2]] = !{i32 607595979}
+// KCFI ID = 0x60D0180C
+// CHECK: ![[TYPE3]] = !{i32 1624250380}
diff --git a/clang/test/CodeGen/kcfi.c b/clang/test/CodeGen/kcfi.c
index 622843cedba50f..dc9e818a9f8cca 100644
--- a/clang/test/CodeGen/kcfi.c
+++ b/clang/test/CodeGen/kcfi.c
@@ -7,7 +7,6 @@
 
 /// Must emit __kcfi_typeid symbols for address-taken function declarations
 // CHECK: module asm ".weak __kcfi_typeid_[[F4:[a-zA-Z0-9_]+]]"
-// CHECK: module asm ".set __kcfi_typeid_[[F4]], [[#%d,HASH:]]"
 /// Must not __kcfi_typeid symbols for non-address-taken declarations
 // CHECK-NOT: module asm ".weak __kcfi_typeid_{{f6|_Z2f6v}}"
 
@@ -29,7 +28,7 @@ int __call(fn_t f) __attribute__((__no_sanitize__("kcfi"))) {
 
 // CHECK: define dso_local{{.*}} i32 @{{call|_Z4callPFivE}}(ptr{{.*}} %f){{.*}}
 int call(fn_t f) {
-  // CHECK: call{{.*}} i32 %{{.}}(){{.*}} [ "kcfi"(i32 [[#HASH]]) ]
+  // CHECK: call{{.*}} i32 %{{.}}(){{.*}} [ "kcfi"(i32 [[#%d,HASH:]]) ]
   return f();
 }
 
@@ -48,6 +47,20 @@ static int f5(void) { return 2; }
 // CHECK-DAG: declare !kcfi_type ![[#TYPE]]{{.*}} i32 @{{f6|_Z2f6v}}()
 extern int f6(void);
 
+typedef struct {
+  int *p1;
+  int *p2[16];
+} s_t;
+
+// CHECK: define internal{{.*}} i32 @{{f7|_ZL2f7PFi3s_tEPS_}}(ptr{{.*}} %f, ptr{{.*}} %s){{.*}}
+static int f7(int (*f)(s_t), s_t *s) {
+  // CHECK: call{{.*}} i32 %{{.*}} [ "kcfi"(i32 [[#%d,HASH4:]]) ]
+  return f(*s) + 1;
+}
+
+// CHECK: define internal{{.*}} i32 @{{f8|_ZL2f83s_t}}(ptr{{.*}} %s){{.*}} !kcfi_type ![[#%d,TYPE4:]]
+static int f8(s_t s) { return 0; }
+
 #ifndef __cplusplus
 // C: define internal ptr @resolver1() #[[#]] !kcfi_type ![[#]] {
 int ifunc1(int) __attribute__((ifunc("resolver1")));
@@ -59,12 +72,14 @@ long ifunc2(long) __attribute__((ifunc("resolver2")));
 #endif
 
 int test(void) {
+  s_t s;
   return call(f1) +
          __call((fn_t)f2) +
          call(f3) +
          call(f4) +
          f5() +
-         f6();
+         f6() +
+         f7(f8, &s);
 }
 
 #ifdef __cplusplus
@@ -85,3 +100,4 @@ void test_member_call(void) {
 // CHECK-DAG: ![[#TYPE]] = !{i32 [[#HASH]]}
 // CHECK-DAG: ![[#TYPE2]] = !{i32 [[#%d,HASH2:]]}
 // MEMBER-DAG: ![[#TYPE3]] = !{i32 [[#HASH3]]}
+// CHECK-DAG: ![[#TYPE4]] = !{i32 [[#HASH4]]}

@llvmbot (Member) commented Nov 21, 2024

@llvm/pr-subscribers-clang-codegen

@scottconstable (Contributor Author)

@phoebewang @lvwr Please review this PR.

      static_cast<uint32_t>(llvm::xxHash64(OutName)));
    uint32_t OutHash = static_cast<uint32_t>(llvm::xxHash64(OutName));
    const auto &Triple = getTarget().getTriple();
    if (Triple.isX86() && Triple.isArch64Bit() && Triple.isOSLinux()) {
@sirmc:

-> if (FnType && Triple.isX86() && Triple.isArch64Bit() && Triple.isOSLinux())

Just to make sure that FnType isn't a null pointer.

@scottconstable (Contributor Author):

@sirmc Thank you for the suggestion. I looked at the code and, as far as I can tell, CreateKCFITypeId() is only called here:

    Ctx, MDB.createConstant(CreateKCFITypeId(FD->getType()))));

which seems to guarantee that T will always be a function type. Regardless, I think your suggestion is reasonable in case CreateKCFITypeId() is called elsewhere in the future, so I incorporated this additional check into the PR.

@scottconstable (Contributor Author)

@phoebewang and @lvwr I also noticed that there is this code in LLVM:

    void llvm::setKCFIType(Module &M, Function &F, StringRef MangledType) {
      if (!M.getModuleFlag("kcfi"))
        return;
      // Matches CodeGenModule::CreateKCFITypeId in Clang.
      LLVMContext &Ctx = M.getContext();
      MDBuilder MDB(Ctx);
      std::string Type = MangledType.str();
      if (M.getModuleFlag("cfi-normalize-integers"))
        Type += ".normalized";
      F.setMetadata(LLVMContext::MD_kcfi_type,
                    MDNode::get(Ctx, MDB.createConstant(ConstantInt::get(
                                         Type::getInt32Ty(Ctx),
                                         static_cast<uint32_t>(xxHash64(Type))))));
    }
As far as I can tell, this code is not triggered when I build the Linux kernel with -fsanitize=kcfi.

When is this code triggered? And do you think it is necessary to additionally implement the arity-enhancement to this code?

@phoebewang (Contributor)

> @phoebewang and @lvwr I also noticed that there is this code in LLVM: llvm::setKCFIType (quoted above). As far as I can tell, this code is not triggered when I build the Linux kernel with -fsanitize=kcfi. When is this code triggered? And do you think it is necessary to additionally implement the arity-enhancement to this code?

I'm not familiar with KCFI. I see it was added by @samitolvanen in e1c36bd. I think you can trigger it with the attached test case.

@phoebewang (Contributor)

> Second, this scheme reduces the expected number of hash collisions within each arity, compared against the expected number of collisions (0.01383765) for the 32-bit hashing scheme that includes all arities. The table below shows the expected number of collisions for each arity, given the number of unique indirect callable function types within that arity in the same Ubuntu 24.04 server kernel discussed above.

The collisions vary a lot with the number of function types. It looks smoother to me if we use 2 bits to distinguish 4 cases: 1, 2, 3, and 0-or-others.

github-actions bot commented Nov 23, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@scottconstable (Contributor Author)

> > @phoebewang and @lvwr I also noticed that there is this code in LLVM: llvm::setKCFIType (quoted above). As far as I can tell, this code is not triggered when I build the Linux kernel with -fsanitize=kcfi. When is this code triggered? And do you think it is necessary to additionally implement the arity-enhancement to this code?
>
> I'm not familiar with KCFI. I see it was added by @samitolvanen in e1c36bd. I think you can trigger it with the attached test case.

It looks to me like this code might be triggered in some LTO configurations, and/or when linking code compiled from multiple source languages with the expectation that the KCFI type IDs will be compatible. Is my understanding correct?

The comment in the code says "Matches CodeGenModule::CreateKCFITypeId in Clang," which I interpret to mean that this code should produce identical KCFI type IDs for identical function types, which might be tricky if the target binary is compiled from different languages. I added some code to llvm::setKCFIType that I hope will produce consistent output, but admittedly I'm not sure that my treatment of clang::Type and llvm::Type is consistent.

    @@ -208,10 +209,34 @@ void llvm::setKCFIType(Module &M, Function &F, StringRef MangledType) {
      std::string Type = MangledType.str();
      if (M.getModuleFlag("cfi-normalize-integers"))
        Type += ".normalized";

      uint32_t OutHash = static_cast<uint32_t>(llvm::xxHash64(Type));
      auto T = Triple(Twine(M.getTargetTriple()));
@scottconstable (Contributor Author) commented Nov 23, 2024:

The Triple line looks awkward and I regret that I needed to include another header to enable this. Maybe there is a workable API within the existing headers, but I couldn't find one.

@scottconstable (Contributor Author):

Thank you for the suggestion! This does look tidier than what I had written. I updated the PR.


    for (unsigned int i = 0; !MayHaveStackArgs && i < NumParams; ++i) {
      const llvm::Type *PT = F.getArg(i)->getType();
      if (!(PT->isPointerTy() || PT->getIntegerBitWidth() <= 64))
@scottconstable (Contributor Author) commented Nov 23, 2024:

Is the if condition equivalent to what I wrote in CodeGenModule::CreateKCFITypeId with clang::Type? Specifically, is

      // typeof(*PT) = clang::Type
      if (!(PT->isPointerType() || (PT->isIntegralOrEnumerationType() &&
                                    getContext().getTypeSize(PT) <= 64)))

equivalent to:

      // typeof(*PT) = llvm::Type
      if (!(PT->isPointerTy() || PT->getIntegerBitWidth() <= 64))

@phoebewang (Contributor):

A front end like Clang has solved this already. I think we can simply check the number.

@scottconstable (Contributor Author):

It appears that clang does not reserve stack for large arguments and instead this is done later by the LLVM X86 backend. For example:

struct S {
    int *p1;
    int *p2;
    int array[8];
};

int foo(struct S s, struct S *sp) {
    return *s.p1 + *s.p2 + *sp->p1 + *sp->p2;
}

Then when I compile to LLVM IR I see:

define dso_local i32 @foo(ptr noundef byval(%struct.S) align 8 %s, ptr noundef %sp) #0 {

This suggests an arity of 2. But the X86 backend transforms foo to pass s on the stack, and then sp becomes the sole argument and is passed in RDI. Hence, by the chart in the PR description, this should be treated as an arity-7 function (indicator 7: at least one parameter may be passed on the stack).

This predicate:

      // typeof(*PT) = llvm::Type
      if (!(PT->isPointerTy() || PT->getIntegerBitWidth() <= 64))
        MayHaveStackArgs = true;

should prevent s from being counted as a register argument and correctly set the arity field to 7.

@phoebewang (Contributor):

Right, byval is an exception. You can use hasPassPointeeByValueCopyAttr() to check it.

@scottconstable (Contributor Author):

Thank you for the suggestion. I looked at llvm::Argument::hasPassPointeeByValueCopyAttr(), but it looks like it is only available where a function is being defined. It does not appear to be available where a call is made through a function pointer. Therefore, I'm not sure that llvm::Argument::hasPassPointeeByValueCopyAttr() will be helpful since KCFI requires the ID to be computed identically at both the call site and the call target.

Or, do you think I am overlooking something, and that there is a way to use llvm::Argument::hasPassPointeeByValueCopyAttr() or something similar at an indirect call site? As far as I can tell, the only information that is available at an indirect call site is the function pointer type, which does contain the number of arguments and their types, but does not appear to contain an indication as to whether an argument may be passed on the stack.

@phoebewang (Contributor):

It is an argument attribute; it should be identical where the function is defined and where it is called, otherwise we would have a mismatch issue. I assume a simple use like this should be OK.

@scottconstable (Contributor Author):

Hi @phoebewang, KCFI only computes hashes for indirect calls, not direct ones. The example you cited uses CallBase::getCalledFunction(), whose documentation reads "Returns the function called, or null if this is an indirect function invocation or the function signature does not match the call signature."

@phoebewang (Contributor):

Sorry, I don't understand the point about indirect calls. Here setKCFIType has an argument F, which will never be null. Why do we need another getCalledFunction()?

@scottconstable (Contributor Author)

> The collisions vary a lot with the number of function types. It looks smoother to me if we use 2 bits to distinguish 4 cases: 1, 2, 3, and 0-or-others.

I re-ran the numbers with a 30-bit hash and 2-bit arity, and you are correct that the distribution of expected collisions is smoother:

| Arity | Unique Indirect Callable Function Types | Number of Expected Collisions |
|---|---|---|
| 0 or >3 | 2089 | 0.00201654 |
| 1 | 2492 | 0.00287330 |
| 2 | 3775 | 0.00660789 |
| 3 | 2547 | 0.00300181 |

However, a 2-bit arity would undermine what is arguably the more desirable property:

> This scheme enhances security in two ways. First, it prevents a j-arity function pointer from being used to call a k-arity function, unless j=k. The current 32-bit kCFI hash does not prevent, for example, a 2-arity fptr from calling a 3-arity target if the kCFI hashes collide. If this were to happen, then potentially malicious stale/dead data in RDX at the call site could suddenly become live as the third parameter at the call target.

For example, if the 30-bit hash of a 0-arity function type collides with that of a 4-arity function type, then the RDI, RSI, RDX, and RCX registers that die when calling through the 0-arity type will unexpectedly become live if a COP attack redirects the call to a function of the 4-arity type.

Therefore, I believe that the 29-bit hash and 3-bit arity offers a more favorable security posture.

@phoebewang (Contributor)

> It looks to me like this code might be triggered in some LTO configurations, and/or when linking code compiled from multiple source languages with the expectation that the KCFI type IDs will be compatible. Is my understanding correct?

Looks like the latter, see 71c7313

@phoebewang (Contributor)

> Therefore, I believe that the 29-bit hash and 3-bit arity offers a more favorable security posture.

Although the default calling convention uses 6 registers, others like RegCall use more. Do you want to check the calling convention as well?

@scottconstable (Contributor Author)

> > It looks to me like this code might be triggered in some LTO configurations, and/or when linking code compiled from multiple source languages with the expectation that the KCFI type IDs will be compatible. Is my understanding correct?
>
> Looks like the latter, see 71c7313

Actually, I think this code was introduced to address a compatibility issue with KASAN, which apparently must generate KCFI-enabled code without clang. I found this explanation at 3b14862 and ClangBuiltLinux/linux#1742.

Regardless, it looks like llvm::setKCFIType is intended to always produce the same KCFI type ID as CodeGenModule::CreateKCFITypeId for equivalent function types. For this PR, this implies that llvm::setKCFIType and CodeGenModule::CreateKCFITypeId must always infer the same arity for the same function type.

@scottconstable (Contributor Author)

> Although the default calling convention uses 6 registers, others like RegCall use more. Do you want to check the calling convention as well?

AFAIK the use case for KCFI is very narrow: the x86-64 Linux kernel. And I don't believe that the kernel uses (or even allows?) any calling convention other than the default. The kernel documentation also says that the eBPF calling convention "maps directly to ABIs used by the kernel on 64-bit architectures." But I admit I am not an expert on the Linux ABI nor am I an expert on the full scope of KCFI use cases.

Maybe @lvwr can weigh in?

@samitolvanen (Member)

While this overall looks like a nice improvement to me, changing the hashing scheme is going to break compatibility with Rust. I would suggest moving this change behind a command line flag, so the kernel can choose to enable it only if Rust support isn't enabled, or if we have a rustc that also supports this hashing scheme.

@samitolvanen (Member)

> Actually, I think this code was introduced to address a compatibility issue with KASAN, which apparently must generate KCFI-enabled code without clang. I found this explanation at 3b14862 and ClangBuiltLinux/linux#1742.

Yes, this is needed for LLVM-generated functions in general. KASAN is one use case, GCOV is another.

@Darksonn

Sami, I'm guessing you're mainly talking about rustc and clang needing to use llvm versions that agree on the kCFI implementation in use? Or are there additional things we need to do to make Rust support it / some fundamental reason this approach can't work in Rust?

Either way, it would be nice to have some way to set the hashing method being used, or at least, a way to query which hashing strategy is used. We are going to want logic in Kconfig to ensure that rustc and clang use the same strategy, or at least a way to detect when they don't so we can fail the build with a good error message.

cc @maurer

@samitolvanen (Member)

> Sami, I'm guessing you're mainly talking about rustc and clang needing to use llvm versions that agree on the kCFI implementation in use?

Correct. Both compilers must use the same hashing scheme for cross-language indirect calls to work, so unconditionally changing how X86 type hashes are calculated in Clang breaks compatibility with current versions of rustc. Using compiler versions to figure out which scheme is being used is tedious, so we should have a better way to figure this out in Kconfig.

> Either way, it would be nice to have some way to set the hashing method being used, or at least, a way to query which hashing strategy is used. We are going to want logic in Kconfig to ensure that rustc and clang use the same strategy, or at least a way to detect when they don't so we can fail the build with a good error message.

Exactly. I would prefer to have a command line option for this, perhaps something similar to -fsanitize-cfi-icall-experimental-normalize-integers, or even a flag that accepts the scheme to use as an argument. As long as we default to the original scheme to avoid breaking compatibility with other KCFI implementations.

@samitolvanen (Member)

> AFAIK the use case for KCFI is very narrow: the x86-64 Linux kernel.

At the very least, it's also used in the arm64 Linux kernel. Linux also supports KCFI on 32-bit ARM and RISC-V. However, this change seems to affect only X86.

@lvwr (Contributor) commented Nov 26, 2024

> > Although the default calling convention uses 6 registers, others like RegCall use more. Do you want to check the calling convention as well?
>
> Maybe @lvwr can weigh in?

My understanding is that the kernel respects the default calling convention for most things (if not all) and then builds on top of it, e.g., by defining must-use registers for indirect calls and FineIBT (r11 and r10). There are also specifics for syscalls, like using RAX to pass the syscall number.

With the above said, I'm unsure whether there is any orthogonal-craziness-custom-thing going on in the handwritten assembly code, but I would assume not, given that it is desirable to keep standards all around (maybe double-check that with PeterZ for assurance?).

Finally, for breadth, the ABI defines the kernel code model for handling symbols and relocations. But I think this is not a concern here.

@lvwr (Contributor) commented Nov 26, 2024

> While this overall looks like a nice improvement to me, changing the hashing scheme is going to break compatibility with Rust. I would suggest moving this change behind a command line flag, so the kernel can choose to enable it only if Rust support isn't enabled, or if we have a rustc that also supports this hashing scheme.

@samitolvanen Given that this introduces benefits, I wonder if this shouldn't be the default, with a flag to disable the enhancement if Rust is lagging. I assume Rust will want to catch up with the benefits of this, and when that happens we'll need one less flag, which is nice 😎

I'm unsure who is working on Rust support these days, but if it is @rcvalle, maybe he can speak to the likelihood of Rust eventually incorporating these changes.

edit: looks like Rust is already tracking and willing to support it? Rust-for-Linux/linux#1132 (thanks @ojeda)

@Darksonn

Rust already supports kCFI and I see no reason it can't also support this scheme. We just need to be careful to introduce it in a good way that reduces the risk of mismatched hashing strategies.

@ojeda commented Nov 26, 2024

> looks like Rust is already tracking and willing to support it? Rust-for-Linux/linux#1132 (thanks @ojeda)

You're welcome! Just in case: if the question is about the entry I added today, then that entry is intended to track this PR, the potential consequences/needs for the Rust support in the kernel, related work/links, etc., but it does not imply Rust is tracking it or that they are willing to support it already.

@maurer commented Nov 26, 2024

Flag guarding this feature seems like it would also be good for any existing C users - for example, if trying to build a kernel module intended to load against a kernel image built with an older clang, you need to select the same type ID projection that the kernel did.

@rcvalle (Contributor) commented Nov 26, 2024

> Flag guarding this feature seems like it would also be good for any existing C users - for example, if trying to build a kernel module intended to load against a kernel image built with an older clang, you need to select the same type ID projection that the kernel did.

+1 to this. Adding support for this to the Rust compiler shouldn't be a problem, and @maurer or I could take a look at it. However, I wonder if the arity information should come from the high-level information (i.e., from the function declarations/definitions) instead of from the lower-level calling convention used / code generated (and possibly also affected by/after optimizations). This would ensure compatibility with any calling convention and any other possible differences that may come up later when using cross-language KCFI. (The KCFI encoding is based on high-level type information from the function declarations/definitions.)

@scottconstable (Contributor Author)

> However, I wonder if the arity information should come from the high-level information (i.e., from the function declarations/definitions) instead of from the lower-level calling convention used / code generated (and possibly also affected by/after optimizations). This would ensure compatibility with any calling convention and any other possible differences that may come up later when using cross-language KCFI.

The implementation in this PR does use high-level type information (but now that I look back at what I had written, I think my original PR description was a bit misleading). Here is an extended version of the arity table, with an additional column to show what the implementation is intended to do:

| Arity Indicator | Description | Implementation |
|---|---|---|
| 0 | 0 parameters | 0 parameters |
| 1 | 1 parameter in RDI | 1 address-width (or smaller) parameter and no other parameters |
| 2 | 2 parameters in RDI and RSI | 2 address-width (or smaller) parameters and no other parameters |
| 3 | 3 parameters in RDI, RSI, and RDX | 3 address-width (or smaller) parameters and no other parameters |
| 4 | 4 parameters in RDI, RSI, RDX, and RCX | 4 address-width (or smaller) parameters and no other parameters |
| 5 | 5 parameters in RDI, RSI, RDX, RCX, and R8 | 5 address-width (or smaller) parameters and no other parameters |
| 6 | 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 | 6 address-width (or smaller) parameters and no other parameters |
| 7 | At least one parameter may be passed on the stack | The function type does not qualify as arity 0-6 |

Hence, this implementation uses high-level type information from clang (CodeGenModule.cpp) or LLVM (ModuleUtils.cpp) to infer a better approximation of the default x86-64 calling convention. For example, if the implementation were to instead use the number of parameters as a proxy for the arity, then this could permit a scenario where a register that is dead at the call site becomes live at the call target, e.g.:

// test.c
struct S {
    int *p1;
    int *p2;
};

int foo(struct S s) {  // 1 parameter
    return *s.p1 + *s.p2;
}

But then the struct gets decomposed into two separate registers by the compiler:

clang -O1 test.c -S -o test.S; cat test.S
        .type   foo,@function
foo:                                    # @foo
        .cfi_startproc
# %bb.0:                                # %entry
        movl    (%rsi), %eax
        addl    (%rdi), %eax
        retq

Thus, if an arity-1 function type's hash collides with foo's hash, then a dead RSI register at an arity-1 caller could become live at the target foo.

@rcvalle (Contributor) commented Nov 26, 2024

Sorry, what I meant was that we should be looking only at the number of returns (with void being zero and everything else being one) and parameters from the function declaration/definition, and shouldn't be looking at the parameters' type layout (i.e., size) or at how and what registers are used to pass them (i.e., we shouldn't mention or reference registers or assembly code at all in the implementation and documentation). The implementation should be completely ABI and architecture agnostic. Does that make sense?

@scottconstable (Contributor Author)

> Sorry, what I meant was that we should be looking only at the number of returns (with void being zero and everything else being one) and parameters from the function declaration/definition, and shouldn't be looking at the parameters' type layout (i.e., size) or at how and what registers are used to pass them (i.e., we shouldn't mention or reference registers or assembly code at all in the implementation and documentation). The implementation should be completely ABI and architecture agnostic. Does that make sense?

I think that the aspiration to be "completely ABI and architecture agnostic" is irreconcilable with the security objective to prevent a register that is dead at a call site from becoming live at a hash-collided call target.

Your point about returns is interesting. AFAIK most x86 Linux kernels (including Ubuntu kernels) are being compiled with -fzero-call-used-regs=used-gpr, which in most cases should prevent a dead return register (RAX) from becoming live at a call site when the call site expects a return value. I can think of one corner case, but I'm not sure how much of a concern it would be in practice. Suppose that int (*)(int) collides with void (*)(int) and this allows a function f of the former type to call a function g of the latter type. Suppose further that RAX dies when f calls g and g does not touch RAX at all, which means that -fzero-call-used-regs=used-gpr will have no effect on RAX in g. Then when g returns to f, the stale value that f had left in RAX will become live and might be used for something malicious.
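
A minimal C sketch of that corner case (hypothetical function names; assumes the two type hashes collide so the KCFI check passes):

    /* Hypothetical: the type IDs of int (*)(int) and void (*)(int) collide. */
    void g(int x) {
      /* g never touches RAX, so -fzero-call-used-regs=used-gpr has nothing
       * to zero there on return. */
      (void)x;
    }

    int f(int (*fp)(int)) {
      /* If an attacker redirects fp to g, the collided check still passes.
       * g leaves RAX untouched, so whatever stale value f last left in RAX
       * becomes live again as the apparent return value of this call. */
      return fp(1);
    }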

@rcvalle (Contributor) commented Nov 27, 2024

This is all too architecture-specific and isn't the level at which LLVM CFI and KCFI originally work. The arity test should be at the language level, e.g.:

  • void foo(void) is arity zero
  • void bar(int) is arity 1
  • int baz(int) is arity 2
  • int qux(int, int) is arity 3

And so on. If we tried to cover every possible corner case for every ABI or at the architecture level we would end up with a different CFI implementation for every ABI and architecture (and possibly every combination of those).

@scottconstable (Contributor Author)

This PR is intended to improve security for X86 kernels, as the title suggests. I think that we can and should incorporate feedback from the other participants, several of whom have recommended that this new behavior should be explicitly enabled with a flag, rather than implicitly enabled by examining the target triple. So, for example, I could update the PR to introduce a new flag such as -fsanitize-cfi-icall-experimental-arity-x86. If the community would like to adopt a similar approach that is tuned to benefit Arm64 kernels, then that approach could be enabled with a different flag.

Flag-based enabling would also help to address the concerns about compiler compatibility. For instance, if an x86 kernel is being built with both C and Rust code, then Kconfig could check whether the local clang and rustc compilers support the KCFI arity enhancement; if one or the other does not support KCFI arity then Kconfig would fall back to the default 32-bit hash.

@rcvalle I don't see any evidence that this approach will require a plethora of implementations for different arch/ABI combinations. As far as I am aware, the vast majority of x86-64 Linux kernel code (including eBPF code) uses the standard ABI. Even if some kernel code does not conform to the standard ABI, this should not break compatibility with the arity enhancement because the arity enhancement's implementation derives the arity tag from the function type, not from the ABI. Hence, a valid function call that uses a non-standard ABI would still experience a matching KCFI type ID at both the call site and the call target. The downside is that call sites/targets with a non-standard ABI might be tagged with an inaccurate arity, which could matter if the function type's 29-bit hash collides with another function type's 29-bit hash, and that other function's arity tag is the same but the two functions' actual arities differ. IMO this seems very unlikely.

@maurer

maurer commented Nov 27, 2024

This is not a Rust concern, but re-reading the initial post, it looks like your own statistics suggest that consuming 3 bits for arity costs more than it buys you. As stated (I didn't check your math, just going off what you said), prior to your change, we expect 0.01383765 collisions in your sample environment. After your change, we expect to have the sum of your right-hand column in collisions, which comes out to 0.0266774 - nearly double the rate of collisions we have with the basic implementation. In fact, I think that any scheme like this is always going to increase the number of overall collisions, given that the arity is already implicitly hashed into the representation.

The main reason I could see to consider this is if for some reason a cross-arity collision is more dangerous than a same-arity collision in terms of exploitability, which I can't immediately argue, but perhaps you have something for this that was just assumed in the initial post?

@scottconstable
Contributor Author

scottconstable commented Nov 27, 2024

This is not a Rust concern, but re-reading the initial post, it looks like your own statistics suggest that consuming 3 bits for arity costs more than it buys you. As stated (I didn't check your math, just going off what you said), prior to your change, we expect 0.01383765 collisions in your sample environment.

If you had checked my math, you would have noticed that the values in the right-hand column were not exactly correct! I found and fixed a bug in my Excel code and updated that chart, but the numbers didn't change by much.

After your change, we expect to have the sum of your right-hand column in collisions, which comes out to 0.0266774 - nearly double the rate of collisions we have with the basic implementation. In fact, I think that any scheme like this is always going to increase the number of overall collisions, given that the arity is already implicitly hashed into the representation.

Your analysis is correct that the total number of expected collisions would increase from 0.01375771 to 0.02660650 (with the updated statistics).

The main reason I could see to consider this is if for some reason a cross-arity collision is more dangerous than a same-arity collision in terms of exploitability, which I can't immediately argue, but perhaps you have something for this that was just assumed in the initial post?

I think that a cross-arity collision is clearly more dangerous than a same-arity collision. This is what I had written in the PR description:

The current 32-bit kCFI hash does not prevent, for example, a 2-arity fptr from calling a 3-arity target if the kCFI hashes collide. If this were to happen, then potentially malicious stale/dead data in RDX at the call site could suddenly become live as the third parameter at the call target.

Including the arity in the type ID does not merely reduce the likelihood of a cross-arity collision; it prevents one outright (except within the 7+ bucket, where arities are not distinguished). To see how much this matters, suppose that two function types' hashes collide under the current 32-bit hash scheme. Assuming the hash function is uniform, the probability that the arities differ is equal to 1 minus the probability that the arities agree:

1 - ((32/10903)*(31/10902) + (2492/10903)*(2491/10902) + ... + (148/10903)*(147/10902)) = 0.75901

Thus, if there is a hash collision, there is roughly a 76% chance that it will be a cross-arity collision.
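
This figure can be reproduced in a few lines (a sketch; the per-arity type counts are the ones from the table in the PR description):

# Unique indirect-callable function types per arity, arities 0 through 7
# (7 means at least one parameter may be passed on the stack).
counts = [32, 2492, 3775, 2547, 1169, 519, 221, 148]
total = sum(counts)  # 10903

# Probability that two distinct, uniformly chosen function types share an
# arity (pair drawn without replacement), and hence its complement.
p_same_arity = sum(n * (n - 1) for n in counts) / (total * (total - 1))
print(f"P(cross-arity | collision) = {1 - p_same_arity:.5f}")  # 0.75901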

@maurer

maurer commented Nov 30, 2024

I'm not sure I buy the argument that cross-arity mismatches are significantly more exploitable than same-arity mismatches.

Restating your argument to make sure I've understood it correctly: When an attacker swaps in a function pointer of higher arity, a dead variable (or even a live one expected to be restored afterwards) may be used improperly by the function. The attacker is more likely to be able to control this than other parameters. (This last part is the piece that I don't see the justification for.)

However, this argument applies to every argument consumed by a function - each is as likely to be attacker-controlled as the contents of a dead register, and the function may treat a value as a different type than the caller passed (e.g. it believes it's getting a pointer but gets an attacker-controlled flags value). This seems to be an argument that higher-arity functions are more likely to be exploitable because it's more likely that the attacker controls at least one of their arguments.

Additionally, the increased likelihood of exploitability of a mismatch needs to be quite high to justify what you're describing. Let p be the probability that a type mismatch with matching arity is exploitable, and q be the probability that a type mismatch with an arity change is exploitable. Then the old scheme has an exploitability rate of 0.01375771 * (p * 0.25 + q * 0.75) and the new one has a rate of 0.02660650 * p. Setting these equal and reducing to find the breakeven point:

0.023167 p = 0.010318 q
2.245 p = q

So if you want to claim this increases the strength of the scheme, you need to argue that an attacker is more than twice as likely to be able to exploit an arity-mismatched function as a type-mismatched one, which seems like a bit of a stretch - attackers often have direct control of intentionally passed arguments...

@scottconstable
Contributor Author

Hi @maurer, I can honestly say that I appreciate your attention to detail. While your observation that "attackers often have direct control of intentionally passed arguments" is true in general, I do not see evidence of this among indirect calls in the Linux kernel. 77% of all indirect call parameters are pointers, and AFAIK user-mode code cannot exert arbitrary control over pointers within the kernel (especially given that x86 Linux kernels enable SMAP and SMEP when they are available on the CPU). That leaves the other 23% of indirect call parameters to consider, though I admit I don't have a concrete breakdown of what fraction of these might be attacker-controllable.

But let's assume for the sake of discussion that all of the non-pointer indirect call parameters are attacker-controllable. And because we don't know which dead registers at a call site might be attacker-controllable, let's similarly assume that all of these dead registers are attacker-controllable. I built the table below to estimate a kind of exploitability score for the arity-enhanced scheme: the expected number of attacker-controlled parameters at a colliding pair, weighted by the relative frequency of function types of that arity. For example, suppose that a hash collision between two function types occurs within the arity-1 partition. The expected number of attacker-controlled parameters at the first function is 0.23, and likewise for the second, so the total is 0.46; multiplying by the relative frequency of arity-1 function types (roughly 23%) gives a weighted exploitability of 0.105138035.

Arity Expected # of attacker-controlled args Weighted exploitability
0 0 0
1 0.46 0.105138035
2 0.92 0.318536183
3 1.38 0.322375493
4 1.84 0.197281482
5 2.3 0.109483628
6+ 2.76 0.093409153

Total exploitability score for the arity enhancement: 1.146223975
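
For reproducibility, here is a sketch of the computation behind this table. The 0.23 factor is the assumed per-parameter probability of attacker control (the non-pointer fraction from above), and arities 6 and 7 are merged into the 6+ bucket:

counts = [32, 2492, 3775, 2547, 1169, 519, 221 + 148]  # arity 0..5 and 6+
total = sum(counts)  # 10903
P_CTRL = 0.23  # assumed probability a given parameter is attacker-controlled

score = 0.0
for arity, n in enumerate(counts):
    # A same-arity collision exposes `arity` parameters on each side, each
    # attacker-controlled with probability P_CTRL.
    expected_controlled = 2 * P_CTRL * arity
    score += expected_controlled * (n / total)  # weight by arity frequency
print(f"arity-enhanced exploitability score: {score:.6f}")  # ~1.146224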

A similar analysis can be applied to the existing 32-bit hash scheme. For example, suppose a 1-arity function collides with a 2-arity function. Then the expected number of attacker-controlled parameters that would be live at the mis-matched target would be 0.23 if the 2-arity function calls the 1-arity function, or 1.23 if the 1-arity function calls the 2-arity function (because the second parameter at the target would be a dead register at the caller). This data is captured in the table below, where the columns and rows refer to the respective arities of the colliding function types.

0 1 2 3 4 5 6+
0 0
1 1 0.46
2 2 1.46 0.92
3 3 2.46 1.92 1.38
4 4 3.46 2.92 2.38 1.84
5 5 4.46 3.92 3.38 2.84 2.3
6+ 6 5.46 4.92 4.38 3.84 3.3 2.76

Assuming that a collision occurs, this table shows the probability density of the arities (note that the probabilities do sum to 1):

0 1 2 3 4 5 6+
0 8.61406E-06
1 0.00134164 0.052240106
2 0.00203238 0.15827159 0.119878662
3 0.001371251 0.106786156 0.161764743 0.054571497
4 0.000629365 0.049011785 0.074245381 0.050093506 0.011495742
5 0.000279419 0.021759723 0.032962663 0.022239974 0.010207511 0.00226591
6+ 0.000198662 0.015470786 0.023435882 0.015812236 0.007257363 0.003222046 0.001145409

Then the weighted exploitability for each collision is:

0 1 2 3 4 5 6+
0 0 0 0 0 0 0 0
1 0.00134164 0.024030449 0 0 0 0 0
2 0.00406476 0.231076521 0.110288369 0 0 0 0
3 0.004113752 0.262693944 0.310588307 0.075308666 0 0 0
4 0.002517459 0.169580776 0.216796512 0.119222544 0.021152165 0 0
5 0.001397093 0.097048366 0.129213637 0.075171112 0.02898933 0.005211593 0
6+ 0.001191971 0.084470491 0.115304537 0.069257593 0.027868274 0.010632751 0.00316133

Total exploitability score for the existing 32-bit hash: 2.201693943
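
And a sketch of the same computation for the existing 32-bit scheme; the pair probabilities here are taken with replacement, (n_j/N)*(n_k/N), which matches the density table above:

counts = [32, 2492, 3775, 2547, 1169, 519, 221 + 148]  # arity 0..5 and 6+
total = sum(counts)
P_CTRL = 0.23  # assumed probability a given parameter is attacker-controlled

score = 0.0
for j, nj in enumerate(counts):               # higher (or equal) arity
    for k, nk in enumerate(counts[: j + 1]):  # lower (or equal) arity
        # Summed over both call directions: the common k parameters are
        # controlled with probability P_CTRL in each direction, and calling
        # the arity-j target from the arity-k caller additionally exposes
        # j - k dead caller registers (assumed fully controllable).
        controlled = 2 * P_CTRL * k + (j - k)
        density = (1 if j == k else 2) * nj * nk / total**2
        score += controlled * density
print(f"32-bit scheme exploitability score: {score:.6f}")  # ~2.201694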

So, if you accept my reasoning, then a collision in the 32-bit scheme is expected to be roughly twice as exploitable as a collision in the 29-bit scheme with 3 arity bits. On the other hand, the 29-bit scheme might produce twice as many collisions.

@scottconstable
Contributor Author

I also do not want to lose sight of one of the other obvious advantages that was mentioned in the PR description:

One additional benefit of this patch is that it can benefit other CFI approaches that build on kCFI, such as FineIBT. For example, this proposed enhancement to FineIBT must be able to infer (at kernel init time) which registers are live at an indirect call target: https://lkml.org/lkml/2024/9/27/982. If the arity bits are available in the kCFI type ID, then this information is trivial to infer.

To elaborate, we are concurrently working on a Linux kernel patch to enhance FineIBT (which is a KCFI-like solution that utilizes x86 Indirect Branch Tracking). The goal is to extend FineIBT to poison live argument registers if a hash check fails after a branch mis-prediction. This enhancement can help to mitigate a variety of Spectre attacks in the Linux kernel.

@rcvalle
Contributor

rcvalle commented Dec 4, 2024

I'm still trying to understand why a collision with a function of different arity is riskier than a collision with a function of the same arity, as there are so many factors that can account for it, such as:

  • Which registers the attacker has control of.
  • How the registers the attacker controls are used by the other function.
  • What primitives the attacker already had to change/redirect the control flow, compared to the new primitives the attacker might gain from how the registers they control are used by the other function (i.e., does the attacker gain more powerful primitives?).

And so on. Given that these depend on the context of the vulnerability being exploited, how can it be asserted that a collision with a function of different arity is riskier than a collision with a function of the same arity? The function arity doesn't imply or influence any of the above, except that the callee may use fewer or more registers; it's unknown whether the attacker controls any of them, how they're used by the callee, and, if any are used, whether it's in a way that gives the attacker a more powerful primitive than they already have.

@maurer

maurer commented Dec 4, 2024

I think Scott's second point may be the relevant one - this may be of similar strength or slightly weaker, but having an indicator stating which registers are live is potentially needed to allow FineIBT to poison them during speculative execution.

Given how much padding is used in the X86 kernel around function headers, have you considered just stealing another byte to encode the data for your arity stuff, and considering it to be a separate mitigation from KCFI tags?

The rest of the CFI type ID scheme is arch-independent, but in order to know which registers are in use, you need arch-dependent information, because you care about the calling convention, packing rules, etc. This is part of why Ramon thought your design choices were odd above - this isn't really a CFI enhancement or modification; it's another piece of information you need for speculation defenses that occur at a different abstraction level.

@MaskRay
Member

MaskRay commented Dec 4, 2024

If you are changing the hash scheme, please migrate from xxHash64 to xxh3. We want to remove the legacy xxHash64 from LLVM.

@scottconstable
Contributor Author

I think Scott's second point may be the relevant one - this may be of similar strength or slightly weaker, but having an indicator stating which registers are live is potentially needed to allow FineIBT to poison them during speculative execution.

Given how much padding is used in the X86 kernel around function headers, have you considered just stealing another byte to encode the data for your arity stuff, and considering it to be a separate mitigation from KCFI tags?

Yes, and this is what our first prototype does. There is a brief writeup here: https://lkml.org/lkml/2024/9/27/982.

When I started to refine the prototype into a PR, I noticed that stealing another byte in the padding region causes problems for either LLVM or Linux. This is because LLVM emits function headers like this:

NOP
NOP
...
MOV $01234567, %eax
<foo>:

But when the Linux image is loaded, it moves the type information to the beginning of the padding, to allow the pre-function padding to be used for hot patching:

MOV $01234567, %eax
...
NOP
NOP
<foo>:

Thus, when new metadata is added to the padding region, it introduces new challenges for either LLVM or Linux:

  • If the metadata is added before the hash, then Linux needs to know that it might need to look for the hash in a different location, i.e., at a different offset relative to the start of the function.
  • If the metadata is added after the hash, then LLVM's KCFI code generator needs to correctly adjust the offsets for the call site KCFI checks, depending on whether the new metadata is being generated.

The discussions on LKML also led us to simplify the proposal, which originally had called for an 8-bit field to denote which registers are pointers, so that only pointers would be poisoned after a misprediction. The new proposal poisons all live registers and therefore requires a 3-bit field to record the arity (defined as the number of live registers). It occurred to me that arity might also be useful to harden the hash checks for KCFI, but that may require a consensus among KCFI stakeholders.

One other alternative could be to use the 3-bit reg field in the MOV to encode the arity. For example:

MOV $01234567, %eax    # arity 0
MOV $01234567, %ecx    # arity 1
MOV $01234567, %edx    # arity 2
MOV $01234567, %ebx    # arity 3
MOV $01234567, %esp    # arity 4
MOV $01234567, %ebp    # arity 5
MOV $01234567, %esi    # arity 6
MOV $01234567, %edi    # arity 7+

I believe that this alternative implementation would also meet the requirements for the FineIBT register poisoning enhancement.
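
To sketch how a consumer (e.g., kernel init code) could recover the arity under this alternative: MOV r32, imm32 encodes the destination register in its single opcode byte (0xB8 plus the register number, with eax=0 through edi=7), so the arity is simply that register number. A Python sketch, assuming the plain 5-byte form with no prefixes:

def decode_kcfi_mov(insn: bytes):
    # Expects the 5-byte `mov $imm32, %r32` form: opcode 0xB8+reg,
    # followed by a little-endian 32-bit immediate (the type hash).
    opcode = insn[0]
    assert 0xB8 <= opcode <= 0xBF, "not a MOV r32, imm32"
    arity = opcode - 0xB8  # eax=0, ecx=1, ..., edi=7 (7 meaning 7+)
    kcfi_hash = int.from_bytes(insn[1:5], "little")
    return arity, kcfi_hash

# Illustrative bytes for `mov $0x01234567, %edx`, i.e., arity 2:
assert decode_kcfi_mov(bytes([0xBA, 0x67, 0x45, 0x23, 0x01])) == (2, 0x01234567)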

The rest of the CFI type ID scheme is arch-independent, but in order to know which registers are in use, you need arch dependent information, because you care about the calling convention, packing rules, etc. This is part of why Ramon thought your design choices were odd above - this isn't really a CFI enhancement or modification, this is another piece of information you need for speculation defenses that occur at a different abstraction level.

I agree with your interpretation. And thank you and @rcvalle again for the ongoing feedback and discussion.

@scottconstable
Contributor Author

If you are changing the hash scheme, please migrate from xxHash64 to xxh3. We want to remove the legacy xxHash64 from LLVM.

It seems from the discussion above that this change would require a new flag. Suppose, for instance, that a user builds a kernel with a version of Clang that uses xxh3, and a version of rustc that still uses xxHash64. Then the hashes would be incompatible and the kernel would crash.
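
To illustrate the incompatibility, here is a sketch using the third-party Python xxhash package (the mangled type string is illustrative, and the 32-bit truncation follows the scheme described above):

import xxhash  # third-party package exposing both xxHash64 and XXH3

mangled_type = "_ZTSFiPvjE"  # illustrative mangled function-type string

id_xxh64 = xxhash.xxh64_intdigest(mangled_type) & 0xFFFFFFFF
id_xxh3 = xxhash.xxh3_64_intdigest(mangled_type) & 0xFFFFFFFF

# A call site whose ID came from one algorithm and a target whose ID came
# from the other will almost certainly disagree, failing every KCFI check.
print(hex(id_xxh64), hex(id_xxh3))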

This PR is specific to X86-64 targets. Your suggestion seems like a general enhancement that should apply to all targets; I think it should be a separate PR.

@scottconstable
Contributor Author

@rcvalle @maurer Do you have any feedback on the alternative proposal, #117121 (comment)?

@rcvalle
Contributor

rcvalle commented Dec 11, 2024

I haven't received a reply to the concerns I raised at #117121 (comment) and #117121 (comment), and they still remain for the alternative proposal.

I'd recommend working on better defining the problem, and then working on a well-defined solution for the problem that doesn't affect or conflict with a currently well-defined solution to a different problem (i.e., KCFI).

@scottconstable
Contributor Author

@phoebewang @sirmc @samitolvanen @Darksonn @lvwr @ojeda @maurer @rcvalle @MaskRay

I have created #121070 to implement the alternate proposal summarized in this comment above: #117121 (comment).

@scottconstable
Contributor Author

I haven't received a reply to the concerns I raised at #117121 (comment) and #117121 (comment), and they still remain for the alternative proposal.

I'd recommend working on better defining the problem, and then working on a well-defined solution for the problem that doesn't affect or conflict with a currently well-defined solution to a different problem (i.e., KCFI).

@rcvalle I think that #117121 (comment) isn't relevant for the alternative proposal in #121070 because that proposal does not attempt to prevent cross-arity collisions. Do you agree? Or have I misunderstood your concern?

I'm not sure whether #117121 (comment) is relevant for the alternative proposal. Please feel free to discuss at #121070.
