
[msan] Add handleIntrinsicByApplyingToShadow; support NEON tbl/tbx #114490

Merged: 10 commits merged into llvm:main from the msan_neon_tbl_implement branch on Nov 1, 2024

Conversation

thurstond
Contributor

@thurstond thurstond commented Oct 31, 2024

This adds a general function that handles an intrinsic by applying that intrinsic to its operands' shadows, and uses it for the specific case of the Arm NEON TBL/TBX intrinsics.

This also updates the tests from #114462
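
To illustrate the idea behind `handleIntrinsicByApplyingToShadow`, here is a small stand-alone sketch (not part of the patch). It models `tbl1` as a scalar byte lookup; the helper, the lane counts, and the zero-on-out-of-range behaviour are assumptions chosen to mirror NEON TBL semantics. Applying the same lookup to the table's shadow yields, for each output lane, exactly the shadow of the table byte that lane read.

```cpp
// Illustration only: a scalar model of why applying a table-lookup intrinsic
// to the shadows propagates initializedness per lane.
#include <array>
#include <cstdint>
#include <cstdio>

using V8 = std::array<uint8_t, 8>;
using V16 = std::array<uint8_t, 16>;

// Scalar stand-in for llvm.aarch64.neon.tbl1: out-of-range indices produce 0.
static V8 tbl1(const V16 &table, const V8 &idx) {
  V8 out{};
  for (int i = 0; i < 8; ++i)
    out[i] = idx[i] < 16 ? table[idx[i]] : 0;
  return out;
}

int main() {
  V16 table{}, tableShadow{};
  for (int i = 0; i < 16; ++i) table[i] = uint8_t(i * 3);
  tableShadow[5] = 0xFF;              // pretend table[5] is uninitialized
  V8 idx = {0, 5, 2, 5, 31, 1, 6, 7}; // fully initialized index vector

  V8 result = tbl1(table, idx);
  // Applying the same lookup to the table's shadow marks exactly the output
  // lanes that read table[5] as uninitialized, and nothing else.
  V8 resultShadow = tbl1(tableShadow, idx);
  for (int i = 0; i < 8; ++i)
    std::printf("lane %d: value=%3u shadow=%02x\n", i, (unsigned)result[i],
                (unsigned)resultShadow[i]);
}
```

Note that this toy ignores the shadow of the index vector itself, which is the subtlety discussed further down in the thread.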

@llvmbot
Member

llvmbot commented Oct 31, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Thurston Dang (thurstond)

Changes

…rinsics

This adds a general function that handles an intrinsic by applying that intrinsic to its operands' shadows, and uses it for the specific case of the Arm NEON TBL intrinsics.

This also updates the tests from #114462


Patch is 168.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/114490.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp (+43)
  • (added) llvm/test/Instrumentation/MemorySanitizer/AArch64/neon_tbl.ll (+784)
  • (added) llvm/test/Instrumentation/MemorySanitizer/AArch64/neon_tbl_origins.ll (+1222)
diff --git a/llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp b/llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
index 391fb30d95e2ae..cb91376b48792f 100644
--- a/llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
+++ b/llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
@@ -3944,6 +3944,30 @@ struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
     }
   }
 
+  /// Handle intrinsics by applying the intrinsic to the shadows.
+  /// The origin is approximated using setOriginForNaryOp.
+  ///
+  /// For example, this can be applied to the Arm NEON vector table intrinsics
+  /// (tbl{1,2,3,4}).
+  void handleIntrinsicByApplyingToShadow(IntrinsicInst &I, unsigned int numArgOperands) {
+    IRBuilder<> IRB(&I);
+
+    // Don't use getNumOperands() because it includes the callee
+    assert (numArgOperands == I.arg_size());
+
+    SmallVector<Value *, 8> ShadowArgs;
+    for (unsigned int i = 0; i < numArgOperands; i++) {
+      Value *Shadow = getShadow(&I, i);
+      ShadowArgs.append(1, Shadow);
+    }
+
+    CallInst *CI =
+        IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(), ShadowArgs);
+    setShadow(&I, CI);
+
+    setOriginForNaryOp(I);
+  }
+
   void visitIntrinsicInst(IntrinsicInst &I) {
     switch (I.getIntrinsicID()) {
     case Intrinsic::uadd_with_overflow:
@@ -4319,6 +4343,25 @@ struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
       break;
     }
 
+    // Arm NEON vector table intrinsics have the source/table register(s),
+    // followed by the index register. They return the output.
+    case Intrinsic::aarch64_neon_tbl1: {
+      handleIntrinsicByApplyingToShadow(I, 2);
+      break;
+    }
+    case Intrinsic::aarch64_neon_tbl2: {
+      handleIntrinsicByApplyingToShadow(I, 3);
+      break;
+    }
+    case Intrinsic::aarch64_neon_tbl3: {
+      handleIntrinsicByApplyingToShadow(I, 4);
+      break;
+    }
+    case Intrinsic::aarch64_neon_tbl4: {
+      handleIntrinsicByApplyingToShadow(I, 5);
+      break;
+    }
+
     default:
       if (!handleUnknownIntrinsic(I))
         visitInstruction(I);
diff --git a/llvm/test/Instrumentation/MemorySanitizer/AArch64/neon_tbl.ll b/llvm/test/Instrumentation/MemorySanitizer/AArch64/neon_tbl.ll
new file mode 100644
index 00000000000000..6a99c1546acb0a
--- /dev/null
+++ b/llvm/test/Instrumentation/MemorySanitizer/AArch64/neon_tbl.ll
@@ -0,0 +1,784 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --tool build/bin/opt --version 2
+; Test memory sanitizer instrumentation for Arm NEON tbl instructions.
+;
+; RUN: opt < %s -passes=msan -S | FileCheck %s
+;
+; Forked from llvm/test/CodeGen/AArch64/arm64-tbl.ll
+
+target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+target triple = "aarch64--linux-android9001"
+
+; -----------------------------------------------------------------------------------------------------------------------------------------------
+
+define <8 x i8> @tbl1_8b(<16 x i8> %A, <8 x i8> %B) nounwind sanitize_memory {
+; CHECK-LABEL: define <8 x i8> @tbl1_8b
+; CHECK-SAME: (<16 x i8> [[A:%.*]], <8 x i8> [[B:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = load <16 x i8>, ptr @__msan_param_tls, align 8
+; CHECK-NEXT:    [[TMP2:%.*]] = load <8 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 16) to ptr), align 8
+; CHECK-NEXT:    call void @llvm.donothing()
+; CHECK-NEXT:    [[TMP3:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> [[TMP1]], <8 x i8> [[TMP2]])
+; CHECK-NEXT:    [[OUT:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> [[A]], <8 x i8> [[B]])
+; CHECK-NEXT:    store <8 x i8> [[TMP3]], ptr @__msan_retval_tls, align 8
+; CHECK-NEXT:    ret <8 x i8> [[OUT]]
+;
+  %out = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> %A, <8 x i8> %B)
+  ret <8 x i8> %out
+}
+
+define <16 x i8> @tbl1_16b(<16 x i8> %A, <16 x i8> %B) nounwind sanitize_memory {
+; CHECK-LABEL: define <16 x i8> @tbl1_16b
+; CHECK-SAME: (<16 x i8> [[A:%.*]], <16 x i8> [[B:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = load <16 x i8>, ptr @__msan_param_tls, align 8
+; CHECK-NEXT:    [[TMP2:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 16) to ptr), align 8
+; CHECK-NEXT:    call void @llvm.donothing()
+; CHECK-NEXT:    [[TMP3:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl1.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]])
+; CHECK-NEXT:    [[OUT:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl1.v16i8(<16 x i8> [[A]], <16 x i8> [[B]])
+; CHECK-NEXT:    store <16 x i8> [[TMP3]], ptr @__msan_retval_tls, align 8
+; CHECK-NEXT:    ret <16 x i8> [[OUT]]
+;
+  %out = call <16 x i8> @llvm.aarch64.neon.tbl1.v16i8(<16 x i8> %A, <16 x i8> %B)
+  ret <16 x i8> %out
+}
+
+define <8 x i8> @tbl2_8b(<16 x i8> %A, <16 x i8> %B, <8 x i8> %C) sanitize_memory {
+; CHECK-LABEL: define <8 x i8> @tbl2_8b
+; CHECK-SAME: (<16 x i8> [[A:%.*]], <16 x i8> [[B:%.*]], <8 x i8> [[C:%.*]]) #[[ATTR1:[0-9]+]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = load <16 x i8>, ptr @__msan_param_tls, align 8
+; CHECK-NEXT:    [[TMP2:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 16) to ptr), align 8
+; CHECK-NEXT:    [[TMP3:%.*]] = load <8 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 32) to ptr), align 8
+; CHECK-NEXT:    call void @llvm.donothing()
+; CHECK-NEXT:    [[TMP4:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <8 x i8> [[TMP3]])
+; CHECK-NEXT:    [[OUT:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[A]], <16 x i8> [[B]], <8 x i8> [[C]])
+; CHECK-NEXT:    store <8 x i8> [[TMP4]], ptr @__msan_retval_tls, align 8
+; CHECK-NEXT:    ret <8 x i8> [[OUT]]
+;
+  %out = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> %A, <16 x i8> %B, <8 x i8> %C)
+  ret <8 x i8> %out
+}
+
+define <16 x i8> @tbl2_16b(<16 x i8> %A, <16 x i8> %B, <16 x i8> %C) sanitize_memory {
+; CHECK-LABEL: define <16 x i8> @tbl2_16b
+; CHECK-SAME: (<16 x i8> [[A:%.*]], <16 x i8> [[B:%.*]], <16 x i8> [[C:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = load <16 x i8>, ptr @__msan_param_tls, align 8
+; CHECK-NEXT:    [[TMP2:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 16) to ptr), align 8
+; CHECK-NEXT:    [[TMP3:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 32) to ptr), align 8
+; CHECK-NEXT:    call void @llvm.donothing()
+; CHECK-NEXT:    [[TMP4:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl2.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]])
+; CHECK-NEXT:    [[OUT:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl2.v16i8(<16 x i8> [[A]], <16 x i8> [[B]], <16 x i8> [[C]])
+; CHECK-NEXT:    store <16 x i8> [[TMP4]], ptr @__msan_retval_tls, align 8
+; CHECK-NEXT:    ret <16 x i8> [[OUT]]
+;
+  %out = call <16 x i8> @llvm.aarch64.neon.tbl2.v16i8(<16 x i8> %A, <16 x i8> %B, <16 x i8> %C)
+  ret <16 x i8> %out
+}
+
+define <8 x i8> @tbl3_8b(<16 x i8> %A, <16 x i8> %B, <16 x i8> %C, <8 x i8> %D) sanitize_memory {
+; CHECK-LABEL: define <8 x i8> @tbl3_8b
+; CHECK-SAME: (<16 x i8> [[A:%.*]], <16 x i8> [[B:%.*]], <16 x i8> [[C:%.*]], <8 x i8> [[D:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = load <16 x i8>, ptr @__msan_param_tls, align 8
+; CHECK-NEXT:    [[TMP2:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 16) to ptr), align 8
+; CHECK-NEXT:    [[TMP3:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 32) to ptr), align 8
+; CHECK-NEXT:    [[TMP4:%.*]] = load <8 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 48) to ptr), align 8
+; CHECK-NEXT:    call void @llvm.donothing()
+; CHECK-NEXT:    [[TMP5:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl3.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <8 x i8> [[TMP4]])
+; CHECK-NEXT:    [[OUT:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl3.v8i8(<16 x i8> [[A]], <16 x i8> [[B]], <16 x i8> [[C]], <8 x i8> [[D]])
+; CHECK-NEXT:    store <8 x i8> [[TMP5]], ptr @__msan_retval_tls, align 8
+; CHECK-NEXT:    ret <8 x i8> [[OUT]]
+;
+  %out = call <8 x i8> @llvm.aarch64.neon.tbl3.v8i8(<16 x i8> %A, <16 x i8> %B, <16 x i8> %C, <8 x i8> %D)
+  ret <8 x i8> %out
+}
+
+define <16 x i8> @tbl3_16b(<16 x i8> %A, <16 x i8> %B, <16 x i8> %C, <16 x i8> %D) sanitize_memory {
+; CHECK-LABEL: define <16 x i8> @tbl3_16b
+; CHECK-SAME: (<16 x i8> [[A:%.*]], <16 x i8> [[B:%.*]], <16 x i8> [[C:%.*]], <16 x i8> [[D:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = load <16 x i8>, ptr @__msan_param_tls, align 8
+; CHECK-NEXT:    [[TMP2:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 16) to ptr), align 8
+; CHECK-NEXT:    [[TMP3:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 32) to ptr), align 8
+; CHECK-NEXT:    [[TMP4:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 48) to ptr), align 8
+; CHECK-NEXT:    call void @llvm.donothing()
+; CHECK-NEXT:    [[TMP5:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl3.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]])
+; CHECK-NEXT:    [[OUT:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl3.v16i8(<16 x i8> [[A]], <16 x i8> [[B]], <16 x i8> [[C]], <16 x i8> [[D]])
+; CHECK-NEXT:    store <16 x i8> [[TMP5]], ptr @__msan_retval_tls, align 8
+; CHECK-NEXT:    ret <16 x i8> [[OUT]]
+;
+  %out = call <16 x i8> @llvm.aarch64.neon.tbl3.v16i8(<16 x i8> %A, <16 x i8> %B, <16 x i8> %C, <16 x i8> %D)
+  ret <16 x i8> %out
+}
+
+define <8 x i8> @tbl4_8b(<16 x i8> %A, <16 x i8> %B, <16 x i8> %C, <16 x i8> %D, <8 x i8> %E) sanitize_memory {
+; CHECK-LABEL: define <8 x i8> @tbl4_8b
+; CHECK-SAME: (<16 x i8> [[A:%.*]], <16 x i8> [[B:%.*]], <16 x i8> [[C:%.*]], <16 x i8> [[D:%.*]], <8 x i8> [[E:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = load <16 x i8>, ptr @__msan_param_tls, align 8
+; CHECK-NEXT:    [[TMP2:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 16) to ptr), align 8
+; CHECK-NEXT:    [[TMP3:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 32) to ptr), align 8
+; CHECK-NEXT:    [[TMP4:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 48) to ptr), align 8
+; CHECK-NEXT:    [[TMP5:%.*]] = load <8 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 64) to ptr), align 8
+; CHECK-NEXT:    call void @llvm.donothing()
+; CHECK-NEXT:    [[TMP6:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl4.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <8 x i8> [[TMP5]])
+; CHECK-NEXT:    [[OUT:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl4.v8i8(<16 x i8> [[A]], <16 x i8> [[B]], <16 x i8> [[C]], <16 x i8> [[D]], <8 x i8> [[E]])
+; CHECK-NEXT:    store <8 x i8> [[TMP6]], ptr @__msan_retval_tls, align 8
+; CHECK-NEXT:    ret <8 x i8> [[OUT]]
+;
+  %out = call <8 x i8> @llvm.aarch64.neon.tbl4.v8i8(<16 x i8> %A, <16 x i8> %B, <16 x i8> %C, <16 x i8> %D, <8 x i8> %E)
+  ret <8 x i8> %out
+}
+
+define <16 x i8> @tbl4_16b(<16 x i8> %A, <16 x i8> %B, <16 x i8> %C, <16 x i8> %D, <16 x i8> %E) sanitize_memory {
+; CHECK-LABEL: define <16 x i8> @tbl4_16b
+; CHECK-SAME: (<16 x i8> [[A:%.*]], <16 x i8> [[B:%.*]], <16 x i8> [[C:%.*]], <16 x i8> [[D:%.*]], <16 x i8> [[E:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = load <16 x i8>, ptr @__msan_param_tls, align 8
+; CHECK-NEXT:    [[TMP2:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 16) to ptr), align 8
+; CHECK-NEXT:    [[TMP3:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 32) to ptr), align 8
+; CHECK-NEXT:    [[TMP4:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 48) to ptr), align 8
+; CHECK-NEXT:    [[TMP5:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 64) to ptr), align 8
+; CHECK-NEXT:    call void @llvm.donothing()
+; CHECK-NEXT:    [[TMP6:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl4.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <16 x i8> [[TMP5]])
+; CHECK-NEXT:    [[OUT:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl4.v16i8(<16 x i8> [[A]], <16 x i8> [[B]], <16 x i8> [[C]], <16 x i8> [[D]], <16 x i8> [[E]])
+; CHECK-NEXT:    store <16 x i8> [[TMP6]], ptr @__msan_retval_tls, align 8
+; CHECK-NEXT:    ret <16 x i8> [[OUT]]
+;
+  %out = call <16 x i8> @llvm.aarch64.neon.tbl4.v16i8(<16 x i8> %A, <16 x i8> %B, <16 x i8> %C, <16 x i8> %D, <16 x i8> %E)
+  ret <16 x i8> %out
+}
+
+
+
+define <8 x i8> @shuffled_tbl2_to_tbl4_v8i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c, <16 x i8> %d) sanitize_memory {
+; CHECK-LABEL: define <8 x i8> @shuffled_tbl2_to_tbl4_v8i8
+; CHECK-SAME: (<16 x i8> [[A:%.*]], <16 x i8> [[B:%.*]], <16 x i8> [[C:%.*]], <16 x i8> [[D:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = load <16 x i8>, ptr @__msan_param_tls, align 8
+; CHECK-NEXT:    [[TMP2:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 16) to ptr), align 8
+; CHECK-NEXT:    [[TMP3:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 32) to ptr), align 8
+; CHECK-NEXT:    [[TMP4:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 48) to ptr), align 8
+; CHECK-NEXT:    call void @llvm.donothing()
+; CHECK-NEXT:    [[TMP5:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <8 x i8> zeroinitializer)
+; CHECK-NEXT:    [[T1:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[A]], <16 x i8> [[B]], <8 x i8> <i8 0, i8 4, i8 8, i8 12, i8 -1, i8 -1, i8 -1, i8 -1>)
+; CHECK-NEXT:    [[TMP6:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <8 x i8> zeroinitializer)
+; CHECK-NEXT:    [[T2:%.*]] = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> [[C]], <16 x i8> [[D]], <8 x i8> <i8 0, i8 4, i8 8, i8 12, i8 -1, i8 -1, i8 -1, i8 -1>)
+; CHECK-NEXT:    [[_MSPROP:%.*]] = shufflevector <8 x i8> [[TMP5]], <8 x i8> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT:    [[S:%.*]] = shufflevector <8 x i8> [[T1]], <8 x i8> [[T2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT:    store <8 x i8> [[_MSPROP]], ptr @__msan_retval_tls, align 8
+; CHECK-NEXT:    ret <8 x i8> [[S]]
+;
+  %t1 = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> %a, <16 x i8> %b, <8 x i8> <i8 0, i8 4, i8 8, i8 12, i8 -1, i8 -1, i8 -1, i8 -1>)
+  %t2 = call <8 x i8> @llvm.aarch64.neon.tbl2.v8i8(<16 x i8> %c, <16 x i8> %d, <8 x i8> <i8 0, i8 4, i8 8, i8 12, i8 -1, i8 -1, i8 -1, i8 -1>)
+  %s = shufflevector <8 x i8> %t1, <8 x i8> %t2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
+  ret <8 x i8> %s
+}
+
+
+
+define <16 x i8> @shuffled_tbl2_to_tbl4(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c, <16 x i8> %d) sanitize_memory {
+; CHECK-LABEL: define <16 x i8> @shuffled_tbl2_to_tbl4
+; CHECK-SAME: (<16 x i8> [[A:%.*]], <16 x i8> [[B:%.*]], <16 x i8> [[C:%.*]], <16 x i8> [[D:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = load <16 x i8>, ptr @__msan_param_tls, align 8
+; CHECK-NEXT:    [[TMP2:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 16) to ptr), align 8
+; CHECK-NEXT:    [[TMP3:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 32) to ptr), align 8
+; CHECK-NEXT:    [[TMP4:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 48) to ptr), align 8
+; CHECK-NEXT:    call void @llvm.donothing()
+; CHECK-NEXT:    [[TMP5:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl2.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], <16 x i8> zeroinitializer)
+; CHECK-NEXT:    [[T1:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl2.v16i8(<16 x i8> [[A]], <16 x i8> [[B]], <16 x i8> <i8 0, i8 4, i8 8, i8 12, i8 16, i8 20, i8 24, i8 28, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>)
+; CHECK-NEXT:    [[TMP6:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl2.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP4]], <16 x i8> zeroinitializer)
+; CHECK-NEXT:    [[T2:%.*]] = call <16 x i8> @llvm.aarch64.neon.tbl2.v16i8(<16 x i8> [[C]], <16 x i8> [[D]], <16 x i8> <i8 0, i8 4, i8 8, i8 12, i8 16, i8 20, i8 24, i8 28, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>)
+; CHECK-NEXT:    [[_MSPROP:%.*]] = shufflevector <16 x i8> [[TMP5]], <16 x i8> [[TMP6]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
+; CHECK-NEXT:    [[S:%.*]] = shufflevector <16 x i8> [[T1]], <16 x i8> [[T2]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
+; CHECK-NEXT:    store <16 x i8> [[_MSPROP]], ptr @__msan_retval_tls, align 8
+; CHECK-NEXT:    ret <16 x i8> [[S]]
+;
+  %t1 = call <16 x i8> @llvm.aarch64.neon.tbl2.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> <i8 0, i8 4, i8 8, i8 12, i8 16, i8 20, i8 24, i8 28, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>)
+  %t2 = call <16 x i8> @llvm.aarch64.neon.tbl2.v16i8(<16 x i8> %c, <16 x i8> %d, <16 x i8> <i8 0, i8 4, i8 8, i8 12, i8 16, i8 20, i8 24, i8 28, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>)
+  %s = shufflevector <16 x i8> %t1, <16 x i8> %t2, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
+  ret <16 x i8> %s
+}
+
+
+define <16 x i8> @shuffled_tbl2_to_tbl4_nonconst_first_mask(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c, <16 x i8> %d, i8 %v) sanitize_memory {
+; CHECK-LABEL: define <16 x i8> @shuffled_tbl2_to_tbl4_nonconst_first_mask
+; CHECK-SAME: (<16 x i8> [[A:%.*]], <16 x i8> [[B:%.*]], <16 x i8> [[C:%.*]], <16 x i8> [[D:%.*]], i8 [[V:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = load i8, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 64) to ptr), align 8
+; CHECK-NEXT:    [[TMP2:%.*]] = load <16 x i8>, ptr @__msan_param_tls, align 8
+; CHECK-NEXT:    [[TMP3:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 16) to ptr), align 8
+; CHECK-NEXT:    [[TMP4:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 32) to ptr), align 8
+; CHECK-NEXT:    [[TMP5:%.*]] = load <16 x i8>, ptr inttoptr (i64 add (i64 ptrtoint (ptr @__msan_param_tls to i64), i64 48) to ptr), align 8
+; CHECK-NEXT:    call void @llvm.donothing()
+; CHECK-NEXT:    [[_MSPROP:%.*]] = insertelement <16 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, i8 [[TMP1]], i32 0
+; CHECK-NEXT:    [[INS_0:%.*]] = insertelement <16 x i8> poison, i8 [[V]], i32 0
+; CHECK-NEXT:    [[_MSPROP1:%.*]] = insertelement <16 x i8> [[_MSPROP]], i8 [[TMP1]], i32 1
+; CHECK-NEXT:    [[INS_1:%.*]] = insertelement <16 x i8> [[INS_0]], i8 [[V]], i32 1
+; CHECK-NEXT:    [[_MSPROP2:%.*]] = insertelement <16 x i8> [[_MSPROP1]], i8 [[TMP1]], i32 2
+; CHECK-NEXT:    [[INS_2:%.*]] = insertelement <16 x i8> [[INS_1]], i8 [[V]], i32 2
+; CHECK-NEXT:    [[_MSPROP3:%.*]] = insertelement <16 x i8> [[_MSPROP2]], i8 [[TMP1]], i32 3
+; CHECK-NEXT:    [[INS_3:%.*]] = insertelement <16 x i8> [[INS_2]], i8 [[V]], i32 3
+; CHECK-NEXT:    [[_MSPROP4:%.*]] = insertelement <16 x i8> [[_MSPROP3]], i8 [[TMP1]], i32 4
+; CHECK-NEXT:    [[INS_4:%.*]] = insertelement <16 x i8> [[INS_3]], i8 [[V]], i32 4
+; CHECK-NEXT:    [[_MSPROP5:%.*]] = insertelement <16 x i8> [[_MSPROP4]], i8 [[TMP1]], i32 5
+; CHECK-NEXT:    [[INS_5:%.*]] = insertelement <16 x i8> [[INS_4]], i8 [[V]], i32 5
+; CHECK-NEXT:    [[_MSPROP6:%.*]] = insertelement <16 x i8> [[_MSPROP5]], i8 [[TMP1]], i32 6
+; CHECK-NEXT:    [[INS_6:%.*]] = insertelement <16 x i8> [[INS_5]], i8 [[V]], ...
[truncated]

@thurstond thurstond changed the title from "[msan] Add handleIntrinsicByApplyingToShadow and support NEON tbl int…" to "[msan] Add handleIntrinsicByApplyingToShadow; support NEON tbl" on Oct 31, 2024

github-actions bot commented Oct 31, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@thurstond thurstond changed the title from "[msan] Add handleIntrinsicByApplyingToShadow; support NEON tbl" to "[msan] Add handleIntrinsicByApplyingToShadow; support NEON tbl/tbx" on Oct 31, 2024
…rinsics

This adds a general function that handles intrinsics by applying the intrinsic to the shadows,
and applies it to the specific case of Arm NEON TBL intrinsics.

This also updates the tests from llvm#114462
@thurstond thurstond force-pushed the msan_neon_tbl_implement branch from 6b0ec21 to ce7b1e0 on October 31, 2024 at 23:57
@thurstond thurstond marked this pull request as draft November 1, 2024 00:12
@thurstond
Contributor Author

I realized the index argument shouldn't be converted into a shadow. I'll update this patch tomorrow.
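
For concreteness, here is a minimal sketch of one way to do that inside MemorySanitizerVisitor. This is an assumption about the shape of the fix, not the merged code, and the `trailingVerbatimArgs` parameter name is invented for the sketch: the leading table operands contribute their shadows, while a number of trailing operands (here the index register) are passed through verbatim.

```cpp
// Sketch only: assumes the MemorySanitizerVisitor context from the diff above
// (getShadow, setShadow, setOriginForNaryOp, IRBuilder).
void handleIntrinsicByApplyingToShadow(IntrinsicInst &I,
                                       unsigned int trailingVerbatimArgs) {
  IRBuilder<> IRB(&I);
  SmallVector<Value *, 8> ShadowArgs;
  unsigned int numArgOperands = I.arg_size();
  // Table operands: substitute their shadows.
  for (unsigned int i = 0; i < numArgOperands - trailingVerbatimArgs; i++)
    ShadowArgs.push_back(getShadow(&I, i));
  // Trailing operands (e.g. the TBL/TBX index register): pass the original
  // value, since indexing the shadow table with a shadow makes no sense.
  for (unsigned int i = numArgOperands - trailingVerbatimArgs;
       i < numArgOperands; i++)
    ShadowArgs.push_back(I.getArgOperand(i));
  CallInst *CI =
      IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(), ShadowArgs);
  setShadow(&I, CI);
  setOriginForNaryOp(I);
}
```

A complete handler would still need to account for an uninitialized index register itself (for example by OR-ing in its spread-out shadow); that is deliberately left out of the sketch.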

@thurstond thurstond marked this pull request as ready for review November 1, 2024 18:13
@thurstond thurstond requested a review from vitalybuka November 1, 2024 20:14
@thurstond thurstond merged commit e549ec5 into llvm:main Nov 1, 2024
6 of 8 checks passed
smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this pull request Nov 3, 2024
…lvm#114490)

This adds a general function that handles intrinsics by applying the
intrinsic to the shadows, and applies it to the specific case of Arm
NEON TBL/TBX intrinsics.

This also updates the tests from
llvm#114462
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
thurstond added a commit to thurstond/llvm-project that referenced this pull request Jan 18, 2025
handleIntrinsicByApplyingToShadow (introduced in
llvm#114490) requires that the
intrinsic supports integer-ish operands; this is not the case for all
intrinsics. This patch generalizes the function to bitcast the shadow
arguments to be the same type as the original intrinsic, thus
guaranteeing that the intrinsic exists. Additionally, it casts the
computed shadow to be an appropriate shadow type.

This function assumes that the intrinsic will handle arbitrary bit-patterns (for example, if the intrinsic only accepts floats
for var1, we require that it doesn't care if inputs are NaNs).
thurstond added a commit that referenced this pull request Jan 23, 2025
…ing (#123474)

`handleIntrinsicByApplyingToShadow` (introduced in
#114490) requires that the
intrinsic supports integer-ish operands; this is not the case for all
intrinsics. This patch generalizes the function to bitcast the shadow
arguments to be the same type as the original intrinsic, thus
guaranteeing that the intrinsic exists. Additionally, it casts the
computed shadow to be an appropriate shadow type.

This function assumes that the intrinsic will handle arbitrary
bit-patterns (for example, if the intrinsic accepts floats for var1, we
assume that it works normally even if inputs are NaNs etc.).
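
As a rough sketch of the generalization described above (an approximation for illustration, not the exact patch; it reuses the MemorySanitizerVisitor helpers shown in the diff): each shadow argument is bitcast to the type of the corresponding original operand before the call, and the computed shadow is cast back afterwards.

```cpp
// Sketch only: assumes the MemorySanitizerVisitor context
// (getShadow, getShadowTy, setShadow, setOriginForNaryOp).
void handleIntrinsicByApplyingToShadow(IntrinsicInst &I,
                                       unsigned int numArgOperands) {
  IRBuilder<> IRB(&I);
  SmallVector<Value *, 8> ShadowArgs;
  for (unsigned int i = 0; i < numArgOperands; i++) {
    Value *Shadow = getShadow(&I, i);
    // e.g. a <4 x i32> shadow is bitcast to <4 x float>, so a float-only
    // intrinsic is still guaranteed to exist for these operand types.
    ShadowArgs.push_back(
        IRB.CreateBitCast(Shadow, I.getArgOperand(i)->getType()));
  }
  CallInst *CI =
      IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(), ShadowArgs);
  // Cast the computed shadow back to an integer shadow type.
  setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
  setOriginForNaryOp(I);
}
```
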
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 23, 2025
thurstond added a commit that referenced this pull request Jan 24, 2025
…#124159)

Horizontal add (hadd) and subtract (hsub) are currently heuristically
handled by `maybeHandleSimpleNomemIntrinsic()` (via
`handleUnknownIntrinsic()`), which computes the shadow by bitwise OR'ing
the two operands. This has false positives for hadd/hsub shadows. For
example, suppose the shadows for the two operands are 00000000 and
11111111 respectively. The expected shadow for the result is 00001111,
but `maybeHandleSimpleNomemIntrinsic` would compute it as 11111111.

This patch handles horizontal add using
`handleIntrinsicByApplyingToShadow` (from
#114490), which has no false
positives for hadd/hsub: if each pair of adjacent shadow values is zero
(fully initialized), the result will be zero (fully initialized). More
generally, it is precise for hadd/hsub if at least one of the two
adjacent shadow values in each pair is zero.

It does have some false negatives for hadd/hsub: if we add/subtract two
adjacent non-zero shadow values, some bits of the result may incorrectly
be zero. We consider this an acceptable tradeoff for performance. To
make shadow propagation precise, we want the equivalent of "horizontal
OR", but this is not available. Reducing horizontal OR to (permutation
plus bitwise OR) is left as an exercise for the reader.
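
The example in the commit message can be reproduced with a scalar model (illustration only, not MSan code; the lane layout [a0+a1, a2+a3, b0+b1, b2+b3] is assumed to match the pairwise horizontal-add intrinsics):

```cpp
#include <array>
#include <cstdint>
#include <cstdio>

using V4 = std::array<uint32_t, 4>;

// Scalar model of pairwise horizontal add: [a0+a1, a2+a3, b0+b1, b2+b3].
static V4 hadd(const V4 &a, const V4 &b) {
  return {a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]};
}

int main() {
  V4 shadowA = {0, 0, 0, 0};                 // fully initialized operand
  V4 shadowB = {~0u, ~0u, ~0u, ~0u};         // fully uninitialized operand
  // Old heuristic: OR the operand shadows -> every result lane looks poisoned.
  V4 orShadow = {shadowA[0] | shadowB[0], shadowA[1] | shadowB[1],
                 shadowA[2] | shadowB[2], shadowA[3] | shadowB[3]};
  // New handling: apply hadd to the shadows -> only the lanes derived from B
  // are non-zero, matching the expected "half clean, half poisoned" split.
  // (The cleared low bit of 0xFFFFFFFE is the false negative the commit
  // message accepts as a performance tradeoff.)
  V4 applied = hadd(shadowA, shadowB);
  for (int i = 0; i < 4; ++i)
    std::printf("lane %d: or=%08x applied=%08x\n", i, orShadow[i], applied[i]);
}
```
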
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 24, 2025
thurstond added a commit to thurstond/llvm-project that referenced this pull request Feb 4, 2025
llvm.bitreverse was incorrectly handled by the heuristic handler,
because it did not reverse the bits of the shadow.

This updates the instrumentation to use the handler from
llvm#114490 and  updates the test from llvm#125592
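
A scalar illustration of the bug (not MSan code): if only the least-significant bit of the input is uninitialized, the uninitialized bit ends up in the most-significant position of the result, so the shadow has to be bit-reversed as well.

```cpp
#include <cstdint>
#include <cstdio>

// Scalar stand-in for llvm.bitreverse on an 8-bit value.
static uint8_t bitreverse8(uint8_t x) {
  uint8_t r = 0;
  for (int i = 0; i < 8; ++i)
    r |= ((x >> i) & 1) << (7 - i);
  return r;
}

int main() {
  uint8_t shadow = 0b00000001;            // only the LSB of the input is poisoned
  uint8_t heuristic = shadow;             // old handling: shadow passed through
  uint8_t applied = bitreverse8(shadow);  // new handling: apply the intrinsic
  // After bitreverse, the poisoned bit lives in the MSB of the result; only
  // the applied shadow (0x80) reports that correctly.
  std::printf("heuristic=%02x applied=%02x\n", (unsigned)heuristic,
              (unsigned)applied);
}
```
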
thurstond added a commit that referenced this pull request Feb 4, 2025
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 4, 2025
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025