BPF: Generate locked insn for __sync_fetch_and_add() with cpu v1/v2 #106494

yonghong-song · 2024-08-29T05:37:20Z

There are two commits in this pull request:

first commit is to revert BPF: Ensure __sync_fetch_and_add() always generate atomic_fetch_add insn #101428
second commit removed atomic_fetch_and_*() to atomic_<op>() conversion if return value is not used, but preserving __sync_fetch_and_add() to locked insn with cpu v1/v2.

yonghong-song · 2024-08-29T05:38:11Z

cc @brycekahle

github-actions · 2024-08-29T05:40:45Z

✅ With the latest revision this PR passed the C/C++ code formatter.

4ast · 2024-08-29T16:33:15Z

llvm/test/CodeGen/BPF/atomics.ll

@@ -1,19 +1,22 @@
-; RUN: llc < %s -march=bpfel -verify-machineinstrs -show-mc-encoding | FileCheck --check-prefixes=CHECK,CHECK-V2 %s
-; RUN: llc < %s -march=bpfel -verify-machineinstrs -show-mc-encoding -mcpu=v3 | FileCheck --check-prefixes=CHECK,CHECK-V3 %s
+; RUN: llc < %s -march=bpfel -verify-machineinstrs -show-mc-encoding | FileCheck %s


should we add -mcpu=v1 here and maybe flip a default to -mcpu=v3 ?
So that by default memory ordering is preserved and users that want to see xadd insn for (void)fetch_and_add
have to explicitly request it with -mcpu=v1.

We can do this. This may cause verification failure or other things from v1 to v3 though. I guess users can either debug it to make it work with v3 or use -mcpu=v1 to restore its old behavior. We can have a separate patch just to flip the default from v1 to v3. I can then post it to bpf mailing list to let people know.

Regarding to selftests, current in llvm/test/CodeGen/BPF directory (not including sub-directory), we only '-mcpu=v1' for llvm-objdump since llvm-objdump has been improved to dump insn based on cpu v4.
Otherwise, -mcpu=v1 is the default and not specified in the test.

[~/work/llvm-project/llvm/test/CodeGen/BPF (main)]$ grep "mcpu=v1" *.ll objdump_cond_op.ll:; RUN: llc -mtriple=bpfel -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex --mcpu=v1 -d - | FileCheck %s objdump_static_var.ll:; RUN: llc -mtriple=bpfel -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex --mcpu=v1 -d - | FileCheck --check-prefix=CHECK %s objdump_static_var.ll:; RUN: llc -mtriple=bpfeb -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex --mcpu=v1 -d - | FileCheck --check-prefix=CHECK %s [~/work/llvm-project/llvm/test/CodeGen/BPF (main)]$

Later when we want to make cpu v3 as the default, all related tests can be modified at that time.

llvm/lib/Target/BPF/BPFInstrInfo.td

llvm/lib/Target/BPF/BPFMIChecking.cpp

…ch_add insn (llvm#101428)" This reverts commit c566769. See discussion in [1]. Currently, with -mcpu=v1/v2, atomic_fetch_add() insn is generated for (void)__sync_fetch_and_add(...). This breaks backward compatibility since there are codes who runs on old system (< 5.12) which does not support atomci_fetch_add(). Now let revert previous comment ([1]) and add additional logic in the next patch to ensure for (void)__sync_fetch_and_add(...), v1/v2: generate locked add insn >= v3: generate atomic_fetch_add insn [1] llvm#101428

In previous commit, atomic_fetch_and_*() operations are converted to atomic_<op>()'s if there are no return values. This is not what we want, we would like to preserve atomic_fetch_and_*() insn so bpf jit can add proper barrier insns. Preserving atomic_fetch_and_*() are okay for most __sync_fetch_and_*() functions, but not for __sync_fetch_and_add() since __sync_fetch_and_add() has been used to generic locked insns in cpu=v1/v2. So after preserving atomic_fetch_and_*() even if return value is not used, XFADDD in BPFInstrInfo.td is adjusted to emit locked add insn for cpu v1/v2 and to emit atomic_fetch_and_add() for cpu >= v3.

The patch is from Eduard. class XADD and class ATOMIC_NOFETCH are almost the same except the pattern matching string. Removing class XADD and defining two Pat's (pattern -> insn) for 32bit and 64bit XADD/XADDD to use ATOMIC_NOFETACH can simplify code a lot.

llvm-ci · 2024-08-30T22:52:21Z

LLVM Buildbot has detected a new failure on builder clang-aarch64-sve-vls running on linaro-g3-04 while building llvm at step 13 "test-suite".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/143/builds/1815

Here is the relevant piece of the build log for the reference

Step 13 (test-suite) failure: test (failure)
...
cd /home/tcwg-buildbot/worker/clang-aarch64-sve-vls/test/sandbox/build/SingleSource/UnitTests && /usr/local/bin/cmake -E create_symlink /home/tcwg-buildbot/worker/clang-aarch64-sve-vls/test/test-suite/SingleSource/UnitTests/matrix-types-spec.reference_output /home/tcwg-buildbot/worker/clang-aarch64-sve-vls/test/sandbox/build/SingleSource/UnitTests/matrix-types-spec.reference_output
gmake[2]: Leaving directory '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls/test/sandbox/build'
[ 56%] Built target matrix-types-spec
gmake[1]: Leaving directory '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls/test/sandbox/build'
gmake: *** [Makefile:136: all] Error 2

2024-08-30 22:49:08 INFO: Testing...
2024-08-30 22:49:08 INFO: Execute: /home/tcwg-buildbot/worker/clang-aarch64-sve-vls/stage1/bin/llvm-lit -v -j 32 /home/tcwg-buildbot/worker/clang-aarch64-sve-vls/test/sandbox/build -o /home/tcwg-buildbot/worker/clang-aarch64-sve-vls/test/sandbox/build/output_igh9cyd.json
-- Testing: 10005 tests, 32 workers --
NOEXE: test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_1026.test (1 of 10005)
******************** TEST 'test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_1026.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls/test/sandbox/build/Bitcode/simd_ops/simd_ops_test_op_abs_1026' is missing
********************
NOEXE: test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_1027.test (2 of 10005)
******************** TEST 'test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_1027.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls/test/sandbox/build/Bitcode/simd_ops/simd_ops_test_op_abs_1027' is missing
********************
NOEXE: test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_1028.test (3 of 10005)
******************** TEST 'test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_1028.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls/test/sandbox/build/Bitcode/simd_ops/simd_ops_test_op_abs_1028' is missing
********************
NOEXE: test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_366.test (4 of 10005)
******************** TEST 'test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_366.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls/test/sandbox/build/Bitcode/simd_ops/simd_ops_test_op_abs_366' is missing
********************
NOEXE: test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_367.test (5 of 10005)
******************** TEST 'test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_367.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls/test/sandbox/build/Bitcode/simd_ops/simd_ops_test_op_abs_367' is missing
********************
NOEXE: test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_368.test (6 of 10005)
******************** TEST 'test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_368.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls/test/sandbox/build/Bitcode/simd_ops/simd_ops_test_op_abs_368' is missing
********************
NOEXE: test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_38.test (7 of 10005)
******************** TEST 'test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_38.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls/test/sandbox/build/Bitcode/simd_ops/simd_ops_test_op_abs_38' is missing
********************
NOEXE: test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_39.test (8 of 10005)
******************** TEST 'test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_39.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls/test/sandbox/build/Bitcode/simd_ops/simd_ops_test_op_abs_39' is missing
********************
NOEXE: test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_40.test (9 of 10005)
******************** TEST 'test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_40.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls/test/sandbox/build/Bitcode/simd_ops/simd_ops_test_op_abs_40' is missing
********************
NOEXE: test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_697.test (10 of 10005)
******************** TEST 'test-suite :: Bitcode/simd_ops/simd_ops_test_op_abs_697.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls/test/sandbox/build/Bitcode/simd_ops/simd_ops_test_op_abs_697' is missing
********************

Before llvm20, (void)__sync_fetch_and_add(...) always generates locked xadd insns. In linux kernel upstream discussion [1], it is found that for arm64 architecture, the original semantics of (void)__sync_fetch_and_add(...), i.e., __atomic_fetch_add(...), is preferred in order for jit to emit proper native barrier insns. In llvm commits [2] and [3], (void)__sync_fetch_and_add(...) will generate the following insns: - for cpu v1/v2: locked xadd insns to keep backward compatibility - for cpu v3/v4: __atomic_fetch_add() insns To ensure proper barrier semantics for (void)__sync_fetch_and_add(...), cpu v3/v4 is recommended. This patch enables cpu=v3 as the default cpu version. For users wanting to use cpu v1, -mcpu=v1 needs to be explicitly added to clang/llc command line. [1] https://lore.kernel.org/bpf/ZqqiQQWRnz7H93Hc@google.com/T/#mb68d67bc8f39e35a0c3db52468b9de59b79f021f [2] llvm#101428 [3] llvm#106494

Before llvm20, (void)__sync_fetch_and_add(...) always generates locked xadd insns. In linux kernel upstream discussion [1], it is found that for arm64 architecture, the original semantics of (void)__sync_fetch_and_add(...), i.e., __atomic_fetch_add(...), is preferred in order for jit to emit proper native barrier insns. In llvm commits [2] and [3], (void)__sync_fetch_and_add(...) will generate the following insns: - for cpu v1/v2: locked xadd insns to keep backward compatibility - for cpu v3/v4: __atomic_fetch_add() insns To ensure proper barrier semantics for (void)__sync_fetch_and_add(...), cpu v3/v4 is recommended. This patch enables cpu=v3 as the default cpu version. For users wanting to use cpu v1, -mcpu=v1 needs to be explicitly added to clang/llc command line. [1] https://lore.kernel.org/bpf/ZqqiQQWRnz7H93Hc@google.com/T/#mb68d67bc8f39e35a0c3db52468b9de59b79f021f [2] #101428 [3] #106494

llvm change [1] made a change such that __sync_fetch_and_{and,or,xor}() will generate atomic_fetch_*() insns even if the return value is not used. This is a deliberate choice to make sure barrier semantics are preserved from source code to asm insn. But the change in [1] caused arena_atomics selftest failure. test_arena_atomics:PASS:arena atomics skeleton open 0 nsec libbpf: prog 'and': BPF program load failed: Permission denied libbpf: prog 'and': -- BEGIN PROG LOAD LOG -- arg#0 reference type('UNKNOWN ') size cannot be determined: -22 0: R1=ctx() R10=fp0 ; if (pid != (bpf_get_current_pid_tgid() >> 32)) @ arena_atomics.c:87 0: (18) r1 = 0xffffc90000064000 ; R1_w=map_value(map=arena_at.bss,ks=4,vs=4) 2: (61) r6 = *(u32 *)(r1 +0) ; R1_w=map_value(map=arena_at.bss,ks=4,vs=4) R6_w=scalar(smin=0,smax=umax=0xffffffff,v ar_off=(0x0; 0xffffffff)) 3: (85) call bpf_get_current_pid_tgid#14 ; R0_w=scalar() 4: (77) r0 >>= 32 ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) 5: (5d) if r0 != r6 goto pc+11 ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R6_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0x) ; __sync_fetch_and_and(&and64_value, 0x011ull << 32); @ arena_atomics.c:91 6: (18) r1 = 0x100000000060 ; R1_w=scalar() 8: (bf) r1 = addr_space_cast(r1, 0, 1) ; R1_w=arena 9: (18) r2 = 0x1100000000 ; R2_w=0x1100000000 11: (db) r2 = atomic64_fetch_and((u64 *)(r1 +0), r2) BPF_ATOMIC stores into R1 arena is not allowed processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 -- END PROG LOAD LOG -- libbpf: prog 'and': failed to load: -13 libbpf: failed to load object 'arena_atomics' libbpf: failed to load BPF skeleton 'arena_atomics': -13 test_arena_atomics:FAIL:arena atomics skeleton load unexpected error: -13 (errno 13) #3 arena_atomics:FAIL The reason of the failure is due to [2] where atomic{64,}_fetch_{and,or,xor}() are not allowed by arena addresses. Version 2 of the patch fixed the issue by using inline asm ([3]). But further discussion suggested to find a way from source to generate locked insn which is more user friendly. So in not-merged llvm patch ([4]), if relax memory ordering is used and the return value is not used, locked insn could be generated. So with llvm patch [4] to compile the bpf selftest, the following code __c11_atomic_fetch_and(&and64_value, 0x011ull << 32, memory_order_relaxed); is able to generate locked insn, hence fixing the selftest failure. [1] llvm/llvm-project#106494 [2] d503a04 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT") [3] https://lore.kernel.org/bpf/20240803025928.4184433-1-yonghong.song@linux.dev/ [4] llvm/llvm-project#107343 Signed-off-by: Yonghong Song <yonghong.song@linux.dev>

llvm change [1] made a change such that __sync_fetch_and_{and,or,xor}() will generate atomic_fetch_*() insns even if the return value is not used. This is a deliberate choice to make sure barrier semantics are preserved from source code to asm insn. But the change in [1] caused arena_atomics selftest failure. test_arena_atomics:PASS:arena atomics skeleton open 0 nsec libbpf: prog 'and': BPF program load failed: Permission denied libbpf: prog 'and': -- BEGIN PROG LOAD LOG -- arg#0 reference type('UNKNOWN ') size cannot be determined: -22 0: R1=ctx() R10=fp0 ; if (pid != (bpf_get_current_pid_tgid() >> 32)) @ arena_atomics.c:87 0: (18) r1 = 0xffffc90000064000 ; R1_w=map_value(map=arena_at.bss,ks=4,vs=4) 2: (61) r6 = *(u32 *)(r1 +0) ; R1_w=map_value(map=arena_at.bss,ks=4,vs=4) R6_w=scalar(smin=0,smax=umax=0xffffffff,v ar_off=(0x0; 0xffffffff)) 3: (85) call bpf_get_current_pid_tgid#14 ; R0_w=scalar() 4: (77) r0 >>= 32 ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) 5: (5d) if r0 != r6 goto pc+11 ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R6_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0x) ; __sync_fetch_and_and(&and64_value, 0x011ull << 32); @ arena_atomics.c:91 6: (18) r1 = 0x100000000060 ; R1_w=scalar() 8: (bf) r1 = addr_space_cast(r1, 0, 1) ; R1_w=arena 9: (18) r2 = 0x1100000000 ; R2_w=0x1100000000 11: (db) r2 = atomic64_fetch_and((u64 *)(r1 +0), r2) BPF_ATOMIC stores into R1 arena is not allowed processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 -- END PROG LOAD LOG -- libbpf: prog 'and': failed to load: -13 libbpf: failed to load object 'arena_atomics' libbpf: failed to load BPF skeleton 'arena_atomics': -13 test_arena_atomics:FAIL:arena atomics skeleton load unexpected error: -13 (errno 13) #3 arena_atomics:FAIL The reason of the failure is due to [2] where atomic{64,}_fetch_{and,or,xor}() are not allowed by arena addresses. Version 2 of the patch fixed the issue by using inline asm ([3]). But further discussion suggested to find a way from source to generate locked insn which is more user friendly. So in not-merged llvm patch ([4]), if relax memory ordering is used and the return value is not used, locked insn could be generated. So with llvm patch [4] to compile the bpf selftest, the following code __c11_atomic_fetch_and(&and64_value, 0x011ull << 32, memory_order_relaxed); is able to generate locked insn, hence fixing the selftest failure. [1] llvm/llvm-project#106494 [2] d503a04 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT") [3] https://lore.kernel.org/bpf/20240803025928.4184433-1-yonghong.song@linux.dev/ [4] llvm/llvm-project#107343 Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240909223431.1666305-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>

th0rex · 2024-09-12T13:15:43Z

Hi, this commit broke some of our code, I am not sure whether this is intentional or not. It seems to me that anything using the return value of e.g. __sync_fetch_and_add doesn't work anymore with -mcpu=v2 (nor with -mcpu=v1).

To reproduce:

On the parent of this commit (57fe53c) the following test is working:

~/llvm-project ((57fe53ca))> git status
HEAD detached at 57fe53cae403
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   llvm/test/CodeGen/BPF/atomics.ll

no changes added to commit (use "git add" and/or "git commit -a")

~/llvm-project ((57fe53ca))> git diff
diff --git a/llvm/test/CodeGen/BPF/atomics.ll b/llvm/test/CodeGen/BPF/atomics.ll
index 0c16c49f2a87..5e71dcef85a2 100644
--- a/llvm/test/CodeGen/BPF/atomics.ll
+++ b/llvm/test/CodeGen/BPF/atomics.ll
@@ -19,3 +19,12 @@ entry:
   atomicrmw add ptr %p, i64 %v seq_cst
   ret void
 }
+
+; CHECK-LABEL: test_load_add_ret_64
+; CHECK: r2 = atomic_fetch_add((u64 *)(r1 + 0), r2)
+; CHECK: encoding: [0xdb,0x21,0x00,0x00,0x01,0x00,0x00,0x00]
+define i64 @test_load_add_ret_64(ptr %p, i64 zeroext %v) {
+entry:
+  %0 = atomicrmw add ptr %p, i64 %v seq_cst
+  ret i64 %0
+}

~/llvm-project ((57fe53ca))> ./build/bin/llc -march=bpfel -mcpu=v2 llvm/test/CodeGen/BPF/atomics.ll
~/llvm-project ((57fe53ca))> # runs successfully

With this commit (06c531e), the same test fails:

~/llvm-project ((06c531e8))> git status
HEAD detached at 06c531e808ce
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   llvm/test/CodeGen/BPF/atomics.ll

no changes added to commit (use "git add" and/or "git commit -a")

~/llvm-project ((06c531e8))> git diff
diff --git a/llvm/test/CodeGen/BPF/atomics.ll b/llvm/test/CodeGen/BPF/atomics.ll
index c17b94af5f7b..fbccfe54135e 100644
--- a/llvm/test/CodeGen/BPF/atomics.ll
+++ b/llvm/test/CodeGen/BPF/atomics.ll
@@ -22,3 +22,12 @@ entry:
   atomicrmw add ptr %p, i64 %v seq_cst
   ret void
 }
+
+; CHECK-LABEL: test_load_add_ret_64
+; CHECK: r2 = atomic_fetch_add((u64 *)(r1 + 0), r2)
+; CHECK: encoding: [0xdb,0x21,0x00,0x00,0x01,0x00,0x00,0x00]
+define i64 @test_load_add_ret_64(ptr %p, i64 zeroext %v) {
+entry:
+  %0 = atomicrmw add ptr %p, i64 %v seq_cst
+  ret i64 %0
+}

~/llvm-project ((06c531e8))> ./build/bin/llc -march=bpfel -mcpu=v2 llvm/test/CodeGen/BPF/atomics.ll
error: <unknown>:0:0: in function test_load_add_ret_64 i64 (ptr, i64): Invalid usage of the XADD return value

Like I said, I'm not sure if this is intentional or not - we just switched to inline assembly and call atomic_fetch_add directly rather than relying on __sync_fetch_and_add which still works as expected, but I wanted to bring this to your attention in case this is not intentional.

yonghong-song · 2024-09-12T18:29:19Z

@th0rex thanks for the reporting. The newer behavior in llvm20 is correct. The old one has some issues and some instructions supposed not support in v1/v2 but somehow we did it and their usage is not intentional. But unfortunately there is no way to do backport. See below for some details.

$ cat t.c                                                                                                                         
int f1(int *i) {                                                                                                                                             
   return __sync_fetch_and_add(i, 10);                                                                                                                       
}
$ ~/work/llvm-project/llvm/build.15/install/bin/clang --version
clang version 15.0.7 (https://github.com/llvm/llvm-project.git 8dfdcc7b7bf66834a761bd8de445840ef68e4d1a)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/yhs/work/llvm-project/llvm/build.15/install/bin
$ ~/work/llvm-project/llvm/build.15/install/bin/clang --target=bpf -O2 -mcpu=v1 -c t.c                                 
fatal error: error in backend: Invalid usage of the XADD return value

You can see a compiler error will be emitted for cpu v1 if the return value is used. The same will be for v2. For v3, it will be okay since v3 enables 32-bit subregister and we allows new insn atomic_fetch_add. for cpu v1/v2, the atomic_fetch_add is not available.

But for 64bit __sync_fetch_and_add(), it seems going through.

$ cat t1.c
long f1(long *i) {
   return __sync_fetch_and_add(i, 10);
}
$ ~/work/llvm-project/llvm/build.15/install/bin/clang --target=bpf -O2 -mcpu=v1 -c t1.c
$ ~/work/llvm-project/llvm/build.15/install/bin/llvm-objdump -d t1.o

t1.o:   file format elf64-bpf

Disassembly of section .text:

0000000000000000 <f1>:
       0:       b7 00 00 00 0a 00 00 00 r0 = 10
       1:       db 01 00 00 01 00 00 00 r0 = atomic_fetch_add((u64 *)(r1 + 0), r0)
       2:       95 00 00 00 00 00 00 00 exit

So there is a discrepancy between 32bit and 64bit. But intention is for <= llvm19, for v1/v2, no return value should be used (using locked insn) and compiler should issue an error when it does use return value. For v3, it is okay to use return value (using atomic_fetch_add insn).

llvm20 fixed this issue.

$ cat t.c
int f1(int *i) {
   return __sync_fetch_and_add(i, 10);
}
[yhs@devbig309.ftw3 ~/tmp4]$ cat t1.c
long f1(long *i) {
  return __sync_fetch_and_add(i, 10);
}
$ ~/work/llvm-project/llvm/build.20/install/bin/clang --target=bpf -O2 -mcpu=v1 -c t.c
t.c:1:5: error: Invalid usage of the XADD return value
    1 | int f1(int *i) {
      |     ^
1 error generated.
$ ~/work/llvm-project/llvm/build.20/install/bin/clang --target=bpf -O2 -mcpu=v1 -c t1.c
t1.c:1:6: error: Invalid usage of the XADD return value
    1 | long f1(long *i) {
      |      ^
1 error generated.
$ ~/work/llvm-project/llvm/build.20/install/bin/clang --target=bpf -O2 -mcpu=v3 -c t.c
$ ~/work/llvm-project/llvm/build.20/install/bin/clang --target=bpf -O2 -mcpu=v3 -c t1.c
$

So for 64bit insn, using inline asm with atomc_fetch_add() is the right choice if you intends to use cpu v1/v2.

th0rex · 2024-09-13T07:56:13Z

Thanks for the detailed explanation! This makes sense, we'll keep using inline assembly with atomic_fetch_add then.

yonghong-song requested review from eddyz87 and 4ast August 29, 2024 05:37

yonghong-song force-pushed the fix-xadd-2 branch 2 times, most recently from 1aa05fd to 7e0c7e1 Compare August 29, 2024 06:16

4ast reviewed Aug 29, 2024

View reviewed changes

eddyz87 reviewed Aug 29, 2024

View reviewed changes

llvm/lib/Target/BPF/BPFInstrInfo.td Outdated Show resolved Hide resolved

llvm/lib/Target/BPF/BPFMIChecking.cpp Outdated Show resolved Hide resolved

Yonghong Song added 3 commits August 30, 2024 10:24

yonghong-song force-pushed the fix-xadd-2 branch from 7e0c7e1 to 1d37bc7 Compare August 30, 2024 18:17

eddyz87 approved these changes Aug 30, 2024

View reviewed changes

yonghong-song merged commit 06c531e into llvm:main Aug 30, 2024
8 checks passed

yonghong-song mentioned this pull request Sep 2, 2024

[BPF] Make -mcpu=v3 as the default #107008

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BPF: Generate locked insn for __sync_fetch_and_add() with cpu v1/v2 #106494

BPF: Generate locked insn for __sync_fetch_and_add() with cpu v1/v2 #106494

yonghong-song commented Aug 29, 2024 •

edited

Loading

yonghong-song commented Aug 29, 2024

github-actions bot commented Aug 29, 2024 •

edited

Loading

4ast Aug 29, 2024

yonghong-song Aug 29, 2024 •

edited

Loading

llvm-ci commented Aug 30, 2024

th0rex commented Sep 12, 2024

yonghong-song commented Sep 12, 2024

th0rex commented Sep 13, 2024

BPF: Generate locked insn for __sync_fetch_and_add() with cpu v1/v2 #106494

BPF: Generate locked insn for __sync_fetch_and_add() with cpu v1/v2 #106494

Conversation

yonghong-song commented Aug 29, 2024 • edited Loading

yonghong-song commented Aug 29, 2024

github-actions bot commented Aug 29, 2024 • edited Loading

4ast Aug 29, 2024

Choose a reason for hiding this comment

yonghong-song Aug 29, 2024 • edited Loading

Choose a reason for hiding this comment

llvm-ci commented Aug 30, 2024

th0rex commented Sep 12, 2024

yonghong-song commented Sep 12, 2024

th0rex commented Sep 13, 2024

yonghong-song commented Aug 29, 2024 •

edited

Loading

github-actions bot commented Aug 29, 2024 •

edited

Loading

yonghong-song Aug 29, 2024 •

edited

Loading