AtomicExpand: Allow incrementally legalizing atomicrmw #103371
Conversation
@llvm/pr-subscribers-backend-aarch64

Author: Matt Arsenault (arsenm)

Changes

If a lowering changed control flow, resume the legalization loop at the first newly inserted block. This will allow incrementally legalizing atomicrmw and cmpxchg.

The AArch64 test might be a bugfix. Previously it would lower the vector FP case as a cmpxchg loop, and the resulting cmpxchg is now also lowered, where previously it wasn't. Maybe it shouldn't be reporting cmpxchg for the expand type in the first place though.

Full diff: https://github.com/llvm/llvm-project/pull/103371.diff

2 Files Affected:
diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index d8f33c42a8a14..002ef4c856341 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -351,17 +351,30 @@ bool AtomicExpandImpl::run(Function &F, const TargetMachine *TM) {
bool MadeChange = false;
- SmallVector<Instruction *, 1> AtomicInsts;
-
- // Changing control-flow while iterating through it is a bad idea, so gather a
- // list of all atomic instructions before we start.
- for (Instruction &I : instructions(F))
- if (I.isAtomic() && !isa<FenceInst>(&I))
- AtomicInsts.push_back(&I);
-
- for (auto *I : AtomicInsts) {
- if (processAtomicInstr(I))
- MadeChange = true;
+ for (Function::iterator BBI = F.begin(), BBE = F.end(); BBI != BBE;) {
+ BasicBlock *BB = &*BBI;
+ ++BBI;
+
+ BasicBlock::iterator Next;
+
+ for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E;
+ I = Next) {
+ Instruction &Inst = *I;
+ Next = std::next(I);
+
+ if (processAtomicInstr(&Inst)) {
+ MadeChange = true;
+
+ // Detect control flow change and resume iteration from the original
+ // block to inspect any newly inserted blocks. This allows incremental
+      // legalization of atomicrmw and cmpxchg.
+ if (BB != Next->getParent()) {
+ BBI = BB->getIterator();
+ BBE = F.end();
+ break;
+ }
+ }
+ }
}
return MadeChange;
diff --git a/llvm/test/CodeGen/AArch64/atomicrmw-fadd-fp-vector.ll b/llvm/test/CodeGen/AArch64/atomicrmw-fadd-fp-vector.ll
index a7539ac3cce80..793b4569f203f 100644
--- a/llvm/test/CodeGen/AArch64/atomicrmw-fadd-fp-vector.ll
+++ b/llvm/test/CodeGen/AArch64/atomicrmw-fadd-fp-vector.ll
@@ -8,31 +8,35 @@ define <2 x half> @test_atomicrmw_fadd_v2f16_align4(ptr addrspace(1) %ptr, <2 x
; NOLSE-NEXT: fcvtl v1.4s, v0.4h
; NOLSE-NEXT: ldr s0, [x0]
; NOLSE-NEXT: b .LBB0_2
-; NOLSE-NEXT: .LBB0_1: // %atomicrmw.start
+; NOLSE-NEXT: .LBB0_1: // %cmpxchg.nostore
; NOLSE-NEXT: // in Loop: Header=BB0_2 Depth=1
-; NOLSE-NEXT: fmov s0, w10
-; NOLSE-NEXT: cmp w10, w9
-; NOLSE-NEXT: b.eq .LBB0_5
+; NOLSE-NEXT: mov w9, wzr
+; NOLSE-NEXT: clrex
+; NOLSE-NEXT: fmov s0, w8
+; NOLSE-NEXT: cbnz w9, .LBB0_6
; NOLSE-NEXT: .LBB0_2: // %atomicrmw.start
; NOLSE-NEXT: // =>This Loop Header: Depth=1
; NOLSE-NEXT: // Child Loop BB0_3 Depth 2
; NOLSE-NEXT: fcvtl v2.4s, v0.4h
-; NOLSE-NEXT: fmov w9, s0
+; NOLSE-NEXT: fmov w10, s0
; NOLSE-NEXT: fadd v2.4s, v2.4s, v1.4s
; NOLSE-NEXT: fcvtn v2.4h, v2.4s
-; NOLSE-NEXT: fmov w8, s2
-; NOLSE-NEXT: .LBB0_3: // %atomicrmw.start
+; NOLSE-NEXT: fmov w9, s2
+; NOLSE-NEXT: .LBB0_3: // %cmpxchg.start
; NOLSE-NEXT: // Parent Loop BB0_2 Depth=1
; NOLSE-NEXT: // => This Inner Loop Header: Depth=2
-; NOLSE-NEXT: ldaxr w10, [x0]
-; NOLSE-NEXT: cmp w10, w9
+; NOLSE-NEXT: ldaxr w8, [x0]
+; NOLSE-NEXT: cmp w8, w10
; NOLSE-NEXT: b.ne .LBB0_1
-; NOLSE-NEXT: // %bb.4: // %atomicrmw.start
+; NOLSE-NEXT: // %bb.4: // %cmpxchg.trystore
; NOLSE-NEXT: // in Loop: Header=BB0_3 Depth=2
-; NOLSE-NEXT: stlxr wzr, w8, [x0]
-; NOLSE-NEXT: cbnz wzr, .LBB0_3
-; NOLSE-NEXT: b .LBB0_1
-; NOLSE-NEXT: .LBB0_5: // %atomicrmw.end
+; NOLSE-NEXT: stlxr w11, w9, [x0]
+; NOLSE-NEXT: cbnz w11, .LBB0_3
+; NOLSE-NEXT: // %bb.5: // in Loop: Header=BB0_2 Depth=1
+; NOLSE-NEXT: mov w9, #1 // =0x1
+; NOLSE-NEXT: fmov s0, w8
+; NOLSE-NEXT: cbz w9, .LBB0_2
+; NOLSE-NEXT: .LBB0_6: // %atomicrmw.end
; NOLSE-NEXT: // kill: def $d0 killed $d0 killed $q0
; NOLSE-NEXT: ret
;
@@ -64,29 +68,33 @@ define <2 x float> @test_atomicrmw_fadd_v2f32_align8(ptr addrspace(1) %ptr, <2 x
; NOLSE: // %bb.0:
; NOLSE-NEXT: ldr d1, [x0]
; NOLSE-NEXT: b .LBB1_2
-; NOLSE-NEXT: .LBB1_1: // %atomicrmw.start
+; NOLSE-NEXT: .LBB1_1: // %cmpxchg.nostore
; NOLSE-NEXT: // in Loop: Header=BB1_2 Depth=1
-; NOLSE-NEXT: fmov d1, x10
-; NOLSE-NEXT: cmp x10, x9
-; NOLSE-NEXT: b.eq .LBB1_5
+; NOLSE-NEXT: mov w9, wzr
+; NOLSE-NEXT: clrex
+; NOLSE-NEXT: fmov d1, x8
+; NOLSE-NEXT: cbnz w9, .LBB1_6
; NOLSE-NEXT: .LBB1_2: // %atomicrmw.start
; NOLSE-NEXT: // =>This Loop Header: Depth=1
; NOLSE-NEXT: // Child Loop BB1_3 Depth 2
; NOLSE-NEXT: fadd v2.2s, v1.2s, v0.2s
-; NOLSE-NEXT: fmov x9, d1
-; NOLSE-NEXT: fmov x8, d2
-; NOLSE-NEXT: .LBB1_3: // %atomicrmw.start
+; NOLSE-NEXT: fmov x10, d1
+; NOLSE-NEXT: fmov x9, d2
+; NOLSE-NEXT: .LBB1_3: // %cmpxchg.start
; NOLSE-NEXT: // Parent Loop BB1_2 Depth=1
; NOLSE-NEXT: // => This Inner Loop Header: Depth=2
-; NOLSE-NEXT: ldaxr x10, [x0]
-; NOLSE-NEXT: cmp x10, x9
+; NOLSE-NEXT: ldaxr x8, [x0]
+; NOLSE-NEXT: cmp x8, x10
; NOLSE-NEXT: b.ne .LBB1_1
-; NOLSE-NEXT: // %bb.4: // %atomicrmw.start
+; NOLSE-NEXT: // %bb.4: // %cmpxchg.trystore
; NOLSE-NEXT: // in Loop: Header=BB1_3 Depth=2
-; NOLSE-NEXT: stlxr wzr, x8, [x0]
-; NOLSE-NEXT: cbnz wzr, .LBB1_3
-; NOLSE-NEXT: b .LBB1_1
-; NOLSE-NEXT: .LBB1_5: // %atomicrmw.end
+; NOLSE-NEXT: stlxr w11, x9, [x0]
+; NOLSE-NEXT: cbnz w11, .LBB1_3
+; NOLSE-NEXT: // %bb.5: // in Loop: Header=BB1_2 Depth=1
+; NOLSE-NEXT: mov w9, #1 // =0x1
+; NOLSE-NEXT: fmov d1, x8
+; NOLSE-NEXT: cbz w9, .LBB1_2
+; NOLSE-NEXT: .LBB1_6: // %atomicrmw.end
; NOLSE-NEXT: fmov d0, d1
; NOLSE-NEXT: ret
;
The AArch64 backend has a late fallback for cmpxchg because at -O0, expanding ll/sc loops at the IR level doesn't work: the register allocator inserts memory ops in unfortunate places. But when optimization is on, expanding ll/sc loops usually generates slightly better code. It looks like this was somehow using the -O0 codepath when it wasn't supposed to. The generated code with the patch is pretty messy, though; we somehow end up with conditional branches on constants. Maybe that's taildup? If it is, can we optimize the IR to avoid that?
It specifically requests expanding to cmpxchg for any FP atomic. The dubious assume-O0-means-fast-RA case is handled below.
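For reference, a simplified sketch of the shape of that hook, using a hypothetical MyTargetLowering (the real AArch64 logic lives in AArch64ISelLowering.cpp and handles more cases than shown here):

// Simplified sketch, not the actual AArch64 source: request the generic
// cmpxchg-loop expansion for every floating-point atomicrmw, and also at
// -O0, where fast regalloc could split an LL/SC loop with spills.
TargetLowering::AtomicExpansionKind
MyTargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const {
  if (AI->isFloatingPointOperation())
    return AtomicExpansionKind::CmpXChg;

  if (getTargetMachine().getOptLevel() == CodeGenOptLevel::None)
    return AtomicExpansionKind::CmpXChg;

  return AtomicExpansionKind::LLSC;
}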
To be clear, I think this patch is doing the right thing. As the code notes, we can't stick an fadd inside an ll/sc loop, so we need to expand in two stages... and doing the second stage early is consistent with the rest of the code.
If you want the fadd in the ll/sc loop, that's in #103702.
c56603f to 04f6dbb
LGTM
; SOFTFP-NOLSE-NEXT: b.eq .LBB1_5
; SOFTFP-NOLSE-NEXT: mov w8, wzr
; SOFTFP-NOLSE-NEXT: clrex
; SOFTFP-NOLSE-NEXT: cbnz w8, .LBB1_6
"mov w8, wzr; cbnz w8, .LBB1_6" should just be an unconditional branch... might be a bit tricky to fix due to pass ordering? Should AtomicExpand or some other IR pass after it do some kind of CFG simplification?
I'm willing to just accept this as-is, and deal with it later if anyone cares, though.
AArch64 has "-aarch64-enable-atomic-cfg-tidy", which adds an extra simplifyCFG run to clean up after this. Not sure why this isn't universal, or why the expand pass doesn't just directly call simplifyCFG on modified blocks. I vaguely recall @rovka looking into this?
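As a rough sketch of the "clean up the modified blocks directly" idea, assuming the pass tracked which blocks it touched (tidyExpandedBlocks is a hypothetical helper, not existing code, and a branch on a phi of constants would still need the fuller simplifyCFG logic mentioned above):

#include "llvm/ADT/ArrayRef.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/Transforms/Utils/Local.h"

using namespace llvm;

// Hypothetical helper: fold `br i1 <constant>` terminators in the blocks the
// expansion touched, turning them into unconditional branches and dropping
// the dead edge.
static bool tidyExpandedBlocks(ArrayRef<BasicBlock *> ModifiedBlocks) {
  bool Changed = false;
  for (BasicBlock *BB : ModifiedBlocks)
    Changed |= ConstantFoldTerminator(BB, /*DeleteDeadConditions=*/true);
  return Changed;
}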
…)" This reverts commit 206b5af.
Next = std::next(I);

if (processAtomicInstr(&Inst)) {
  MadeChange = true;
Heads up, I reverted the patch again. processAtomicInstr may erase Next.
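A minimal sketch of one way to avoid the dangling iterator (this is not the fix that actually landed): never keep an iterator alive across processAtomicInstr(), and rescan the current block from its start after every change. It assumes the outer function-level loop still re-visits BB when an expansion splits it.

// Hypothetical alternative iteration, sketched against the loop in the patch:
// after any expansion, restart the scan of BB so no iterator survives a call
// that may erase instructions. Simpler, but potentially quadratic per block.
bool BlockChanged;
do {
  BlockChanged = false;
  for (Instruction &Inst : *BB) {
    if (!Inst.isAtomic() || isa<FenceInst>(&Inst))
      continue;
    if (processAtomicInstr(&Inst)) {
      MadeChange = BlockChanged = true;
      break; // Any iterators into BB may now be stale.
    }
  }
} while (BlockChanged);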
This seems to break check-clang everywhere, e.g. http://45.33.8.238/linux/147013/step_6.txt. Please take a look and revert for now if it takes a while to fix.
This should be trivial to fix; the messages are just emitted in a different order now.
Fixed in a291fe5.
Thank you!
If a lowering changed control flow, resume the legalization
loop at the first newly inserted block.
This will allow incrementally legalizing atomicrmw and cmpxchg.
The AArch64 test might be a bugfix. Previously it would lower
the vector FP case as a cmpxchg loop; now the resulting cmpxchg
is also lowered, where previously it wasn't. Maybe it shouldn't
be reporting cmpxchg for the expand type in the first place,
though.