Import cgt.un(op, 0) as NE(op, 0) #54539
Conversation
Force-pushed from 23eba6f to 10057cb
Use "genActualType" directly as it is now a templated method. Don't create casts to TYP_ULONG - they are identical to casts to TYP_LONG. TYP_ULONG is only relevant for checked casts. Add a TODO on addressing the duplicated logic that upcasts to native int from int32 on 64 bit. Use modern comments. Zero diffs.
Normalizing this idiom helps downstream optimizations.
In morph, when narrowing the AND operand, only insert casts if necessary - prefer to use "optNarrowTree". Otherwise we end up with redundant register shuffling.
10057cb
to
3b61c12
Compare
Case 1: The whole method consists of repeated sequences:

------------ BB13 [06A..073) -> BB15 (cond), preds={BB11,BB12} succs={BB14,BB15}
***** BB13
STMT00062 (IL ???... ???)
N007 ( 9, 8) [000207] ---XG------- * JTRUE void
N006 ( 7, 6) [000377] N--XG--N---- \--* EQ int <l:$416, c:$415>
N004 ( 5, 4) [000374] ---XG------- +--* AND long <l:$469, c:$468>
N002 ( 3, 2) CSE #06 (def)[000367] *--XG------- | +--* IND long <l:$465, c:$464>
N001 ( 1, 1) [000201] ------------ | | \--* LCL_VAR byref V00 this u:1 Zero Fseq[hackishFieldName] $80
N003 ( 1, 1) [000373] ------------ | \--* CNS_INT long 64 $2cb
N005 ( 1, 1) [000376] ------------ \--* CNS_INT long 0 $2c1
------------ BB14 [073..07B), preds={BB13} succs={BB15}
***** BB14
STMT00081 (IL 0x073... ???)
N007 ( 9, 10) [000389] -A-XG---R--- * ASG long $VN.Void
N006 ( 3, 2) [000388] *--XG--N---- +--* IND long <l:$46c, c:$46b>
N005 ( 1, 1) [000229] ------------ | \--* LCL_VAR byref V00 this u:1 Zero Fseq[hackishFieldName] $80
N004 ( 5, 7) [000387] ---XG------- \--* OR long <l:$46e, c:$46d>
N002 ( 3, 2) CSE #06 (use)[000380] *--XG------- +--* IND long <l:$465, c:$46a>
N001 ( 1, 1) [000379] ------------ | \--* LCL_VAR byref V00 this u:1 Zero Fseq[hackishFieldName] $80
N003 ( 1, 4) [000386] ------------ \--* CNS_INT long 0x40000 $2e2

As one can see, there is a CSE opportunity here, and it is taken. Eventually this leads to the following codegen:

G_M53887_IG08: ; gcrefRegs=00000000 {}, byrefRegs=00000002 {rcx}, byref, isz
mov rax, qword ptr [rcx]
mov edx, eax
test dl, 32
je SHORT G_M53887_IG09
or rax, 0x20000
mov qword ptr [rcx], rax
;; bbWeight=0.50 PerfScore 2.38
G_M53887_IG09: ; gcrefRegs=00000000 {}, byrefRegs=00000002 {rcx}, byref, isz
mov rax, qword ptr [rcx]
mov edx, eax
test dl, 64
je SHORT G_M53887_IG10
or rax, 0xD1FFAB1E
mov qword ptr [rcx], rax
;; bbWeight=0.50 PerfScore 2.38
G_M53887_IG10: ; gcrefRegs=00000000 {}, byrefRegs=00000002 {rcx}, byref, isz
mov rax, qword ptr [rcx]
mov edx, eax
test dl, 128
je SHORT G_M53887_IG11
or rax, 0xD1FFAB1E
mov qword ptr [rcx], rax
;; bbWeight=0.50 PerfScore 2.38
G_M53887_IG11: ; gcrefRegs=00000000 {}, byrefRegs=00000002 {rcx}, byref

After this change, there are no CSEs, because the compare tree is now narrowed to int and no longer matches the long IND in the store:

***** BB13
STMT00062 (IL ???... ???)
N007 ( 9, 8) [000207] ---XG------- * JTRUE void
N006 ( 7, 6) [000377] J--XG--N---- \--* EQ int <l:$4d7, c:$4d6>
N004 ( 5, 4) [000374] ---XG------- +--* AND int <l:$4d3, c:$4d2>
N002 ( 3, 2) [000367] *--XG------- | +--* IND int <l:$4cf, c:$4ce>
N001 ( 1, 1) [000201] ------------ | | \--* LCL_VAR byref V00 this u:1 Zero Fseq[hackishFieldName] $80
N003 ( 1, 1) [000373] ------------ | \--* CNS_INT int 64 $51
N005 ( 1, 1) [000376] ------------ \--* CNS_INT int 0 $40
------------ BB14 [073..07B), preds={BB13} succs={BB15}
***** BB14
STMT00081 (IL 0x073... ???)
N007 ( 9, 10) [000389] -A-XG---R--- * ASG long $VN.Void
N006 ( 3, 2) [000388] *--XG--N---- +--* IND long <l:$500, c:$47f>
N005 ( 1, 1) [000229] ------------ | \--* LCL_VAR byref V00 this u:1 Zero Fseq[hackishFieldName] $80
N004 ( 5, 7) [000387] ---XG------- \--* OR long <l:$502, c:$501>
N002 ( 3, 2) [000380] *--XG------- +--* IND long <l:$47e, c:$47d>
N001 ( 1, 1) [000379] ------------ | \--* LCL_VAR byref V00 this u:1 Zero Fseq[hackishFieldName] $80
N003 ( 1, 4) [000386] ------------ \--* CNS_INT long 0x40000 $356

And the codegen is more compact:

G_M53887_IG08: ; gcrefRegs=00000000 {}, byrefRegs=00000002 {rcx}, byref, isz
test byte ptr [rcx], 32
je SHORT G_M53887_IG09
or qword ptr [rcx], 0x20000
;; bbWeight=0.50 PerfScore 4.00
G_M53887_IG09: ; gcrefRegs=00000000 {}, byrefRegs=00000002 {rcx}, byref, isz
test byte ptr [rcx], 64
je SHORT G_M53887_IG10
or qword ptr [rcx], 0xD1FFAB1E
;; bbWeight=0.50 PerfScore 4.00
G_M53887_IG10: ; gcrefRegs=00000000 {}, byrefRegs=00000002 {rcx}, byref, isz
test byte ptr [rcx], 128
je SHORT G_M53887_IG11
or qword ptr [rcx], 0xD1FFAB1E
;; bbWeight=0.50 PerfScore 4.00
G_M53887_IG11: ; gcrefRegs=00000000 {}, byrefRegs=00000002 {rcx}, byref
Case 2: In short: GenTreeNode creates assertion:
[000113] ---X-------- * NULLCHECK byte
In BB10 New Local Constant Assertion: V03 != null index=#01, mask=0000000000000001
@@ -4594,19 +4594,20 @@ fgMorphTree BB10, STMT00032 (before)
[000134] -A---------- * ASG bool
[000133] D------N---- +--* LCL_VAR bool V12 tmp6
[000126] ------------ \--* CAST int <- bool <- int
- [000116] N--------U-- \--* GT int
+ [000116] ------------ \--* NE int
[000060] ------------ +--* LCL_VAR ref V03 loc1
[000115] ------------ \--* CNS_INT ref null
+
+Assertion prop for index #01 in BB10:
+ [000116] ------------ * NE int
GenTreeNode creates assertion:
[000134] -A---------- * ASG bool
-In BB10 New Local Subrange Assertion: V12 in [0..1] index=#02, mask=0000000000000002
+In BB10 New Local Constant Assertion: V12 == 1 index=#02, mask=0000000000000002
fgMorphTree BB10, STMT00032 (after)
[000134] -A---+------ * ASG bool
[000133] D----+-N---- +--* LCL_VAR int V12 tmp6
- [000116] N----+------ \--* NE int
- [000060] -----+------ +--* LCL_VAR ref V03 loc1
- [000115] -----+------ \--* CNS_INT ref null
+ [000115] -----+------ \--* CNS_INT int 1
fgMorphTree BB10, STMT00035 (before)
[000144] -A--G------- * ASG ref
@@ -4620,6 +4621,21 @@ fgMorphTree BB10, STMT00033 (before)
[000136] ------------ +--* LCL_VAR int V12 tmp6
[000137] ------------ \--* CNS_INT int 0
+Assertion prop for index #02 in BB10:
+ [000138] J------N---- * NE int
+
+fgMorphTree BB10, STMT00033 (after)
+ [000139] -----+------ * JTRUE void
+ [000137] -----+------ \--* CNS_INT int 1
+
+removing useless STMT00033 (IL 0x032... ???)
+ [000139] -----+------ * JTRUE void
+ [000137] -----+------ \--* CNS_INT int 1
+ from BB10
+
+Conditional folded at BB10
+BB10 becomes a BBJ_ALWAYS to BB13

I suppose this is the intended effect of this PR, though we could of course just teach assertion prop to recognize this particular form instead of the generic "relop implies …". A similar pattern is eliminated in …
Case 3: In this method, the new code manages to narrow the tree to int:

@@ -834,7 +834,7 @@ Morphing BB04 of 'System.Text.Json.BitStack:Pop():bool:this'
fgMorphTree BB04, STMT00006 (before)
[000035] -A-XG------- * ASG int
[000034] D------N---- +--* LCL_VAR int V01 loc0
- [000033] N--XG----U-- \--* GT int
+ [000033] ---XG------- \--* NE int
[000030] ---XG------- +--* AND long
[000027] ---XG------- | +--* FIELD long hackishFieldName
[000026] ------------ | | \--* LCL_VAR byref V00 this
@@ -853,14 +853,14 @@ In BB04 New Local Subrange Assertion: V01 in [0..1] index=#01, mask=000000000000
fgMorphTree BB04, STMT00006 (after)
[000035] -A-XG+------ * ASG int
[000034] D----+-N---- +--* LCL_VAR int V01 loc0
- [000033] N--XG+------ \--* NE int
- [000030] ---XG+------ +--* AND long
- [000027] *--XG+------ | +--* IND long
+ [000033] ---XG+------ \--* NE int
+ [000030] ---XG+------ +--* AND int
+ [000027] *--XG+------ | +--* IND int
[000071] -----+------ | | \--* ADD byref
[000026] -----+------ | | +--* LCL_VAR byref V00 this
[000070] -----+------ | | \--* CNS_INT long 8 field offset Fseq[hackishFieldName]
- [000029] -----+------ | \--* CNS_INT long 1
- [000032] -----+------ \--* CNS_INT long 0
+ [000029] -----+------ | \--* CNS_INT int 1
+ [000032] -----+------ \--* CNS_INT int 0

This eventually lets lowering narrow things even further:

N001 ( 1, 1) [000043] ------------ * LCL_VAR byref V00 this u:1 (last use) $80
@@ -4068,30 +4041,26 @@ N001 ( 1, 1) [000043] ------------ t43 = LCL_VAR byref V00 this
/--* t43 byref
N003 ( 2, 2) [000067] -c---------- t67 = * LEA(b+8) byref
/--* t67 byref
-N004 ( 4, 4) [000044] *---GO------ t44 = * IND long <l:$38c, c:$38b>
- /--* t44 long
-N005 ( 5, 6) [000085] ----GO------ t85 = * CAST int <- long
-N006 ( 1, 1) [000046] -c---------- t46 = CNS_INT int 1
- /--* t85 int
- +--* t46 int
-N009 ( 12, 10) [000050] N---GO------ t50 = * TEST_NE int <l:$258, c:$257>
+N004 ( 4, 4) [000044] *c--GO------ t44 = * IND ubyte <l:$25c, c:$25b>
+N005 ( 1, 1) [000046] -c---------- t46 = CNS_INT ubyte 1 $44
+ /--* t44 ubyte
+ +--* t46 ubyte
+N008 ( 11, 8) [000050] ----GO---U-- t50 = * TEST_NE int <l:$264, c:$263>
/--* t50 int
-N011 ( 12, 10) [000052] DA--GO------ * STORE_LCL_VAR int V01 loc0 d:4
+N010 ( 11, 8) [000052] DA--GO------ * STORE_LCL_VAR int V01 loc0 d:4

Similar improvements in …
Hopefully this can resolve #47920
I checked, it will not. By the point the optimizer reasons about the branches, all of the …
@@ -13064,27 +13064,36 @@ void Compiler::impImportBlockCode(BasicBlock* block)
op2 = impPopStack().val;
op1 = impPopStack().val;

// Recognize the IL idiom of CGT_UN(op1, 0) and normalize
// it so that downstream optimizations don't have to.
if ((opcode == CEE_CGT_UN) && op2->IsIntegralConst(0))
can you please check the case where op1 is zero (reversed) here - does it produce any diffs?
I don't think that's necessary: cgt.un(0, op) is always false, and op1 = gtFoldExpr(op1) below takes care of that.
@SingleAccretion no, I meant clt.un(0, op), the same pattern but reversed. Also, cle.un(op, 0) -- the same transformation in morph does that: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/morph.cpp#L13917-L13918
I see, thank you for the clarification.
So I did add the clt.un(0, op) handling here, and the diffs were in about 10 methods across all collections, mostly positive as one would expect (amounting to ~150 bytes), but there was one regression that got me thinking a bit. The regression is because GT_UN(x, 0) (reversed from LT(0, x)) allows assertion prop to eliminate a range check against zero, while NE(x, 0) doesn't have the same effect. Perhaps we should not be normalizing this idiom after all...
I will say that if it is ok, I would prefer to leave the case here as is. The long-term plan (well, plan for the PR after the next one) is to just fix the ordering problem in morph.
I actually tried exactly the same opt (to recognize GT_NE early in the importer) a long time ago but there were lots of range-checks regressions and I gave up so I guess those were fixed since then (by you? 🙂)
By @nathan-moore's #40180 would be my guess :).
Yea, that should handle ==,!= 0 in range checks. It's weird that you saw regressions though, since it got changed to != in morph before anything that should remove bounds checks ran. ¯\_(ツ)_/¯
CI failures look like #54589, so marking as ready-for-review.
@dotnet/jit-contrib, @EgorBo - let us try to review and merge today before we snap the release.
Ah, I forgot to "Approve". LGTM, thanks!
This is a pretty common IL idiom (e.g. for comparing against null). Today, morph transforms it into the equivalent "not equal" form, but doing it earlier in the importer, apparently, has CQ benefits.

In addition to the optimization, this PR:
1) Cleans up some importation code.
2) Solves most of the regressions by being a bit smarter in morph.

There is also a TODO left in the importer, as a reminder to unify the handling of upcasts - the way it is done currently has issues (such as redundant IR being created for constants).

This change will help avoid regressions for my next changes in morph, for post-order NE/EQ processing - there is a lot of questionable code and some correctness issues there, so I want the refactoring and fixes to be zero-diff.

Diffs:
Windows x64
Windows x86
Linux ARM64
Linux ARM
Almost all regressions are caused by now-missing CSEs, as morph changes more trees. The small test regressions are because of slightly different register allocation. I think, given the overall positive impact, the regressions are acceptable.

I still need to drill down into what's causing such CQ improvements. Done, see below.