JIT: Excessive copies when inlining #4308
@omariom This is a known issue and is on our list of future code-quality improvements. The Guid type is a 16-byte struct. The JIT currently has limitations in handling structs in some optimizations, so this issue is really the JIT optimizer's inability to copy-propagate away structs in general. In this case the inlining happens to create the copies, but you can write cases that just involve copying structs to demonstrate the same problem. You can also rewrite the test case with a scalar (e.g. long) and see that all the copies are eliminated. Thanks for reporting. |
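To make the struct-versus-scalar comparison above concrete, here is a minimal sketch. It is not the code from the issue: the class name `StructCopyDemo` and all method names are hypothetical, chosen only for illustration. It forwards a `Guid` and a `long` through two inlined methods each, so the two disassemblies can be compared.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical repro sketch; names are illustrative and not taken from the issue.
public static class StructCopyDemo
{
    private static Guid s_guid;  // destination for the 16-byte struct
    private static long s_long;  // destination for the scalar

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static void StoreGuid(Guid value) => s_guid = value;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static void ForwardGuid(Guid value) => StoreGuid(value);

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static void StoreLong(long value) => s_long = value;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static void ForwardLong(long value) => StoreLong(value);

    // Struct path: each inlined by-value argument is a candidate for an
    // "Inlining Arg" temp that the JIT may or may not copy-propagate away.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static Guid CopyStruct(Guid input)
    {
        ForwardGuid(input);
        return s_guid;
    }

    // Scalar path: same call shape, but with a long instead of a struct.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static long CopyScalar(long input)
    {
        ForwardLong(input);
        return s_long;
    }

    public static void Main()
    {
        Guid g = Guid.NewGuid();
        Console.WriteLine(CopyStruct(g) == g);
        Console.WriteLine(CopyScalar(42) == 42);
    }
}
```

The program's output only confirms that both paths store the value correctly. The difference being discussed shows up in the generated code for `CopyStruct` and `CopyScalar`. On recent .NET versions, setting the `DOTNET_JitDisasm` environment variable to a method name should print that method's disassembly, though the exact output depends on the runtime version and target architecture.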
@cmckinsey, thanks for the insights! Is it due to this line: src/jit/optimizer.cpp#L2855? |
Nah, that just affects certain loop optimizations. And in general, it's never because of a line like that; it's because of lines that do not exist after such a line 😄. |
I think you can easily beat me to the pull-request @mikedn. Go for it! 😄 |
@jasonwilliams200OK I'd love to but I doubt that I have enough spare time for it. Besides, it's a bit unlikely that it can be done in a single pull-request. Not to mention that the exact approach will probably have to be discussed with the rest of the JIT team. In short, it won't happen overnight. |
Nice to see pull-requests are already being discussed 👍 |
Here's the current codegen:
; Assembly listing for method X:TestValueTypesInInlinedMethods()
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;* V00 loc0 [V00 ] ( 0, 0 ) struct (16) zero-ref do-not-enreg[SB] ld-addr-op
; V01 OutArgs [V01 ] ( 1, 1 ) lclBlk (32) [rsp+0x00] "OutgoingArgSpace"
;* V02 tmp1 [V02 ] ( 0, 0 ) struct (16) zero-ref do-not-enreg[SB] "Inlining Arg"
;* V03 tmp2 [V03 ] ( 0, 0 ) struct (16) zero-ref do-not-enreg[SB] "Inlining Arg"
;* V04 tmp3 [V04 ] ( 0, 0 ) struct (16) zero-ref do-not-enreg[SB] "Inlining Arg"
;
; Lcl frame size = 40
G_M32745_IG01:
sub rsp, 40
vzeroupper
;; bbWeight=1 PerfScore 1.25
G_M32745_IG02:
mov rcx, 0xD1FFAB1E
mov edx, 1
call CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
mov rax, 0xD1FFAB1E
mov rax, gword ptr [rax]
vxorps xmm0, xmm0
vmovdqu xmmword ptr [rax+8], xmm0
;; bbWeight=1 PerfScore 5.08
G_M32745_IG03:
add rsp, 40
ret
;; bbWeight=1 PerfScore 1.25
Methods 1 and 2 still have an extra copy though:
; Assembly listing for method X:Method1(System.Guid)
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 6 ) byref -> rcx
; V01 OutArgs [V01 ] ( 1, 1 ) lclBlk (32) [rsp+0x00] "OutgoingArgSpace"
; V02 tmp1 [V02,T01] ( 2, 4 ) struct (16) [rsp+0x28] do-not-enreg[SB] "Inlining Arg"
;* V03 tmp2 [V03 ] ( 0, 0 ) struct (16) zero-ref do-not-enreg[SB] "Inlining Arg"
;
; Lcl frame size = 56
G_M20035_IG01:
sub rsp, 56
vzeroupper
;; bbWeight=1 PerfScore 1.25
G_M20035_IG02:
vmovdqu xmm0, xmmword ptr [rcx]
vmovdqu xmmword ptr [rsp+28H], xmm0
mov rcx, 0xD1FFAB1E
mov edx, 1
call CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
mov rax, 0xD1FFAB1E
mov rax, gword ptr [rax]
vmovdqu xmm0, xmmword ptr [rsp+28H]
vmovdqu xmmword ptr [rax+8], xmm0
;; bbWeight=1 PerfScore 8.75
G_M20035_IG03:
add rsp, 56
ret
;; bbWeight=1 PerfScore 1.25
|
This was originally issue 1133 in dotnet/coreclr, and there is a test, JIT\Regression\JitBlue\GitHub_1133, that captures this (to enable improvements or regressions to show up when we run jit diffs). |
… to correctly reflect the code in the issue. Add tests for other issues to enable easy diff-ing
This is a struct copy-prop issue. Assigning it over to @sandreenko |
There are 2 items left to close this issue:
Both are doable after the recent changes; they just don't fit into the 6.0 window. |
The current codegen is much better.
G_M1099_IG01:
vzeroupper
;; bbWeight=1 PerfScore 1.00
G_M1099_IG02:
vxorps xmm0, xmm0
vmovdqu xmmword ptr [rcx+8], xmm0
;; bbWeight=1 PerfScore 2.33
G_M1099_IG03:
ret
;; bbWeight=1 PerfScore 1.00 |
Arm64 code:
G_M1099_IG01:
stp fp, lr, [sp,#-16]!
mov fp, sp
;; bbWeight=1 PerfScore 1.50
G_M1099_IG02:
stp xzr, xzr, [x0,#8]
;; bbWeight=1 PerfScore 1.00
G_M1099_IG03:
ldp fp, lr, [sp],#16
ret lr
;; bbWeight=1 PerfScore 2.00 |
For the following code:
JIT generates these machine instructions:
Though all the methods are inlined and don't change their params, the JIT still allocates stack memory for each param and copies its value unnecessarily.
Double zeroing is a minor issue.
category:cq
theme:structs
skill-level:expert
cost:large
impact:large