JIT: Excessive copies when inlining #4308

omariom · 2015-06-12T13:15:13Z

For the following code:

    static Guid s_dt;

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void TestValueTypesInInlinedMethods()
    {
        var dt = new Guid();

        Method1(dt);
    }

    private static void Method1(Guid dt)
    {
        Method2(dt);
    }

    private static void Method2(Guid dt)
    {
        Method3(dt);
    }

    private static void Method3(Guid dt)
    {
        s_dt = dt;
    }

JIT generates these machine instructions:

 sub         rsp,48h  
 xor         eax,eax  
 mov         qword ptr [rsp+38h],rax  
 mov         qword ptr [rsp+40h],rax  
 xor         eax,eax  

 lea         rdx,[rsp+38h]  
 xorpd       xmm0,xmm0  
 movdqu      xmmword ptr [rdx],xmm0  

 movdqu      xmm0,xmmword ptr [rsp+38h]  
 movdqu      xmmword ptr [rsp+28h],xmm0  
 movdqu      xmm0,xmmword ptr [rsp+28h]  
 movdqu      xmmword ptr [rsp+18h],xmm0  
 movdqu      xmm0,xmmword ptr [rsp+18h]  
 movdqu      xmmword ptr [rsp+8],xmm0  

 mov         rax,405B5B0040h  
 mov         rax,qword ptr [rax]  
 add         rax,8  
 movdqu      xmm0,xmmword ptr [rsp+8]  
 movdqu      xmmword ptr [rax],xmm0  
 add         rsp,48h  
 ret

Though all the methods are inlined and don't change their params, JIT excessively allocates stack memory for each param and copies their values.
Double zeroing is a minor issue.

category:cq
theme:structs
skill-level:expert
cost:large
impact:large

cmckinsey · 2015-06-13T00:02:20Z

@omariom This is a known issue and on our list of future code-quality improvements.

The Guid type is a 16-byte struct. The JIT has limitations today with handling structs in some optimizations so this issue is really the JIT optimizers inability to copy-propagate away structs in general. In this case the inlining happens to make the copies but you can write cases that just involve copying structs to demonstrate the same problem. You can also rewrite the test case with a scalar (e.g. long) and see that all the copies are eliminated.

Thanks for reporting.

ghost · 2015-06-13T07:49:39Z

@cmckinsey, thanks for the insights! Is it due to this line: src/jit/optimizer.cpp#L2855?

mikedn · 2015-06-13T10:06:13Z

Is it due to this line: src/jit/optimizer.cpp#L2855?

Neah, that just affects certain loop optimizations. And in general, it's never because of a line like that, it's because of lines that do not exists after such a line 😄.

ghost · 2015-06-13T10:45:55Z

I think you can easily beat me to the pull-request @mikedn. Go for it! 😄

mikedn · 2015-06-14T07:03:44Z

I think you can easily beat me to the pull-request @mikedn. Go for it!

@jasonwilliams200OK I'd love to but I doubt that I have enough spare time for it. Besides, it's a bit unlikely that it can be done in a single pull-request. Not to mention that the exact approach will probably have to be discussed with the rest of the JIT team. In short, it won't happen overnight.

omariom · 2015-06-14T13:27:40Z

Nice to see pull-requests are already being discussed 👍

AndyAyersMS · 2020-06-04T18:38:29Z

Here's the current codegen:

; Assembly listing for method X:TestValueTypesInInlinedMethods()
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;* V00 loc0         [V00    ] (  0,  0   )  struct (16) zero-ref    do-not-enreg[SB] ld-addr-op
;  V01 OutArgs      [V01    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
;* V02 tmp1         [V02    ] (  0,  0   )  struct (16) zero-ref    do-not-enreg[SB] "Inlining Arg"
;* V03 tmp2         [V03    ] (  0,  0   )  struct (16) zero-ref    do-not-enreg[SB] "Inlining Arg"
;* V04 tmp3         [V04    ] (  0,  0   )  struct (16) zero-ref    do-not-enreg[SB] "Inlining Arg"
;
; Lcl frame size = 40

G_M32745_IG01:
       sub      rsp, 40
       vzeroupper 
						;; bbWeight=1    PerfScore 1.25
G_M32745_IG02:
       mov      rcx, 0xD1FFAB1E
       mov      edx, 1
       call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
       mov      rax, 0xD1FFAB1E
       mov      rax, gword ptr [rax]
       vxorps   xmm0, xmm0
       vmovdqu  xmmword ptr [rax+8], xmm0
						;; bbWeight=1    PerfScore 5.08
G_M32745_IG03:
       add      rsp, 40
       ret      
						;; bbWeight=1    PerfScore 1.25

Methods 1 and 2 still have an extra copy though:

; Assembly listing for method X:Method1(System.Guid)
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  3,  6   )   byref  ->  rcx        
;  V01 OutArgs      [V01    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
;  V02 tmp1         [V02,T01] (  2,  4   )  struct (16) [rsp+0x28]   do-not-enreg[SB] "Inlining Arg"
;* V03 tmp2         [V03    ] (  0,  0   )  struct (16) zero-ref    do-not-enreg[SB] "Inlining Arg"
;
; Lcl frame size = 56

G_M20035_IG01:
       sub      rsp, 56
       vzeroupper 
						;; bbWeight=1    PerfScore 1.25
G_M20035_IG02:
       vmovdqu  xmm0, xmmword ptr [rcx]
       vmovdqu  xmmword ptr [rsp+28H], xmm0
       mov      rcx, 0xD1FFAB1E
       mov      edx, 1
       call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
       mov      rax, 0xD1FFAB1E
       mov      rax, gword ptr [rax]
       vmovdqu  xmm0, xmmword ptr [rsp+28H]
       vmovdqu  xmmword ptr [rax+8], xmm0
						;; bbWeight=1    PerfScore 8.75
G_M20035_IG03:
       add      rsp, 56
       ret      
						;; bbWeight=1    PerfScore 1.25

CarolEidt · 2020-10-14T20:31:40Z

This was originally issue 1133 in dotnet/coreclr, and there is a test, JIT\Regression\JitBlue\GitHub_1133, that captures this (to enable improvements or regressions to show up when we run jit diffs).
The propagation of the zero is handled in the TestValueTypesInInlinedMethods method via the constant assertion prop that happens during global morph. In Method1 and Method2, the value is not known to be zero, and value numbering is not assigning the same value number to the indirection of the byref incoming arg to the local that it's assigned to. I believe that this is an issue with value numbering of struct types.

… to correctly reflect the code in the issue. Add tests for other issues to enable easy diff-ing

AndyAyersMS · 2021-04-09T02:21:53Z

This is a struct copy-prop issue.

Assigning it over to @sandreenko

sandreenko · 2021-07-09T20:30:18Z

There are 2 items left to close this issue:

support any kind of sources for STORE_LCL_VAR, right now we support only LCL_VAR, CNST_INT, in this case it is IND;
support STORE_BLK with a LCL_VAR src in a register.

both are doable after the recent changes, just don't fit into 6.0 window.

kunalspathak · 2021-09-23T17:35:28Z

The current codegen is much better.
Note: I have removed static and replaced them with instance fields/methods to remove other noise related to static handling.

G_M1099_IG01:
       vzeroupper
                                                ;; bbWeight=1    PerfScore 1.00
G_M1099_IG02:
       vxorps   xmm0, xmm0
       vmovdqu  xmmword ptr [rcx+8], xmm0
                                                ;; bbWeight=1    PerfScore 2.33
G_M1099_IG03:
       ret
                                                ;; bbWeight=1    PerfScore 1.00

kunalspathak · 2021-09-23T17:36:24Z

Arm64 code:

G_M1099_IG01:
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
                                                ;; bbWeight=1    PerfScore 1.50
G_M1099_IG02:
            stp     xzr, xzr, [x0,#8]
                                                ;; bbWeight=1    PerfScore 1.00
G_M1099_IG03:
            ldp     fp, lr, [sp],#16
            ret     lr
                                                ;; bbWeight=1    PerfScore 2.00

omariom changed the title ~~JIT: Excessive copies when inlinning~~ JIT: Excessive copies when inlining Jun 12, 2015

cmckinsey self-assigned this Jun 13, 2015

BruceForstall unassigned cmckinsey Sep 21, 2019

msftgits transferred this issue from dotnet/coreclr Jan 30, 2020

msftgits added this to the Future milestone Jan 30, 2020

CarolEidt modified the milestones: Future, 6.0.0 Oct 14, 2020

CarolEidt mentioned this issue Oct 15, 2020

Extra zeroing with structs and inlining #8186

Closed

CarolEidt added a commit to CarolEidt/runtime that referenced this issue Dec 4, 2020

Fix test for coreclr issues dotnet#1133 (now runtime issue dotnet#4308)…

a020c41

… to correctly reflect the code in the issue. Add tests for other issues to enable easy diff-ing

JulieLeeMSFT assigned AndyAyersMS Mar 23, 2021

JulieLeeMSFT added the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Mar 23, 2021

AndyAyersMS removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Apr 9, 2021

AndyAyersMS assigned sandreenko and unassigned AndyAyersMS Apr 9, 2021

sandreenko modified the milestones: 6.0.0, 7.0.0 Jul 9, 2021

kunalspathak closed this as completed Sep 23, 2021

ghost locked as resolved and limited conversation to collaborators Nov 3, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT: Excessive copies when inlining #4308

JIT: Excessive copies when inlining #4308

omariom commented Jun 12, 2015 •

edited by CarolEidt

Loading

cmckinsey commented Jun 13, 2015

ghost commented Jun 13, 2015

mikedn commented Jun 13, 2015

ghost commented Jun 13, 2015

mikedn commented Jun 14, 2015

omariom commented Jun 14, 2015

AndyAyersMS commented Jun 4, 2020

CarolEidt commented Oct 14, 2020

AndyAyersMS commented Apr 9, 2021

sandreenko commented Jul 9, 2021

kunalspathak commented Sep 23, 2021

kunalspathak commented Sep 23, 2021

JIT: Excessive copies when inlining #4308

JIT: Excessive copies when inlining #4308

Comments

omariom commented Jun 12, 2015 • edited by CarolEidt Loading

cmckinsey commented Jun 13, 2015

ghost commented Jun 13, 2015

mikedn commented Jun 13, 2015

ghost commented Jun 13, 2015

mikedn commented Jun 14, 2015

omariom commented Jun 14, 2015

AndyAyersMS commented Jun 4, 2020

CarolEidt commented Oct 14, 2020

AndyAyersMS commented Apr 9, 2021

sandreenko commented Jul 9, 2021

kunalspathak commented Sep 23, 2021

kunalspathak commented Sep 23, 2021

omariom commented Jun 12, 2015 •

edited by CarolEidt

Loading