Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accessing a field of a Vector4 causes later codegen to be inefficient if inlined #10045

Closed
tannergooding opened this issue Mar 27, 2018 · 3 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Milestone

Comments

@tannergooding
Copy link
Member

tannergooding commented Mar 27, 2018

As a simple example, look at:

static void Main(string[] args)
{
    var value = new Vector4(0, 2, 3, 4);
    var result = Test(value, 1);
    Console.WriteLine(result);
}

public static Vector4 Test(Vector4 left, float right)
{
    left.X += right;
    return left;
}

This is currently producing:

; xmm0 = { 1, 2, 3, 4 }
C4E17929442420       vmovapd  xmmword ptr [rsp+20H], xmm0
488D4C2420           lea      rcx, bword ptr [rsp+20H]
C4E17A1001           vmovss   xmm0, dword ptr [rcx]
C4E17A58056B000000   vaddss   xmm0, dword ptr [reloc @RWD12]
C4E17A1101           vmovss   dword ptr [rcx], xmm0
48B9A06E7D00F87F0000 mov      rcx, 0x7FF8007D6EA0
E8DB1B805F           call     CORINFO_HELP_NEWSFAST
488BC8               mov      rcx, rax
488D4108             lea      rax, bword ptr [rcx+8]
C4E17A10442420       vmovss   xmm0, dword ptr [rsp+20H]
C4E17A1100           vmovss   dword ptr [rax], xmm0
C4E17A10442424       vmovss   xmm0, dword ptr [rsp+24H]
C4E17A114004         vmovss   dword ptr [rax+4], xmm0
C4E17A10442428       vmovss   xmm0, dword ptr [rsp+28H]
C4E17A114008         vmovss   dword ptr [rax+8], xmm0
C4E17A1044242C       vmovss   xmm0, dword ptr [rsp+2CH]
C4E17A11400C         vmovss   dword ptr [rax+12], xmm0
E844FBFFFF           call     System.Console:WriteLine(ref)

Manually inlining this code to be:

static void Main(string[] args)
{
    var value = new Vector4(0, 2, 3, 4);
    value.X += 1;
    Console.WriteLine(value);
}

Produces the much more efficient code:

; xmm0 = { 1, 2, 3, 4 }
C4E17929442420       vmovapd  xmmword ptr [rsp+20H], xmm0
488D4C2420           lea      rcx, bword ptr [rsp+20H]
C4E17A1001           vmovss   xmm0, dword ptr [rcx]
C4E17A580547000000   vaddss   xmm0, dword ptr [reloc @RWD12]
C4E17A1101           vmovss   dword ptr [rcx], xmm0
48B9A06E7C00F87F0000 mov      rcx, 0x7FF8007C6EA0
E8CF1B815F           call     CORINFO_HELP_NEWSFAST
488BC8               mov      rcx, rax
C4E17928442420       vmovapd  xmm0, xmmword ptr [rsp+20H]
C4E179114108         vmovupd  xmmword ptr [rcx+8], xmm0
E862FBFFFF           call     System.Console:WriteLine(ref)
90                   nop

We should recognize when the accessed value is 16-bytes and just do a block-copy (as was done with the manually inlined code).

It may also be beneficial to recognize that [rsp+20H] contains the "last use" of value and that we can just reference the memory directly (without needing to copy).

category:cq
theme:vector-codegen
skill-level:intermediate
cost:small
impact:small

@tannergooding
Copy link
Member Author

FYI. @CarolEidt, @eerhardt

@CarolEidt
Copy link
Contributor

This is related to work item 7 in https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/first-class-structs.md, and also to #8016, though this is a bit different because it deals with a SIMD type, for which fgMorphCombineSIMDFieldAssignments() is supposed to merge these kinds of copies - though it may behave differently for a copy to an argument.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Oct 30, 2022
@tannergooding
Copy link
Member Author

I fixed this a while back as part of the general refactorings to use HWIntrinsics behind the scenes for all of System.Numerics.

Current codegen is:

       48B9D0067B6FF97F0000 mov      rcx, 0x7FF96F7B06D0
       E8DA38B25F           call     CORINFO_HELP_NEWSFAST
       C5F8100522000000     vmovups  xmm0, xmmword ptr [reloc @RWD00]
       C4E37921052800000000 vinsertps xmm0, xmm0, dword ptr [reloc @RWD16], 0
       C5F8114008           vmovups  xmmword ptr [rax+08H], xmm0
       488BC8               mov      rcx, rax
       FF15DA9D2E00         call     [System.Console:WriteLine(System.Object)]
       90                   nop    

@ghost ghost locked as resolved and limited conversation to collaborators Apr 15, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Projects
None yet
Development

No branches or pull requests

4 participants