Suboptimal codegen for memset/memcpy unrolling #83277

Closed
EgorBo opened this issue Mar 10, 2023 · 1 comment · Fixed by #83488
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Comments

@EgorBo
Member

EgorBo commented Mar 10, 2023

When the JIT unrolls memset/memcpy, it makes suboptimal decisions for certain sizes. For example, to memset 30 bytes:

unsafe struct MyStruct // fixed-size buffers require an unsafe context
{
    fixed byte a[30];
}

MyStruct Test()
{
    MyStruct s = default;
    return s;
}

x64 codegen:

  xor      eax, eax
  vxorps   xmm0, xmm0
  vmovdqu  xmmword ptr [rdx], xmm0        ; bytes 0..15
  mov      qword ptr [rdx+10H], rax       ; bytes 16..23
  mov      qword ptr [rdx+16H], rax       ; bytes 22..29

So to zero 30 bytes, it falls back to two 8-byte GPR stores (plus an extra xor to zero eax). It's better to keep using SIMD and let the second store overlap the already-zeroed part:

  vxorps   xmm0, xmm0, xmm0
  vmovups  xmmword ptr [rdx+14], xmm0     ; bytes 14..29
  vmovups  xmmword ptr [rdx], xmm0        ; bytes 0..15 (overlaps 14..15)
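A minimal sketch of the overlapping-store idea in portable C, assuming a 16-byte vector width. zero_16_to_32 is a hypothetical helper, and each 16-byte memcpy from a zero block stands in for one vmovups of a zeroed xmm register (compilers typically lower a fixed 16-byte memcpy to a single vector store, but that is not guaranteed):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero `len` bytes (16 <= len <= 32) using exactly two
   16-byte stores. The first store covers the last 16 bytes, the second the
   first 16 bytes; for len < 32 they overlap, so no narrower tail stores
   (such as 8-byte GPR stores) are needed. */
static void zero_16_to_32(unsigned char *dst, size_t len)
{
    static const unsigned char zero16[16] = {0}; /* plays the role of xmm0 */

    assert(len >= 16 && len <= 32);
    memcpy(dst + len - 16, zero16, sizeof zero16); /* ~ vmovups [rdx+len-16], xmm0 */
    memcpy(dst, zero16, sizeof zero16);            /* ~ vmovups [rdx], xmm0 */
}
```

For the 30-byte struct, zero_16_to_32(p, 30) writes bytes 14..29 and then bytes 0..15; bytes 14 and 15 are simply written twice, which is harmless for a memset.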

The same applies to other sizes.
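For other sizes, the same strategy can be expressed as a store plan: emit full-width stores while they fit, then place one final store at len - width so it overlaps bytes already written. plan_overlapping_stores is a hypothetical illustration of that idea, not the JIT's actual implementation, and it assumes len >= width; sizes below the vector width would need narrower stores and are not handled:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical planner: writes into `offsets` the start offset of each
   `width`-byte store needed to cover bytes [0, len). Full-width stores are
   emitted while they fit; any remaining tail is covered by one last store
   at len - width, which overlaps bytes already written instead of falling
   back to narrower stores. Returns the number of stores, or 0 when
   len < width (sub-vector sizes are out of scope for this sketch). */
static size_t plan_overlapping_stores(size_t len, size_t width,
                                      size_t *offsets, size_t cap)
{
    size_t n = 0;
    size_t off = 0;

    assert(width > 0);
    if (len < width)
        return 0;
    assert(cap >= len / width + 1); /* room for every planned store */

    /* Non-overlapping full-width stores. */
    for (; off + width <= len; off += width)
        offsets[n++] = off;

    /* Tail shorter than one vector: shift the last store back so that it
       ends exactly at len. */
    if (off < len)
        offsets[n++] = len - width;

    return n;
}
```

With width = 16, len = 30 yields offsets 0 and 14 (the two stores proposed above), len = 48 yields 0, 16 and 32 with no overlap, and len = 50 yields 0, 16, 32 and 34.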

PS: arm64 already appears to do the right thing here.

dotnet-issue-labeler bot added the area-CodeGen-coreclr label Mar 10, 2023
ghost added the untriaged label Mar 10, 2023

ghost commented Mar 10, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.


EgorBo added this to the 8.0.0 milestone Mar 10, 2023
EgorBo removed the untriaged label Mar 10, 2023
EgorBo self-assigned this Mar 10, 2023
ghost added the in-pr label (an active PR will close this issue when merged) Mar 16, 2023
ghost removed the in-pr label Mar 17, 2023
ghost locked as resolved and limited conversation to collaborators Apr 16, 2023