Suboptimal codegen for memset/memcpy unrolling #83277

EgorBo · 2023-03-10T20:13:12Z

When the JIT unrolls memset/memcpy, it makes suboptimal decisions for certain sizes, e.g. to memset 30 bytes:

// Fixed-size buffers require an unsafe context.
unsafe struct MyStruct
{
    fixed byte a[30];
}

MyStruct Test()
{
    MyStruct s = default;
    return s;
}

  xor      eax, eax
  vxorps   xmm0, xmm0
  vmovdqu  xmmword ptr [rdx], xmm0        ; bytes [0, 16)
  mov      qword ptr [rdx+10H], rax       ; bytes [16, 24)
  mov      qword ptr [rdx+16H], rax       ; bytes [22, 30), overlaps the previous store

So to zero 30 bytes, it uses one 16-byte SIMD store and then two overlapping 8-byte GPR stores for the remaining 14 bytes. It would be better to keep using SIMD and overlap with the already-zeroed part:

  vxorps   xmm0, xmm0, xmm0
  vmovups  xmmword ptr [rdx+14], xmm0     ; bytes [14, 30)
  vmovups  xmmword ptr [rdx], xmm0        ; bytes [0, 16), overlaps bytes 14-15
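
For illustration, the overlapping-store idea can be sketched by hand in C#. This is only a sketch, assuming .NET 7 or later for `Vector128.StoreUnsafe`; the `OverlappedZeroing` class and `Zero16To32` method are hypothetical names and not part of the runtime or the JIT.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

static class OverlappedZeroing
{
    // Hypothetical helper (not a runtime API): zeroes a span of 16..32 bytes
    // with exactly two 16-byte stores that are allowed to overlap.
    public static void Zero16To32(Span<byte> dest)
    {
        if (dest.Length < 16 || dest.Length > 32)
            throw new ArgumentOutOfRangeException(nameof(dest));

        ref byte start = ref MemoryMarshal.GetReference(dest);
        Vector128<byte> zero = Vector128<byte>.Zero;

        // Tail store: covers the last 16 bytes, i.e. [Length - 16, Length).
        zero.StoreUnsafe(ref start, (nuint)(dest.Length - 16));
        // Head store: covers [0, 16); for Length < 32 it overlaps the tail store.
        zero.StoreUnsafe(ref start);
    }
}
```

For a 30-byte span the tail store lands at offset 14, which matches the `[rdx+14]` store in the listing above. Rewriting bytes 14-15 is harmless because both stores write the same value.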

The same applies to other sizes.
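
To make "other sizes" concrete, here is a rough sketch of how store offsets could be chosen for an arbitrary size. It is not the JIT's actual unrolling logic; `UnrollPlan` and `StoreOffsets` are hypothetical names, and the sketch assumes a single store width and a size of at least that width.

```csharp
using System;
using System.Collections.Generic;

static class UnrollPlan
{
    // Hypothetical planner: returns the byte offsets of `width`-byte stores
    // that together cover [0, size). Assumes size >= width.
    public static List<int> StoreOffsets(int size, int width)
    {
        if (width <= 0 || size < width)
            throw new ArgumentException("size must be at least width, and width must be positive");

        var offsets = new List<int>();
        int offset = 0;

        // Full, non-overlapping stores from the start of the block.
        for (; offset + width <= size; offset += width)
            offsets.Add(offset);

        // Remainder: one extra store aligned to the end of the block,
        // overlapping bytes that were already written.
        if (offset < size)
            offsets.Add(size - width);

        return offsets;
    }
}
```

With a 16-byte width, a 30-byte block yields offsets 0 and 14, and a 40-byte block yields 0, 16, and 24.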

PS: it seems that arm64 already does the right thing here.

ghost · 2023-03-10T20:13:19Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	EgorBo
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Mar 10, 2023

ghost added the untriaged label (New issue has not been triaged by the area owner) Mar 10, 2023

EgorBo added this to the 8.0.0 milestone Mar 10, 2023

EgorBo removed the untriaged label (New issue has not been triaged by the area owner) Mar 10, 2023

EgorBo self-assigned this Mar 10, 2023

This was referenced Mar 11, 2023

Unify unroll limits in a single entry point #83274

Merged

Slightly improve struct zeroing & copying #83488

Merged

ghost added the in-pr label (There is an active PR which will close this issue when it is merged) Mar 16, 2023

EgorBo closed this as completed in #83488 Mar 17, 2023

ghost removed the in-pr label (There is an active PR which will close this issue when it is merged) Mar 17, 2023

ghost locked as resolved and limited conversation to collaborators Apr 16, 2023
