JIT: Fix LowerHWIntrinsicGetElement for xarch #104987

AndyAyersMS · 2024-07-16T17:47:25Z

Ensure we don't reorder evaluation and change exceptional behavior.

Closes #89565.

Ensure we don't reorder evaluation and change exceptional behavior. Closes dotnet#89565.

AndyAyersMS · 2024-07-16T17:47:39Z

@jakobbotsch PTAL
cc @dotnet/jit-contrib

dotnet-policy-service · 2024-07-16T17:47:52Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

src/coreclr/jit/lowerxarch.cpp

jakobbotsch

LGTM otherwise. Were the diffs bad if we just skipped the optimization when we cannot move the indir?

src/tests/JIT/Regression/JitBlue/Runtime_89565/Runtime_89595.cs

Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>

AndyAyersMS · 2024-07-16T18:39:54Z

LGTM otherwise. Were the diffs bad if we just skipped the optimization when we cannot move the indir?

I don't recall -- let me look.

AndyAyersMS · 2024-07-16T18:50:15Z

From SPMI the diffs are somewhat bigger, eg in ASP.NET:

With Fix

[08:46:54] Top method regressions (bytes):
[08:46:54]            6 (13.33% of base) : 66717.dasm - System.Numerics.Vector`1[int]:get_Item(int):int:this (Instrumented Tier0)
[08:46:54]            6 (13.33% of base) : 10948.dasm - System.Numerics.Vector`1[int]:get_Item(int):int:this (Tier0)
[08:46:54]            6 (13.33% of base) : 20741.dasm - System.Numerics.Vector`1[int]:get_Item(int):int:this (Tier0)
[08:46:54]            6 (13.33% of base) : 64929.dasm - System.Runtime.Intrinsics.Vector128`1[int]:get_Item(int):int:this (Tier0)
[08:46:54]            6 (13.04% of base) : 58073.dasm - System.Runtime.Intrinsics.Vector128`1[long]:get_Item(int):long:this (Tier0)
[08:46:54]            6 (13.04% of base) : 36918.dasm - System.Runtime.Intrinsics.Vector128`1[long]:get_Item(int):long:this (Tier0)
[08:46:54]            6 (13.04% of base) : 64402.dasm - System.Runtime.Intrinsics.Vector256`1[long]:get_Item(int):long:this (Tier0)
[08:46:54]            2 ( 8.70% of base) : 152279.dasm - System.Numerics.Vector`1[int]:get_Item(int):int:this (Tier1)

With Avoid

[11:41:47] Top method regressions (bytes):
[11:41:47]           14 (60.87% of base) : 152279.dasm - System.Numerics.Vector`1[int]:get_Item(int):int:this (Tier1)
[11:41:47]           13 (28.89% of base) : 66717.dasm - System.Numerics.Vector`1[int]:get_Item(int):int:this (Instrumented Tier0)
[11:41:47]           13 (28.89% of base) : 20741.dasm - System.Numerics.Vector`1[int]:get_Item(int):int:this (Tier0)
[11:41:47]           13 (28.89% of base) : 10948.dasm - System.Numerics.Vector`1[int]:get_Item(int):int:this (Tier0)
[11:41:47]           13 (28.26% of base) : 64402.dasm - System.Runtime.Intrinsics.Vector256`1[long]:get_Item(int):long:this (Tier0)
[11:41:47]           10 (22.22% of base) : 64929.dasm - System.Runtime.Intrinsics.Vector128`1[int]:get_Item(int):int:this (Tier0)
[11:41:47]           10 (21.74% of base) : 36918.dasm - System.Runtime.Intrinsics.Vector128`1[long]:get_Item(int):long:this (Tier0)
[11:41:47]           10 (21.74% of base) : 58073.dasm - System.Runtime.Intrinsics.Vector128`1[long]:get_Item(int):long:this (Tier0)

There are only a few diffs in non get_Item methods.

AndyAyersMS · 2024-07-17T19:56:06Z

NAOT failure looks to be related:

  ILC: D:\a\_work\1\s\src\coreclr\jit\lsrabuild.cpp:2601
  ILC: Assertion failed '!"Expected empty defList at end of block"' in 'System.Numerics.Matrix4x4:get_Item(int,int):float:this' during 'Linear scan register alloc' (IL size 14; hash 0x67a5e306; FullOpts)

AndyAyersMS · 2024-07-17T22:33:33Z

Ah, other failures too.

tannergooding · 2024-07-18T19:04:09Z

@AndyAyersMS, this seems to have inserted quite a few null checks for the implicit this parameter on structs:

        ; byrRegs +[rcx]
+       cmp      byte  ptr [rcx], cl
        cmp      edx, 4
        jae      SHORT G_M63163_IG04
        vmovsd   xmm0, qword ptr [rcx+8*rdx]
-						;; size=10 bbWeight=1 PerfScore 5.25
+						;; size=12 bbWeight=1 PerfScore 8.25

Is that expected? We normally don't see such null checks (see for example: https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIGYACXDKAVzAwYGUGBvGhoIYATCK2AAbGAxgAGANwCho8VJnlF1IQ3oixk6QHEYGAJoAKAJQMAvAD51mgL5A==)

AndyAyersMS · 2024-07-18T20:25:06Z

That's kind of the point. It adds a handful of nullchecks in places where the JIT doesn't or can't prove the vector address won't be null, since the dereference happens before the element access.

tannergooding · 2024-07-18T20:52:51Z

It looks to be generally inconsistent with how we handle it for spans and other cases: https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIAYACY8gOgCUBXAOwwEt8YLAMIR8AB14AbGFADKMgG68wMXAG4axAMwNcGKBzAYGshgG8aDKwwAmEDsGkMYdDdWu37jmM/JuPltYA2gCyMBgAFhA2AJLikgAUYZHRcWKSAPJifBBcuCwAchAxXJK8XOUA5gCUALqBVtqeDk4A4uEAGgnVDAC8AHwMXDAA7iZi2FwAPHYtMP0JsABmznTVQXS1/tYNTDqz3gztGACa3X2Dw2OyE9MH0gvLvuubbgC+NJo6ZAxCNBbuaxNbi4bBLHz3HxdWQAKgYYh6A3hcH6xy61W2Vl2wLyYIhXicZ1h8MRgzEKOOZwxNDeQA=

I understand the intent of doing null checks here and think that in a case like p->GetElement(0) it's goodness, but it also appears to be inconsistent with general handling of the implicit this parameter for structs and so it isn't clear to me that all the regressions are "correct".

I think the actual issue is just that the managed fallback is deferring to the static GetElement and so its forcing the entire 16-byte vector to dereference rather than directly accessing the relevant element (hence causing a nullref prior to the bounds check). Instead we should be doing the bounds check up front and then deferring to GetElementUnsafe to ensure the behavior is deterministic.

tannergooding · 2024-07-18T20:56:10Z

That is when DOTNET_EnableAVX=0 we're doing:

Vector256<float> tmp = this;
if (index < 0 || index >= Count)
    throw ArgumentOutOfRangeException();
return Unsafe.ReadUnaligned(ref Unsafe.Add(ref Unsafe.As<Vector256<float>, float>(ref tmp), index));

While when we are accelerated we're doing:

if (index < 0 || index >= Count)
    throw ArgumentOutOfRangeException();
return Unsafe.ReadUnaligned(ref Unsafe.Add(ref Unsafe.As<Vector256<float>, float>(ref this), index));

and therefore not doing the full dereference first. Not doing the full dereference is the more desirable/better behavior here, so we really just need to fix the managed fallback I think

jakobbotsch · 2024-07-18T21:07:46Z

@tannergooding The JIT is internally inconsistent because of this bug -- so this cannot just be fixed on the managed side. We have GetElement tree that looks like this:

N008 ( 11, 13) [000007] ---XG------                         └──▌  HWINTRINSIC float  float GetElement <l:$2c3, c:$2c2>
N002 (  3,  2) [000001] ---XG------                            ├──▌  IND       simd8  <l:$240, c:$241>
N001 (  1,  1) [000000] -----------                            │  └──▌  LCL_VAR   byref  V00 this         u:1 (last use) $80
N007 (  7, 10) [000006] ---X-------                            └──▌  COMMA     int    $280
N005 (  6,  9) [000005] ---X-------                               ├──▌  BOUNDS_CHECK_ArgRng void   $203
N003 (  1,  1) [000004] -----------                               │  ├──▌  LCL_VAR   int    V01 arg1         u:1 $c0
N004 (  1,  1) [000003] -----------                               │  └──▌  CNS_INT   int    2 $44
N006 (  1,  1) [000002] -----------                               └──▌  LCL_VAR   int    V01 arg1         u:1 (last use) $c0

This requires the indirection to be evaluated before the bounds check. The lowering transformation being fixed here is reordering the evaluation order of the operands. Alternatively this IR should not have been created like this in the first place, but that's still a fix somewhere else within the JIT.

#83005 (comment) is the original context.

tannergooding · 2024-07-18T21:14:42Z

I think the shape here is wrong in that we've produced COMMA(BOUNDS_CHECK_ArgRng, op2). This should've been COMMA(BOUNDS_CHECK_ArgRng, GetElement(op1, op2))

The bounds check is meant to prevent the intrinsic (and any operand evaluation) from executing. That is, it's meant to be the first thing done before we do anything else.

tannergooding · 2024-07-18T21:18:12Z

Although I guess that would still be potentially problematic with regards to non accelerated handling since GetElement takes Vector256<T> by value
and therefore it'd force the indirection to occur before get element happens

so I guess the real issue is we didn't spill op1 when inserting the bounds check during import

JIT: Fix LowerHWIntrinsicGetElement for xarch

654043c

Ensure we don't reorder evaluation and change exceptional behavior. Closes dotnet#89565.

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 16, 2024

dotnet-policy-service bot assigned AndyAyersMS Jul 16, 2024

jakobbotsch reviewed Jul 16, 2024

View reviewed changes

src/coreclr/jit/lowerxarch.cpp Outdated Show resolved Hide resolved

jakobbotsch reviewed Jul 16, 2024

View reviewed changes

src/coreclr/jit/lowerxarch.cpp Outdated Show resolved Hide resolved

jakobbotsch reviewed Jul 16, 2024

View reviewed changes

src/coreclr/jit/lowerxarch.cpp Outdated Show resolved Hide resolved

jakobbotsch approved these changes Jul 16, 2024

View reviewed changes

EgorBo reviewed Jul 16, 2024

View reviewed changes

src/tests/JIT/Regression/JitBlue/Runtime_89565/Runtime_89595.cs Outdated Show resolved Hide resolved

AndyAyersMS and others added 2 commits July 16, 2024 11:35

Apply suggestions from code review

043195d

Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>

fix file names

b71b29a

format

2ee7b10

This was referenced Jul 17, 2024

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

The job running on agent NetCore-Public ran longer than the maximum time #104044

Closed

must un-contain addr if we spill it

c0e4f0f

AndyAyersMS merged commit 2d17105 into dotnet:main Jul 18, 2024
114 checks passed

github-actions bot locked and limited conversation to collaborators Aug 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT: Fix LowerHWIntrinsicGetElement for xarch #104987

JIT: Fix LowerHWIntrinsicGetElement for xarch #104987

AndyAyersMS commented Jul 16, 2024

AndyAyersMS commented Jul 16, 2024

dotnet-policy-service bot commented Jul 16, 2024

jakobbotsch left a comment •

edited

Loading

AndyAyersMS commented Jul 16, 2024

AndyAyersMS commented Jul 16, 2024

AndyAyersMS commented Jul 17, 2024

AndyAyersMS commented Jul 17, 2024

tannergooding commented Jul 18, 2024 •

edited

Loading

AndyAyersMS commented Jul 18, 2024

tannergooding commented Jul 18, 2024

tannergooding commented Jul 18, 2024 •

edited

Loading

jakobbotsch commented Jul 18, 2024

tannergooding commented Jul 18, 2024

tannergooding commented Jul 18, 2024

JIT: Fix LowerHWIntrinsicGetElement for xarch #104987

JIT: Fix LowerHWIntrinsicGetElement for xarch #104987

Conversation

AndyAyersMS commented Jul 16, 2024

AndyAyersMS commented Jul 16, 2024

dotnet-policy-service bot commented Jul 16, 2024

jakobbotsch left a comment • edited Loading

Choose a reason for hiding this comment

AndyAyersMS commented Jul 16, 2024

AndyAyersMS commented Jul 16, 2024

AndyAyersMS commented Jul 17, 2024

AndyAyersMS commented Jul 17, 2024

tannergooding commented Jul 18, 2024 • edited Loading

AndyAyersMS commented Jul 18, 2024

tannergooding commented Jul 18, 2024

tannergooding commented Jul 18, 2024 • edited Loading

jakobbotsch commented Jul 18, 2024

tannergooding commented Jul 18, 2024

tannergooding commented Jul 18, 2024

jakobbotsch left a comment •

edited

Loading

tannergooding commented Jul 18, 2024 •

edited

Loading

tannergooding commented Jul 18, 2024 •

edited

Loading