JIT: Fix LowerHWIntrinsicGetElement for xarch #104987
Ensure we don't reorder evaluation and change exceptional behavior. Closes dotnet#89565.
@jakobbotsch PTAL
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
LGTM otherwise. Were the diffs bad if we just skipped the optimization when we cannot move the indir?
src/tests/JIT/Regression/JitBlue/Runtime_89565/Runtime_89595.cs
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
I don't recall -- let me look.
From SPMI the diffs are somewhat bigger, eg in ASP.NET:
There are only a few diffs in non |
NAOT failure looks to be related:
Ah, other failures too.
@AndyAyersMS, this seems to have inserted quite a few null checks for the implicit
; byrRegs +[rcx]
+ cmp byte ptr [rcx], cl
cmp edx, 4
jae SHORT G_M63163_IG04
vmovsd xmm0, qword ptr [rcx+8*rdx]
- ;; size=10 bbWeight=1 PerfScore 5.25
+ ;; size=12 bbWeight=1 PerfScore 8.25
Is that expected? We normally don't see such null checks (see for example: https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIGYACXDKAVzAwYGUGBvGhoIYATCK2AAbGAxgAGANwCho8VJnlF1IQ3oixk6QHEYGAJoAKAJQMAvAD51mgL5A==)
That's kind of the point. It adds a handful of null checks in places where the JIT doesn't or can't prove the vector address is non-null, since the dereference happens before the element access.
It looks to be generally inconsistent with how we handle it for spans and other cases: https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIAYACY8gOgCUBXAOwwEt8YLAMIR8AB14AbGFADKMgG68wMXAG4axAMwNcGKBzAYGshgG8aDKwwAmEDsGkMYdDdWu37jmM/JuPltYA2gCyMBgAFhA2AJLikgAUYZHRcWKSAPJifBBcuCwAchAxXJK8XOUA5gCUALqBVtqeDk4A4uEAGgnVDAC8AHwMXDAA7iZi2FwAPHYtMP0JsABmznTVQXS1/tYNTDqz3gztGACa3X2Dw2OyE9MH0gvLvuubbgC+NJo6ZAxCNBbuaxNbi4bBLHz3HxdWQAKgYYh6A3hcH6xy61W2Vl2wLyYIhXicZ1h8MRgzEKOOZwxNDeQA=
I understand the intent of doing null checks here and think that in a case like I think the actual issue is just that the managed fallback is deferring to the static
That is when
Vector256<float> tmp = this;
if (index < 0 || index >= Count)
    throw new ArgumentOutOfRangeException();
return Unsafe.ReadUnaligned(ref Unsafe.Add(ref Unsafe.As<Vector256<float>, float>(ref tmp), index));
While when we are accelerated we're doing:
if (index < 0 || index >= Count)
    throw new ArgumentOutOfRangeException();
return Unsafe.ReadUnaligned(ref Unsafe.Add(ref Unsafe.As<Vector256<float>, float>(ref this), index));
and therefore not doing the full dereference first. Not doing the full dereference is the more desirable behavior here, so I think we really just need to fix the managed fallback.
@tannergooding The JIT is internally inconsistent because of this bug -- so this cannot just be fixed on the managed side. We have
N008 ( 11, 13) [000007] ---XG------ └──▌ HWINTRINSIC float float GetElement <l:$2c3, c:$2c2>
N002 ( 3, 2) [000001] ---XG------ ├──▌ IND simd8 <l:$240, c:$241>
N001 ( 1, 1) [000000] ----------- │ └──▌ LCL_VAR byref V00 this u:1 (last use) $80
N007 ( 7, 10) [000006] ---X------- └──▌ COMMA int $280
N005 ( 6, 9) [000005] ---X------- ├──▌ BOUNDS_CHECK_ArgRng void $203
N003 ( 1, 1) [000004] ----------- │ ├──▌ LCL_VAR int V01 arg1 u:1 $c0
N004 ( 1, 1) [000003] ----------- │ └──▌ CNS_INT int 2 $44
N006 ( 1, 1) [000002] ----------- └──▌ LCL_VAR int V01 arg1 u:1 (last use) $c0
This requires the indirection to be evaluated before the bounds check. The lowering transformation being fixed here is reordering the evaluation order of the operands. Alternatively this IR should not have been created like this in the first place, but that's still a fix somewhere else within the JIT. #83005 (comment) is the original context.
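To make the ordering question concrete, here is a minimal C# sketch of the two evaluation orders being discussed. The class and method names are hypothetical and not taken from the PR or the runtime sources, and `Vector128<float>` is used only for illustration (the IR dump above is for a `simd8` vector). It assumes .NET 5 or later for `Unsafe.NullRef`.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

// Hypothetical helper names, chosen for illustration only.
public static class GetElementOrderSketch
{
    // Order the IR dump asks for: the full vector load (the IND) runs first,
    // so a null byref faults before the index is ever checked.
    public static float IndirFirst(ref Vector128<float> src, int index)
    {
        Vector128<float> tmp = src;
        if ((uint)index >= (uint)Vector128<float>.Count)
            throw new ArgumentOutOfRangeException(nameof(index));
        return tmp.GetElement(index);
    }

    // Reordered form: the bounds check runs first and only one element is read.
    public static float BoundsFirst(ref Vector128<float> src, int index)
    {
        if ((uint)index >= (uint)Vector128<float>.Count)
            throw new ArgumentOutOfRangeException(nameof(index));
        return Unsafe.Add(ref Unsafe.As<Vector128<float>, float>(ref src), index);
    }
}
```

For valid inputs both methods return the same element. They should differ only when the byref is null (for example from `Unsafe.NullRef<Vector128<float>>()`) and the index is also out of range: the first form is expected to surface a `NullReferenceException`, the second an `ArgumentOutOfRangeException`. That difference in observable exceptions is what reordering the operands changes.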
I think the shape here is wrong in that we've produced The bounds check is meant to prevent the intrinsic (and any operand evaluation) from executing. That is, it's meant to be the first thing done before we do anything else. |
Although I guess that would still be potentially problematic with regards to non accelerated handling since so I guess the real issue is we didn't spill op1 when inserting the bounds check during import |