-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sub-optimal codegen for incrementing (some) references #12880
Comments
@dotnet/jit-contrib |
Yes, codegen for |
Actually the jit copes with the constant
and that's what we end up generating code for. Ideally the jit would notice that We don't see this for the span case because there's no byref created (so no need for an early null check) and no extra offset to add in to get to the elements. Since fixing this seems somewhat involved, the "future" milestone seems appropriate. |
This is something |
And FWIW the trivial forward substition I have in mikedn/coreclr@a19eaa7 does handle this particular case of broken trees: G_M19078_IG01:
0F1F440000 nop
G_M19078_IG02:
3909 cmp dword ptr [rcx], ecx
0FB6410C movzx rax, byte ptr [rcx+12]
G_M19078_IG03:
C3 ret But that needs more work and I'm far from convinced that such trivial cases need to be handled so late during compilation. We should try not to break trees to begin with... |
Possibly related to this issue, I've noticed another weird detail in the codegen for a very similar extension method to the one in the first message in this issue. Consider this code (I'm assuming to be running on .NET Core 2.1/3.1 or .NET Native, this could fail on eg. Mono due to different memory layout): public static class ArrayExtensions
{
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static ref float DangerousGetReferenceAt(this float[] array, int i)
{
var arrayData = Unsafe.As<RawArrayData>(array);
ref float r0 = ref Unsafe.As<byte, float>(ref arrayData.Data);
ref float ri = ref Unsafe.Add(ref r0, i);
return ref ri;
}
[StructLayout(LayoutKind.Sequential)]
private sealed class RawArrayData
{
public IntPtr Length;
public byte Data;
}
} According to sharplab.io (here), this yields: ArrayExtensions.DangerousGetReferenceAt(Single[], Int32)
L0000: cmp [rcx], ecx
L0002: add rcx, 0x10
L0006: movsxd rax, edx
L0009: lea rax, [rcx+rax*4]
L000d: ret That first ArrayExtensions.DangerousGetReferenceAt(Single[], Int32)
L0000: movsxd rax, edx
L0003: lea rax, [rcx+rax*4+0x10]
L0009: ret Same thing happens when the extension is generic and it's being used to actually load a value: I would've expected that Hope this helps, let me know if this is a known issue as well or if I should open a separate issue. Thanks! 😄 cc. @tannergooding in case you're interested in following up on this. |
All the cases here generate ideal codegen today: ; Assembly listing for method Test2:Get1[int](int[]):ubyte (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) ref -> rcx class-hnd single-def <int[]>
;# V01 OutArgs [V01 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M46567_IG01: ;; offset=0x0000
;; size=0 bbWeight=1 PerfScore 0.00
G_M46567_IG02: ;; offset=0x0000
movzx rax, byte ptr [rcx+0x08]
;; size=4 bbWeight=1 PerfScore 2.00
G_M46567_IG03: ;; offset=0x0004
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 5, prolog size 0, PerfScore 3.00, instruction count 2, allocated bytes for code 5 (MethodHash=3dc24a18) for method Test2:Get1[int](int[]):ubyte (FullOpts)
; ============================================================
; Assembly listing for method Test2:Get2[int](int[]):ubyte (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) ref -> rcx class-hnd single-def <int[]>
;# V01 OutArgs [V01 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M50340_IG01: ;; offset=0x0000
;; size=0 bbWeight=1 PerfScore 0.00
G_M50340_IG02: ;; offset=0x0000
movzx rax, byte ptr [rcx+0x0C]
;; size=4 bbWeight=1 PerfScore 2.00
G_M50340_IG03: ;; offset=0x0004
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 5, prolog size 0, PerfScore 3.00, instruction count 2, allocated bytes for code 5 (MethodHash=02df3b5b) for method Test2:Get2[int](int[]):ubyte (FullOpts)
; ============================================================
; Assembly listing for method Test:GetDangerous[int](int[],int):int (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 4, 4 ) ref -> rcx class-hnd single-def <int[]>
; V01 arg1 [V01,T01] ( 3, 3 ) int -> rdx single-def
;# V02 OutArgs [V02 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M29253_IG01: ;; offset=0x0000
;; size=0 bbWeight=1 PerfScore 0.00
G_M29253_IG02: ;; offset=0x0000
cmp byte ptr [rcx], cl
movsxd rax, edx
mov eax, dword ptr [rcx+4*rax+0x10]
;; size=9 bbWeight=1 PerfScore 5.25
G_M29253_IG03: ;; offset=0x0009
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 10, prolog size 0, PerfScore 6.25, instruction count 4, allocated bytes for code 10 (MethodHash=6f148dba) for method Test:GetDangerous[int](int[],int):int (FullOpts)
; ============================================================
; Assembly listing for method Test:GetDangerous[int](System.Span`1[int],int):int (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 6 ) byref -> rcx single-def
; V01 arg1 [V01,T01] ( 3, 3 ) int -> rdx single-def
;# V02 OutArgs [V02 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V03 tmp1 [V03 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg" <System.Span`1[int]>
;* V04 tmp2 [V04 ] ( 0, 0 ) byref -> zero-ref "field V00._reference (fldOffset=0x0)" P-INDEP
;* V05 tmp3 [V05 ] ( 0, 0 ) int -> zero-ref "field V00._length (fldOffset=0x8)" P-INDEP
; V06 tmp4 [V06,T02] ( 2, 2 ) byref -> rax single-def "field V03._reference (fldOffset=0x0)" P-INDEP
;* V07 tmp5 [V07 ] ( 0, 0 ) int -> zero-ref "field V03._length (fldOffset=0x8)" P-INDEP
;* V08 tmp6 [V08 ] ( 0, 0 ) struct (16) zero-ref "Promoted implicit byref" <System.Span`1[int]>
;
; Lcl frame size = 0
G_M39219_IG01: ;; offset=0x0000
;; size=0 bbWeight=1 PerfScore 0.00
G_M39219_IG02: ;; offset=0x0000
mov rax, bword ptr [rcx]
movsxd rcx, edx
mov eax, dword ptr [rax+4*rcx]
;; size=9 bbWeight=1 PerfScore 4.25
G_M39219_IG03: ;; offset=0x0009
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 10, prolog size 0, PerfScore 5.25, instruction count 4, allocated bytes for code 10 (MethodHash=92f066cc) for method Test:GetDangerous[int](System.Span`1[int],int):int (FullOpts)
; ============================================================
; Assembly listing for method ArrayExtensions:DangerousGetReferenceAt(float[],int):byref (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 4, 4 ) ref -> rcx class-hnd single-def <float[]>
; V01 arg1 [V01,T01] ( 3, 3 ) int -> rdx single-def
;# V02 OutArgs [V02 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M22385_IG01: ;; offset=0x0000
;; size=0 bbWeight=1 PerfScore 0.00
G_M22385_IG02: ;; offset=0x0000
cmp byte ptr [rcx], cl
movsxd rax, edx
lea rax, bword ptr [rcx+4*rax+0x10]
;; size=10 bbWeight=1 PerfScore 4.25
G_M22385_IG03: ;; offset=0x000A
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 11, prolog size 0, PerfScore 5.25, instruction count 4, allocated bytes for code 11 (MethodHash=cbc7a88e) for method ArrayExtensions:DangerousGetReferenceAt(float[],int):byref (FullOpts)
; ============================================================ |
Am I correct if I think the codegen for Get2 method is not ideal? Mainly I don't understand the presence of cmp instruction. And the add instruction could be reduced to movzx offset, no?
For x64 Core / Release:
https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIAYACY8gOgCUBXAOwwEt8YLAMIR8AB14AbGFADKMgG68wMXAG4a9Jq049+ggJI8ZEMfKhKV6zY1kALbFDEAZbMHbc+AjdU0BmbSQmUgYAFVUMGgBvGgY4hgBtACleDABxGC4ZZQAKDABPMRgIADMc3h4ASkqAXVj44gDmIOB8jBgGDIxyAB5QgD48u15cMISahkcobHzK+riY6njlpgB2BgBVLlxsEsEAQVwegFEEMQhcGUGpmcqWACE2mDofZYBfeYZP5NSMrKhcgUiqVylVap9GoEGK12p0YBhSH1BhhhqNQuNJlBprNPosVg11lsdnsWPsACZkx7tADyJRKlwwOVgJU2212ByOp3Olyg1yxtweTzoaAYOSMGAAChgoJUUJVXvEPtQlTQErJpRwwBhXPkIBxGTq9RgANIVMksLmSZSpcHUCrtKBcbCSYIMLkXGTRb4AMV4MEkZNp9PhOTotuWkJhHSpzx8byAA==
What I'm trying to achieve is all checks free array access. I have extension method for Span and T[]. I'm satisfied with codegen for Span variant but for array I think it could be improved.
https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIAYACY8gOgCUBXAOwwEt8YLAMIR8AB14AbGFADKMgG68wMXAG4a9Jq049+ggJI8ZEMfKhKV6zY1kALbFDEAZbMHbc+AjdU0BmbSQmUgYAUQQMGgBvGgY4hgBtACleDABxGC4ZZQAKDABPMRgIADMc3h4ASkqAXVj4hIBZGAw7CAATA3FJHObWjq6xSQB5MT4ILlwWAEEAc1nYXFxeBRgjSQqK2dr6uOIA5iCAFQYMjAARbC5Zkw5cAB4jgD48u15cBiOEmoZHKGx8mgGBUMMDKrsGDFqPEYUwAOwMACqk2wJUE03a7RyENh8VgJSRKLRMwewHyGBgQOe2OhuLpcXxhNwqPRmIAQuSYMMSiVcC0afTBXiYATkczidMHuExBA+VAXn8AZUWByKXQ0DihbCckYMAAFDBQSpHQowS4YbAzKD/fIAMV4UFwGAMFPwqq5PL5GGqGtpQt4lR8MIAvhCIclUhkslBcgUiqVylUdn7En02p1ur0WunBiMxrwJlM5gtVMtVutNtdkzD9oFPqcWpdrrcHtTWu8GLIxFdHk8GLhu1wgSCwRCofTiAixSyZpicozmvhoPlGo5cA5JCwzmwRTJMiocgOrpVh4GIaHqBeaCCZFxsJJgp9TebsNEIQB6d/2RwuNwMdoQKoDBcBAoK4BwYgylAoKHMCXDKLw968AAXtg4xcBCtaQJMoIjtM1oAvajrOq67rcryLQMAAvAwAAcPhXtQCSyIaHBgBgrj5BAHAYDknHcRgADSFTtCw0obGAqTJjeUB3g+ZBhAgMpym+KYJPaMCSO05FejkdDVvEtZkhSDDunQDFAA=
Or is there any simpler way to achieve this?
Thanks!
category:cq
theme:basic-cq
skill-level:expert
cost:medium
impact:small
The text was updated successfully, but these errors were encountered: