Struct getters are generating unnecessary instructions on x64 when struct contains floats #7200
Comments
Yep, accessing a field via a property naturally results in a temporary copy and the JIT can't always eliminate that. See #4323 as well.
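To make the "temporary copy" point concrete, here is a minimal C# sketch of the language-level semantics involved. The type and member names (`TestStruct`, `TestClass`, `Field`, `Prop`, `CopyDemo`) are assumptions chosen for illustration; only `struct TestStruct { public float Value; }` appears in the original report. It shows that a property getter returns the struct by value, which is the copy the JIT then has to prove redundant.

```csharp
public struct TestStruct
{
    public float Value;
}

public class TestClass
{
    // Plain field: the struct is stored inline in the object.
    public TestStruct Field;

    // Auto-property (hypothetical name): the getter returns the struct by value.
    public TestStruct Prop { get; set; }
}

public static class CopyDemo
{
    public static float ReadAfterLocalWrite()
    {
        var c = new TestClass();
        c.Prop = new TestStruct { Value = 1.0f };

        // The getter hands back a copy, so this write only touches the local.
        TestStruct copy = c.Prop;
        copy.Value = 2.0f;

        return c.Prop.Value; // the object's copy is unchanged
    }

    public static float ReadAfterFieldWrite()
    {
        var c = new TestClass();

        // Field access addresses the object's own storage, no copy involved.
        c.Field.Value = 2.0f;

        return c.Field.Value;
    }
}
```

Whether the copy survives into machine code is a separate question that depends on the JIT version and on inlining; the sketch only shows why a copy exists in the first place.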
@dotnet/jit-contrib
@sandreenko, @CarolEidt. I took a look at this as I found a codegen issue around simple struct wrappers. For example:
C5FA11442440 vmovss dword ptr [rsp+40H], xmm0
8B442440 mov eax, dword ptr [rsp+40H]
89442460 mov dword ptr [rsp+60H], eax
8B442460 mov eax, dword ptr [rsp+60H]
89442438 mov dword ptr [rsp+38H], eax
C5F259442438 vmulss xmm0, xmm1, dword ptr [rsp+38H]
C5FA11442430 vmovss dword ptr [rsp+30H], xmm0
8B442430 mov eax, dword ptr [rsp+30H]
89442458 mov dword ptr [rsp+58H], eax
8B442458 mov eax, dword ptr [rsp+58H]
89442428 mov dword ptr [rsp+28H], eax
C5EA58442428 vaddss xmm0, xmm2, dword ptr [rsp+28H]
C5FA11442420 vmovss dword ptr [rsp+20H], xmm0
8B442420 mov eax, dword ptr [rsp+20H]
89442450 mov dword ptr [rsp+50H], eax
8B442450 mov eax, dword ptr [rsp+50H]
89442418 mov dword ptr [rsp+18H], eax
C5FA10442418 vmovss xmm0, dword ptr [rsp+18H]
C5FA5EC3 vdivss xmm0, xmm0, xmm3
C5FA11442410 vmovss dword ptr [rsp+10H], xmm0
8B442410 mov eax, dword ptr [rsp+10H]
89442448 mov dword ptr [rsp+48H], eax
8B442448 mov eax, dword ptr [rsp+48H]
89442408 mov dword ptr [rsp+08H], eax
C5FA10442408 vmovss xmm0, dword ptr [rsp+08H]
We currently block struct promotion of structs with a single floating-point field: https://github.com/dotnet/runtime/blob/master/src/coreclr/src/jit/lclvars.cpp#L1910-L1925
C5FA59C1 vmulss xmm0, xmm0, xmm1
C5FA58C2 vaddss xmm0, xmm0, xmm2
C5FA5EC3 vdivss xmm0, xmm0, xmm3
For the case where things aren't inlined and where the wrapper struct is either a
The first is that genFnPrologCalleeRegArgs makes some assumptions that the
The other issue is that the register allocator doesn't currently expect floating-point values to be in integer registers, so we will get asserts in places like: https://github.com/dotnet/runtime/blob/master/src/coreclr/src/jit/lsrabuild.cpp#L599 because
The reverse of going from a
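The source for the listing in the comment above is not shown in this thread. As a rough sketch of what a "simple struct wrapper" around a single `float` can look like, assuming a hypothetical `FloatWrapper` type and a multiply/add/divide expression chosen to match the `vmulss`/`vaddss`/`vdivss` sequence in the listing:

```csharp
// Hypothetical wrapper type; not taken from the original comment.
public readonly struct FloatWrapper
{
    public readonly float Value;

    public FloatWrapper(float value) => Value = value;

    public static FloatWrapper operator *(FloatWrapper a, FloatWrapper b)
        => new FloatWrapper(a.Value * b.Value);

    public static FloatWrapper operator +(FloatWrapper a, FloatWrapper b)
        => new FloatWrapper(a.Value + b.Value);

    public static FloatWrapper operator /(FloatWrapper a, FloatWrapper b)
        => new FloatWrapper(a.Value / b.Value);
}

public static class WrapperDemo
{
    // Wrapped arithmetic: each operator call produces a new single-field struct.
    public static float ComputeWrapped(float x, float y, float z, float w)
    {
        var result = ((new FloatWrapper(x) * new FloatWrapper(y)) + new FloatWrapper(z))
                     / new FloatWrapper(w);
        return result.Value;
    }

    // The same arithmetic on raw floats, for comparing the generated code.
    public static float ComputeRaw(float x, float y, float z, float w)
        => ((x * y) + z) / w;
}
```

Both methods compute the same value, so any difference in the disassembly comes from how the JIT handles the intermediate structs rather than from the arithmetic itself.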
@tannergooding thanks for the analysis. I mostly agree with your conclusions; a few comments:
I would not want to retype the struct as its only field and fix it in LSRA; I hope to delete all code under
@sandreenko, I think I might actually be missing something here... When we do struct promotion, we create a
When the parent struct is an argument or return type, then it might start out in a register that doesn't "match" the LclVar type and that is when the oddness starts. So, in the above scenario you might have something in
@tannergooding - the retyping currently happens in order to work around the register file mismatches. But it causes many problems of its own. So once we've got the struct handling the way we'd like it to be, for your
Other cases, where you may still have references to the entire struct, those would be handled in
The IR after
The
More complex cases (multiple fields that don't match the registers) that require extracting fields will not be supported right away, but shouldn't be too hard once we get the next couple of rounds of changes in.
While the
Suspect we won't get to this in 7.0, so moving to future. @sandreenko should we unassign you?
Codegen today on main:
; Assembly listing for method Program:TestFieldSetters(TestClass,TestStruct)
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) ref -> rcx class-hnd single-def
; V01 arg1 [V01,T01] ( 3, 3 ) struct ( 8) rdx single-def
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M46470_IG01:
;; size=0 bbWeight=1 PerfScore 0.00
G_M46470_IG02:
mov dword ptr [rcx+08H], edx
;; size=3 bbWeight=1 PerfScore 1.00
G_M46470_IG03:
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 4, prolog size 0, PerfScore 2.40, instruction count 2, allocated bytes for code 4 (MethodHash=8e384a79) for method Program:TestFieldSetters(TestClass,TestStruct)
; ============================================================
; Assembly listing for method Program:TestPropSetters(TestClass,TestStruct)
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) ref -> rcx class-hnd single-def
; V01 arg1 [V01,T01] ( 3, 3 ) struct ( 8) rdx single-def
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;* V03 tmp1 [V03 ] ( 0, 0 ) struct ( 8) zero-ref "Inlining Arg"
;
; Lcl frame size = 0
G_M59705_IG01:
;; size=0 bbWeight=1 PerfScore 0.00
G_M59705_IG02:
mov dword ptr [rcx+0CH], edx
;; size=3 bbWeight=1 PerfScore 1.00
G_M59705_IG03:
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 4, prolog size 0, PerfScore 2.40, instruction count 2, allocated bytes for code 4 (MethodHash=ccab16c6) for method Program:TestPropSetters(TestClass,TestStruct)
; ============================================================
; Assembly listing for method Program:TestFieldGetters(TestClass)
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) ref -> rcx class-hnd single-def
; V01 OutArgs [V01 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 40
G_M35359_IG01:
sub rsp, 40
;; size=4 bbWeight=1 PerfScore 0.25
G_M35359_IG02:
mov ecx, dword ptr [rcx+08H]
call [Program:Print(TestStruct)]
nop
;; size=10 bbWeight=1 PerfScore 5.25
G_M35359_IG03:
add rsp, 40
ret
;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code 19, prolog size 4, PerfScore 8.65, instruction count 6, allocated bytes for code 19 (MethodHash=59bc75e0) for method Program:TestFieldGetters(TestClass)
; ============================================================
; Assembly listing for method Program:TestPropGetters(TestClass)
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) ref -> rcx class-hnd single-def
; V01 OutArgs [V01 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
; V02 tmp1 [V02,T01] ( 2, 4 ) struct ( 8) [rsp+20H] do-not-enreg[S] "struct address for call/obj"
;
; Lcl frame size = 40
G_M21632_IG01:
sub rsp, 40
;; size=4 bbWeight=1 PerfScore 0.25
G_M21632_IG02:
mov ecx, dword ptr [rcx+0CH]
mov dword ptr [rsp+20H], ecx
mov ecx, dword ptr [rsp+20H]
call [Program:Print(TestStruct)]
nop
;; size=18 bbWeight=1 PerfScore 7.25
G_M21632_IG03:
add rsp, 40
ret
;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code 27, prolog size 4, PerfScore 11.45, instruction count 8, allocated bytes for code 27 (MethodHash=a0acab7f) for method Program:TestPropGetters(TestClass)
; ============================================================
Only
Hi folks,
I hope I'm writing this in the right area. I was comparing the JIT outputs of properties vs. fields, to see whether I should avoid auto-properties in high-performance code. Now, I've managed to create an example where the exact same data embedded in a struct creates different output than just using a primitive type. To be exact, it looks like the entire struct is copied once more than necessary. This only happens when a float is embedded, and not for ints.
The behaviour when using float fields or properties directly seems to be as expected (no additional instructions when using properties).
I'm using VS2017 RC, .NET 4.6.2, compiling to x64, I think it should be RyuJIT, although I don't know how to find that out. The disassemblies were obtained via Alt+G.
Example code:
Disassembly:
With struct TestStruct { public float Value; }
With struct TestStruct { public int Value; }
Full source:
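A minimal sketch of the kind of comparison described above, reconstructed from the method names that appear in the assembly listings (`TestFieldSetters`, `TestPropSetters`, `TestFieldGetters`, `TestPropGetters`, `Print`). The member names `Field` and `Prop`, the `NoInlining` attributes, and the body of `Main` are assumptions, so this should be read as an approximation rather than the reporter's exact code:

```csharp
using System;
using System.Runtime.CompilerServices;

public struct TestStruct
{
    public float Value; // swap to int to compare against the integer case
}

public class TestClass
{
    public TestStruct Field;             // hypothetical member name
    public TestStruct Prop { get; set; } // hypothetical member name
}

public static class Program
{
    // NoInlining keeps each method as a separate unit in the disassembly
    // (an assumption; the original attributes are not known).
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void Print(TestStruct s) => Console.WriteLine(s.Value);

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void TestFieldSetters(TestClass c, TestStruct s) => c.Field = s;

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void TestPropSetters(TestClass c, TestStruct s) => c.Prop = s;

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void TestFieldGetters(TestClass c) => Print(c.Field);

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void TestPropGetters(TestClass c) => Print(c.Prop);

    public static void Main()
    {
        var c = new TestClass();
        var s = new TestStruct { Value = 1.5f };

        TestFieldSetters(c, s);
        TestPropSetters(c, s);
        TestFieldGetters(c);
        TestPropGetters(c);
    }
}
```

Comparing the disassembly of `TestFieldGetters` against `TestPropGetters` is the interesting part: in the listings in this thread, the property version is the one that carries the extra store and reload through `[rsp+20H]`.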
category:cq
theme:structs
skill-level:expert
cost:medium
impact:medium