Can Unsafe.As do-not-enreg[SVB] ld-addr-op be elided on inlining? #10029
Comments
Took a quick look and am not really sure there is any easy way to improve on this. But I could be overlooking something. The use of
For structs with 3 or fewer fields the jit will promote even without seeing evidence of individual field accesses, but when there are 4 or more the field access evidence is needed. Changing the limit from 3 to 4 doesn't materially improve things as the struct copies don't get properly optimized even when the structs get promoted. We are not really prepared to handle a simd type with independently nameable sub-parts.
So possibly further changes in the copy heuristic are needed to realize that simd copies can be made efficiently via block copy instead of field by field. And there may still be other issues beyond this. But perhaps it is viable.
Or, possibly, there needs to be some higher-level logic in the jit that realizes that the Quaternion is isomorphic to a simd type and so can be treated as one over larger stretches of code so the conversions in and out of different isomorphic type forms don't manifest as actual copies. That sort of type propagation can be effective if the meat and potatoes operations on the more abstract type are all internally recast as simd. But if not (say only a few operations are converted) then this starts to verge into the territory of being able to automatically recognize simd operations in non-simd code, and that is not an easy road either.
Maybe Carol has some other ideas... |
I was hoping to get the output looking more like what happens when you switch it for Vector4, where the stack isn't really used and everything stays in registers:
; Assembly listing for method Program:Normalize_Vector4():struct
; ...
; Lcl frame size = 0
G_M3011_IG01:
C5F877 vzeroupper
G_M3011_IG02:
C4E17A1005CC000000 vmovss xmm0, dword ptr [reloc @RWD00]
C4E17A100DC7000000 vmovss xmm1, dword ptr [reloc @RWD04]
C4E17A1015C2000000 vmovss xmm2, dword ptr [reloc @RWD08]
C4E17A101DBD000000 vmovss xmm3, dword ptr [reloc @RWD12]
C4E15857E4 vxorps xmm4, xmm4
C4E15A10E3 vmovss xmm4, xmm4, xmm3
C4E15973FC04 vpslldq xmm4, 4
C4E15A10E2 vmovss xmm4, xmm4, xmm2
C4E15973FC04 vpslldq xmm4, 4
C4E15A10E1 vmovss xmm4, xmm4, xmm1
C4E15973FC04 vpslldq xmm4, 4
C4E15A10E0 vmovss xmm4, xmm4, xmm0
C4E17828C4 vmovaps xmm0, xmm4
C4E17828C8 vmovaps xmm1, xmm0
C4E37140C8F1 vdpps xmm1, xmm0, 241
C4E17251C9 vsqrtss xmm1, xmm1
C4E27918C9 vbroadcastss xmm1, xmm1
C4E1785EC1 vdivps xmm0, xmm1
C4E17828C8 vmovaps xmm1, xmm0
C4E37140C8F1 vdpps xmm1, xmm0, 241
C4E17251C9 vsqrtss xmm1, xmm1
C4E27918C9 vbroadcastss xmm1, xmm1
C4E1785EC1 vdivps xmm0, xmm1
C4E17828C8 vmovaps xmm1, xmm0
C4E37140C8F1 vdpps xmm1, xmm0, 241
C4E17251C9 vsqrtss xmm1, xmm1
C4E27918C9 vbroadcastss xmm1, xmm1
C4E1785EC1 vdivps xmm0, xmm1
C4E17828C8 vmovaps xmm1, xmm0
C4E37140C8F1 vdpps xmm1, xmm0, 241
C4E17251C9 vsqrtss xmm1, xmm1
C4E27918C9 vbroadcastss xmm1, xmm1
C4E1785EC1 vdivps xmm0, xmm1
C4E1791101 vmovupd xmmword ptr [rcx], xmm0
488BC1 mov rax, rcx
G_M3011_IG03:
C3 ret
; Total bytes of code 200, prolog size 3 for method Program:Normalize_Vector4():struct |
Also in a pattern that worked for general structs rather than just the |
Apparently not reusing
Vector4 q = Unsafe.As<Quaternion, Vector4>(ref value);
Vector4 r = Vector4.Normalize(q);
return Unsafe.As<Vector4, Quaternion>(ref r);
I haven't investigated why that is. |
Codegen today is:
; Assembly listing for method Program:Normalize_New():System.Numerics.Quaternion
; Emitting BLENDED_CODE for X64 with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 4 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 RetBuf [V00,T00] ( 4, 4 ) byref -> rcx single-def
;# V01 OutArgs [V01 ] ( 1, 1 ) struct ( 0) [rsp+00H] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V02 tmp1 [V02 ] ( 0, 0 ) simd16 -> zero-ref ld-addr-op "NewObj constructor temp"
;* V03 tmp2 [V03 ] ( 0, 0 ) simd16 -> zero-ref "spilled call-like call argument"
;* V04 tmp3 [V04 ] ( 0, 0 ) simd16 -> zero-ref "spilled call-like call argument"
;* V05 tmp4 [V05 ] ( 0, 0 ) simd16 -> zero-ref "spilled call-like call argument"
;* V06 tmp5 [V06 ] ( 0, 0 ) simd16 -> zero-ref ld-addr-op "Inlining Arg"
;* V07 tmp6 [V07,T04] ( 0, 0 ) simd16 -> zero-ref ld-addr-op "Inline stloc first use temp"
;* V08 tmp7 [V08 ] ( 0, 0 ) simd16 -> zero-ref ld-addr-op "Inlining Arg"
; V09 tmp8 [V09,T01] ( 4, 4 ) simd16 -> mm0 ld-addr-op "Inline stloc first use temp"
;* V10 tmp9 [V10 ] ( 0, 0 ) simd16 -> zero-ref ld-addr-op "Inlining Arg"
; V11 tmp10 [V11,T02] ( 4, 4 ) simd16 -> mm0 ld-addr-op "Inline stloc first use temp"
;* V12 tmp11 [V12 ] ( 0, 0 ) simd16 -> zero-ref ld-addr-op "Inlining Arg"
; V13 tmp12 [V13,T03] ( 4, 4 ) simd16 -> mm0 ld-addr-op "Inline stloc first use temp"
;
; Lcl frame size = 0
G_M50836_IG01: ;; offset=0000H
vzeroupper
;; size=3 bbWeight=1 PerfScore 1.00
G_M50836_IG02: ;; offset=0003H
vmovups xmm0, xmmword ptr [reloc @RWD00]
vdpps xmm0, xmm0, xmmword ptr [reloc @RWD00], -1
vsqrtps xmm0, xmm0
vmovups xmm1, xmmword ptr [reloc @RWD00]
vdivps xmm0, xmm1, xmm0
vdpps xmm1, xmm0, xmm0, -1
vsqrtps xmm1, xmm1
vdivps xmm0, xmm0, xmm1
vdpps xmm1, xmm0, xmm0, -1
vsqrtps xmm1, xmm1
vdivps xmm0, xmm0, xmm1
vdpps xmm1, xmm0, xmm0, -1
vsqrtps xmm1, xmm1
vdivps xmm0, xmm0, xmm1
vmovups xmmword ptr [rcx], xmm0
mov rax, rcx
;; size=83 bbWeight=1 PerfScore 140.25
G_M50836_IG03: ;; offset=0056H
ret
;; size=1 bbWeight=1 PerfScore 1.00
RWD00 dq 4116666641080000h, 3F8000003F99999Ah
; Total bytes of code 87, prolog size 3, PerfScore 150.95, instruction count 18, allocated bytes for code 87 (MethodHash=d479396b) for method Program:Normalize_New():System.Numerics.Quaternion
; ============================================================ |
Using Unsafe.As(ref causes the variable to be marked as do-not-enreg[SVB] ld-addr-op. Can this be elided through inlining?
Example dotnet/corefx#25510 (comment)
Or is there a better approach? (e.g. Unsafe.Read requires fixed, and Unsafe.ReadUnaligned is a double ref and also requires Unsafe.As<Quaternion, byte>)
/cc @CarolEidt @AndyAyersMS
category:cq
theme:structs
skill-level:expert
cost:medium
impact:medium