JIT: Improve x86 unsigned to floating cast codegen #111595

Merged
merged 5 commits into from
Jan 31, 2025

Conversation

saucecontrol
Member

@saucecontrol saucecontrol commented Jan 19, 2025

Fixes #77658

This mostly improves codegen for unsigned integer to floating-point casts, and also removes a few other redundant conversions.

  • Adds support for using AVX-512 vcvtusi2s[sd] for uint -> float/double (only ulong was handled previously) on both x64 and x86.

    -       mov      eax, edx
            vxorps   xmm0, xmm0, xmm0
    -       vcvtsi2sd xmm0, xmm0, rax
    -       vcvtsd2ss xmm0, xmm0, xmm0
    +       vcvtusi2ss xmm0, xmm0, edx

    -       mov      eax, dword ptr [rbp-0x04]
    -       mov      eax, eax
            vxorps   xmm0, xmm0, xmm0
    -       vcvtsi2sd xmm0, xmm0, rax
    +       vcvtusi2sd xmm0, xmm0, dword ptr [rbp-0x04]

    -       push     0
    -       push     eax
    -       call     CORINFO_HELP_LNG2DBL
    -       fstp     qword ptr [ebp-0x10]
    -       vmovsd   xmm0, qword ptr [ebp-0x10]
    -       vcvtsd2ss xmm0, xmm0, xmm0
    +       vxorps   xmm0, xmm0, xmm0
    +       vcvtusi2ss xmm0, xmm0, eax
  • Improves codegen for uint -> float conversions on x64 without AVX-512, removing the intermediate conversion to double.

            mov      eax, edi
            xorps    xmm0, xmm0
    -       cvtsi2sd xmm0, rax
    -       cvtsd2ss xmm0, xmm0
    +       cvtsi2ss xmm0, rax
  • Adds support for direct ulong -> float cast to the x64 SSE2 fallback, resolving a difference in behavior between hardware with AVX-512 vs without, and saving an extra cvtsd2ss instruction.

            xorps    xmm0, xmm0
            mov      rax, rdi
            shr      rax, 1
            mov      esi, edi
            and      rsi, 1
            or       rsi, rax
            test     rdi, rdi
            cmovns   rsi, rdi
    -       cvtsi2sd xmm0, rsi
    +       cvtsi2ss xmm0, rsi
            jns      SHORT G_M37561_IG56
    -       addsd    xmm0, xmm0
    +       addss    xmm0, xmm0
     G_M37561_IG56:
    -       cvtsd2ss xmm0, xmm0
  • Removes some redundant float -> double -> float casts.

            vmulss   xmm1, xmm1, dword ptr [@RWD00]
    -       vcvtss2sd xmm1, xmm1, xmm1
    -       vcvtsd2ss xmm1, xmm1, xmm1
            vbroadcastss xmm1, xmm1
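
The rounding behavior behind the bullets above can be checked with a small model. The following is a minimal Python sketch, not the JIT's implementation: the helper names (`to_f32`, `exact_to_f32`, `via_double`, `sse2_fallback`) are made up for illustration, and it assumes the "difference in behavior" mentioned for ulong -> float is the double rounding that an intermediate double can introduce. For uint inputs the intermediate double is exact, so dropping it changes nothing; for ulong inputs it can change the result.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32, ties to even."""
    return struct.unpack("f", struct.pack("f", x))[0]


def exact_to_f32(n: int) -> float:
    """Reference: non-negative integer -> float32 with a single rounding."""
    shift = n.bit_length() - 24              # float32 keeps 24 significant bits
    if shift <= 0:
        return float(n)                      # value fits exactly
    q, rem = n >> shift, n & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rem > half or (rem == half and (q & 1)):
        q += 1                               # round to nearest, ties to even
    return float(q << shift)


def via_double(n: int) -> float:
    """Model of the old shape: integer -> double, then double -> float."""
    return to_f32(float(n))


def sse2_fallback(n: int) -> float:
    """Model of the new ulong -> float fallback sequence shown in the diff."""
    if n < (1 << 63):                        # sign bit clear: convert as-is
        return exact_to_f32(n)
    halved = (n >> 1) | (n & 1)              # shr 1, low bit kept as sticky bit
    r = exact_to_f32(halved)                 # signed convert of the halved value
    return r + r                             # addss xmm0, xmm0


n = 2**63 + 2**39 + 1                        # just above a float32 midpoint
print(via_double(n) == exact_to_f32(n))      # False: rounded twice
print(sse2_fallback(n) == exact_to_f32(n))   # True
```

The sticky bit is what makes the halving safe: it keeps "slightly above the midpoint" distinguishable from "exactly at the midpoint" after the shift, so the single rounding in the conversion matches what a direct unsigned conversion would produce.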

SPMI Diffs

The only code size regressions come from inserting xorps to clear the upper elements of the target register before the AVX-512 unsigned conversion instructions. These were previously omitted but should have been emitted, since the unsigned conversions behave the same as the signed ones (i.e. they preserve/copy the upper elements) and are subject to the same false dependency penalties.
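
As a rough illustration of why the upper elements matter, here is a toy Python model of a 4-lane register. The function `cvt_scalar_merge` is hypothetical and only mimics the merge semantics described above: lane 0 receives the converted value and the remaining lanes are copied from the first source operand. Because the result depends on whatever was previously in that register, the hardware has to wait for the previous writer, and zeroing the register first removes that dependency.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32."""
    return struct.unpack("f", struct.pack("f", x))[0]


def cvt_scalar_merge(src1, value):
    """Hypothetical model of `vcvtusi2ss dst, src1, r32` on 4 float lanes.

    Lane 0 is the converted integer; lanes 1..3 are copied from src1.
    """
    return [to_f32(float(value))] + list(src1[1:])


stale = [9.0, 8.0, 7.0, 6.0]                 # left over from earlier code
merged = cvt_scalar_merge(stale, 5)          # upper lanes still hold stale data
zeroed = cvt_scalar_merge([0.0] * 4, 5)      # after vxorps xmm0, xmm0, xmm0

print(merged)
print(zeroed)
```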

GCC emits the xorps for all conversions; Clang skips it for all conversions in simple examples but may emit it in more complex scenarios.
https://godbolt.org/z/6aY7fdE3d

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 19, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jan 19, 2025
@saucecontrol
Member Author

@MihuBot

@saucecontrol saucecontrol marked this pull request as ready for review January 19, 2025 22:17
@saucecontrol saucecontrol changed the title JIT: Improve x86 integral to floating cast codegen JIT: Improve x86 unsigned to floating cast codegen Jan 19, 2025
@saucecontrol
Member Author

cc @dotnet/jit-contrib this is ready for review

Member

@tannergooding tannergooding left a comment


CC. @dotnet/jit-contrib for secondary review

@BruceForstall BruceForstall merged commit fa0f65c into dotnet:main Jan 31, 2025
118 checks passed
@saucecontrol saucecontrol deleted the unsigned-float-cast branch January 31, 2025 19:15
@amanasifkhalid
Member

amanasifkhalid commented Feb 10, 2025

@saucecontrol we're seeing some test failures (#112324, #112325, #112329) in our stress pipelines on x64 related to floating-point casts. Do those look related to this PR?

@saucecontrol
Member Author

Yeah, if it's only showing up under JitStressRegs, it's likely bad codegen from an existing bug this PR exposed by changing float->double->float casts to float->float. In which case, #112217 should resolve it.

@saucecontrol
Member Author

Actually, not so sure about that last one. This PR changed integral->floating casts, but those failures are on floating->integral, which I don't think has changed recently

@amanasifkhalid
Member

> Actually, not so sure about that last one. This PR changed integral->floating casts, but those failures are on floating->integral, which I don't think has changed recently

Thanks for confirming; I'll wait to see if #112217 resolves them...

Development

Successfully merging this pull request may close these issues.

Performance: JIT is emitting multiple conversion instructions when using float-math
4 participants