[Perf] Windows/x64: Decimal Regressions on 2/3/2025 6:32:46 PM +00:00 #112432

performanceautofiler · 2025-02-11T08:07:39Z

Run Information

Name	Value
Architecture	x64
OS	Windows 10.0.22631
Queue	ViperWindows
Baseline	ad382caba7c0279af5f2b233847371eadfb42f5c
Compare	05d94d94028b5622b19734e0e1b60d7aca4667b3
Diff	Diff
Configs	CompilationMode:tiered, RunKind:micro

Regressions in LinqBenchmarks

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector
Order00ManualX - Duration of single invocation	81.55 ms	87.49 ms	1.07	0.03	False
Order00LinqQueryX - Duration of single invocation	40.51 ms	43.03 ms	1.06	0.01	False
Order00LinqMethodX - Duration of single invocation	40.51 ms	43.10 ms	1.06	0.00	False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'LinqBenchmarks*'

LinqBenchmarks.Order00ManualX

ETL Files

Histogram

JIT Disasms

LinqBenchmarks.Order00LinqQueryX

ETL Files

Histogram

JIT Disasms

LinqBenchmarks.Order00LinqMethodX

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

LoopedBard3 · 2025-02-11T17:46:06Z

Due to #99212 (These benchmarks use Decimal). From range: b2880ae...cd1657d. FYI @tannergooding

Related Regressions:

[Perf] Linux/x64: 1 Regression on 2/3/2025 6:32:46 PM +00:00 perf-autofiling-issues#49895
[Perf] Windows/x64: 1 Regression on 2/3/2025 6:32:46 PM +00:00 perf-autofiling-issues#49912

dotnet-policy-service · 2025-02-12T17:51:48Z

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

tannergooding · 2025-02-12T18:49:30Z

CC. @Daniel-Svensson given the official benchmarks are showing regressions we may need to look at reverting.

There are also some improvements (linked to from the bottom of #99212), so some further analysis might be desirable.

Is that something you have the capacity to look at?

Daniel-Svensson · 2025-02-19T09:08:37Z

I can try to have a quick look, but it will take a week or two before I have the time and energy.

I had a quick look at the diff and nothing stood out for the compare (add/sub) case.
It might be related to the BigMul change, which now uses MulX; that does not generate code as good as the normal multiply, depending on the distribution of number scaling.
If that is the case only the change in Add/Sub needs to be reverted.

> There are also some improvements (linked to from the bottom of #99212), so some further analysis might be desirable.

With the improvements, do you mean the switch statement or the list of performance improvements in the PR summary under "Remarks for better performance" (add with carry, MultiplyNoFlags and ShiftLeft128)?

I do not think I have the capacity to fix any of the intrinsics at the moment, but if someone does, I will gladly help apply them to the decimal code. (I think add/sub with carry made the C++ compare/add/sub code 4-10% faster with a lot less branches, if I remember correctly.)

tannergooding · 2025-02-19T15:51:29Z

> With the improvements, do you mean the switch statement or the list of performance improvements in the PR summary under "Remarks for better performance" (add with carry, MultiplyNoFlags and ShiftLeft128)?

This list at the very bottom of the PR:

> with a lot less branches if I remember correctly

Notably, less branches does not mean faster code. This is dependent on the work being done and the predictability of the branches. The cost of a predicted branch is effectively 0-1 cycles, while the cost of a mispredicted branch is effectively 20 cycles.

Micro benchmarks tend to perform better with branches due to typically testing the same input over and over. What is best for real world code tends to be workload specific.
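To put rough numbers on that trade-off, here is a minimal sketch that computes the average cost per branch from the cycle figures quoted above. The class name, the method name, and the two misprediction rates are illustrative assumptions, not measurements from this issue.

```csharp
using System;

static class BranchCostSketch
{
    // Cycle costs quoted in the comment above; real values vary by CPU.
    const double PredictedCost = 1.0;      // upper end of the "0-1 cycles" figure
    const double MispredictedCost = 20.0;  // "effectively 20 cycles"

    // Average cycles per branch for a given misprediction rate (0.0 to 1.0).
    public static double ExpectedCycles(double mispredictRate) =>
        (1.0 - mispredictRate) * PredictedCost + mispredictRate * MispredictedCost;

    static void Main()
    {
        // Assumed rate: a micro benchmark replaying the same input is predicted almost perfectly.
        Console.WriteLine(ExpectedCycles(0.01)); // about 1.19 cycles per branch
        // Assumed rate: unpredictable data mispredicts roughly half the time.
        Console.WriteLine(ExpectedCycles(0.50)); // about 10.5 cycles per branch
    }
}
```

Under these assumptions a branch costs roughly nine times more on unpredictable data, which is why a branch-heavy path can win a micro benchmark yet lose on a real workload.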

The actual perf regression here looks to be related to CompareTo, which ends up calling VarDecCmpSub, so this is most likely related to this switch, potentially due to the out parameter or similar:

-                        ulong tmpLow = Math.BigMul((uint)low64, power);
-                        ulong tmp = Math.BigMul((uint)(low64 >> 32), power) + (tmpLow >> 32);
-                        low64 = (uint)tmpLow + (tmp << 32);
-                        tmp >>= 32;
+                        ulong tmp = Math.BigMul(low64, power, out low64);
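For anyone experimenting with a partial revert, the following sketch wraps both sides of the diff in small helpers and checks that they produce the same 96-bit result. The helper names (`MulOld`, `MulNew`, `CountMismatches`) and the test inputs are hypothetical; only the bodies come from the diff. It checks functional equivalence only and says nothing about which form the JIT compiles better.

```csharp
using System;

static class BigMulSketch
{
    // Old form from the diff: two 32x32->64 multiplies combined by hand.
    // Returns the bits above bit 63 and writes the low 64 bits back to low64.
    public static ulong MulOld(ref ulong low64, uint power)
    {
        ulong tmpLow = Math.BigMul((uint)low64, power);
        ulong tmp = Math.BigMul((uint)(low64 >> 32), power) + (tmpLow >> 32);
        low64 = (uint)tmpLow + (tmp << 32);
        return tmp >> 32;
    }

    // New form from the diff: one 64x64->128 multiply with an out parameter.
    public static ulong MulNew(ref ulong low64, uint power)
    {
        return Math.BigMul(low64, power, out low64);
    }

    // Runs both forms over a few hand-picked inputs and counts disagreements.
    public static int CountMismatches()
    {
        ulong[] values = { 0, 1, uint.MaxValue, (ulong)uint.MaxValue + 1, 0x0123456789ABCDEF, ulong.MaxValue };
        uint[] powers = { 1, 10, 1_000_000_000, uint.MaxValue };
        int mismatches = 0;
        foreach (ulong value in values)
        {
            foreach (uint power in powers)
            {
                ulong lowOld = value, lowNew = value;
                ulong highOld = MulOld(ref lowOld, power);
                ulong highNew = MulNew(ref lowNew, power);
                if (highOld != highNew || lowOld != lowNew)
                {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    static void Main()
    {
        Console.WriteLine($"mismatches: {CountMismatches()}");
    }
}
```

Because the two forms agree on the result, any difference in the benchmark would have to come from code generation, for example how the out parameter or the wider multiply is handled, which is what JIT disassembly of the two versions would need to confirm.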

performanceautofiler bot added arch-x64 os-windows PGO runtime-coreclr specific to the CoreCLR runtime untriaged New issue has not been triaged by the area owner labels Feb 11, 2025

performanceautofiler bot mentioned this issue Feb 11, 2025

[SENTINEL] Autofile run complete at 2/11/2025 8:13:45 AM +00:00. 12 issues filed. dotnet/perf-autofiling-issues#49945

Closed

LoopedBard3 removed the PGO label Feb 11, 2025

LoopedBard3 transferred this issue from dotnet/perf-autofiling-issues Feb 11, 2025

dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Feb 11, 2025

LoopedBard3 changed the title ~~[Perf] Windows/x64: 3 Regressions on 2/3/2025 6:32:46 PM +00:00~~ [Perf] Windows/x64: Decimal Regressions on 2/3/2025 6:32:46 PM +00:00 Feb 11, 2025

LoopedBard3 added tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark labels Feb 11, 2025

jeffschwMSFT added the area-System.Numerics label Feb 12, 2025

vcsjones removed the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Feb 18, 2025
