Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Perf] ARM64 regression in System.Diagnostics.Perf_Activity.ActivityAllocations #68624

Closed
performanceautofiler bot opened this issue Apr 21, 2022 · 20 comments
Assignees
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI os-linux Linux OS (any supported distro) runtime-coreclr specific to the CoreCLR runtime
Milestone

Comments

@performanceautofiler
Copy link

Run Information

Architecture arm64
OS ubuntu 18.04
Baseline b58d8013dbc4f3a65e5e2242fe09a43f455a329f
Compare 84b664a9875067a97fe32b654ff32f23f4001a60
Diff Diff

Regressions in System.Diagnostics.Perf_Activity

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio Baseline ETL Compare ETL
ActivityAllocations - Duration of single invocation 758.56 ns 813.42 ns 1.07 0.20 False
ActivityAllocations - Duration of single invocation 623.66 ns 715.80 ns 1.15 0.11 False

graph
graph
Test Report

Repro

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Diagnostics.Perf_Activity*'

Payloads

Baseline
Compare

Histogram

System.Diagnostics.Perf_Activity.ActivityAllocations(idFormat: W3C)


Description of detection logic

IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 813.4234915083607 > 802.4856996133994.
IsChangePoint: Marked as a change because one of 4/13/2022 10:02:29 PM, 4/19/2022 4:53:57 AM falls between 4/10/2022 3:30:35 PM and 4/19/2022 4:53:57 AM.
IsRegressionStdDev: Marked as regression because -6.738201722219294 (T) = (0 -836.9530328425158) / Math.Sqrt((284.7705405371702 / (23)) + (1754.8085293566023 / (18))) is less than -2.0226909200346674 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (23) + (18) - 2, .025) and -0.09216640816227123 = (766.323727398658 - 836.9530328425158) / 766.323727398658 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

```#### System.Diagnostics.Perf_Activity.ActivityAllocations(idFormat: Hierarchical)

```log

Description of detection logic

IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 715.7983124923339 > 655.6405660025653.
IsChangePoint: Marked as a change because one of 4/13/2022 4:27:35 PM, 4/19/2022 4:53:57 AM falls between 4/10/2022 3:30:35 PM and 4/19/2022 4:53:57 AM.
IsRegressionStdDev: Marked as regression because -11.823299540885158 (T) = (0 -708.1518629941236) / Math.Sqrt((243.4064317595605 / (20)) + (499.0656483304033 / (18))) is less than -2.028094000977961 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (20) + (18) - 2, .025) and -0.11789003363534732 = (633.4718457872224 - 708.1518629941236) / 633.4718457872224 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@performanceautofiler performanceautofiler bot added arm64 untriaged New issue has not been triaged by the area owner labels Apr 21, 2022
@AndyAyersMS
Copy link
Member

Possibly from #67920.

@AndyAyersMS AndyAyersMS added Look Again and removed untriaged New issue has not been triaged by the area owner labels Apr 22, 2022
@AndyAyersMS
Copy link
Member

@tarekgh thinks #67920 is not the likely suspect.

Another possibility given that this is arm64 is #66407. SPMI Diffs show there was an impact to System.Diagnostics.Tracing.EventListener:AddEventSource though it was a code size reduction.

The regression has persisted since then, so it does not look like noise.

newplot - 2022-04-27T143223 490

@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Apr 27, 2022
@AndyAyersMS AndyAyersMS transferred this issue from dotnet/perf-autofiling-issues Apr 27, 2022
@AndyAyersMS AndyAyersMS changed the title [Perf] Changes at 4/14/2022 3:36:10 AM [Perf] ARM64 regression in System.Diagnostics.Perf_Activity.ActivityAllocations Apr 27, 2022
@AndyAyersMS AndyAyersMS added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 27, 2022
@ghost
Copy link

ghost commented Apr 27, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Run Information

Architecture arm64
OS ubuntu 18.04
Baseline b58d8013dbc4f3a65e5e2242fe09a43f455a329f
Compare 84b664a9875067a97fe32b654ff32f23f4001a60
Diff Diff

Regressions in System.Diagnostics.Perf_Activity

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio Baseline ETL Compare ETL
ActivityAllocations - Duration of single invocation 758.56 ns 813.42 ns 1.07 0.20 False
ActivityAllocations - Duration of single invocation 623.66 ns 715.80 ns 1.15 0.11 False

graph
graph
Test Report

Repro

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Diagnostics.Perf_Activity*'

Payloads

Baseline
Compare

Histogram

System.Diagnostics.Perf_Activity.ActivityAllocations(idFormat: W3C)


Description of detection logic

IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 813.4234915083607 > 802.4856996133994.
IsChangePoint: Marked as a change because one of 4/13/2022 10:02:29 PM, 4/19/2022 4:53:57 AM falls between 4/10/2022 3:30:35 PM and 4/19/2022 4:53:57 AM.
IsRegressionStdDev: Marked as regression because -6.738201722219294 (T) = (0 -836.9530328425158) / Math.Sqrt((284.7705405371702 / (23)) + (1754.8085293566023 / (18))) is less than -2.0226909200346674 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (23) + (18) - 2, .025) and -0.09216640816227123 = (766.323727398658 - 836.9530328425158) / 766.323727398658 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

```#### System.Diagnostics.Perf_Activity.ActivityAllocations(idFormat: Hierarchical)

```log

Description of detection logic

IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 715.7983124923339 > 655.6405660025653.
IsChangePoint: Marked as a change because one of 4/13/2022 4:27:35 PM, 4/19/2022 4:53:57 AM falls between 4/10/2022 3:30:35 PM and 4/19/2022 4:53:57 AM.
IsRegressionStdDev: Marked as regression because -11.823299540885158 (T) = (0 -708.1518629941236) / Math.Sqrt((243.4064317595605 / (20)) + (499.0656483304033 / (18))) is less than -2.028094000977961 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (20) + (18) - 2, .025) and -0.11789003363534732 = (633.4718457872224 - 708.1518629941236) / 633.4718457872224 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Author: performanceautofiler[bot]
Assignees: -
Labels:

area-CodeGen-coreclr, untriaged, refs/heads/main, ubuntu 18.04, RunKind=micro, Regression, CoreClr, Look Again, arm64

Milestone: -

@AndyAyersMS
Copy link
Member

cc @TIHan

@TIHan TIHan mentioned this issue May 5, 2022
2 tasks
@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label May 5, 2022
@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone May 5, 2022
@TIHan
Copy link
Contributor

TIHan commented Jun 6, 2022

@kunalspathak and I looked at this a bit ago. We believe that the regression wasn't caused by the mod operation.

@TIHan TIHan removed their assignment Jun 6, 2022
@kunalspathak
Copy link
Member

So I think there are 2 separate regressions going on...

For https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu%2018.04/System.Diagnostics.Perf_Activity.ActivityAllocations(idFormat%3a%20W3C).html, the regression is kind of stay and the commit range is 5c57f2c...7bac4e8.

For https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu%2018.04/System.Diagnostics.Perf_Activity.ActivityAllocations(idFormat%3a%20Hierarchical).html

image

image

The commit range where this benchmark is affected is 3635e0f...f7d9d6d.

Just to be sure, you might want to see if it repros and if yes, we can do asmdiff to see what is causing it.

@kunalspathak
Copy link
Member

@tarekgh thinks #67920 is not the likely suspect.

@tarekgh - @TIHan spent time and did investigation if #66407 is the cause, and he was not able to repro the regression. In fact, he see some improvements.

I see that #67920 added a layer of Enumerator() and not sure if that is causing it, but the regression is consistent on windows/ubuntu. Did you verify by locally running the benchmarks if it was not with the Activity related change?

Windows: https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_arm64_Windows%2010.0.19041%2fSystem.Diagnostics.Perf_Activity.ActivityAllocations(idFormat%3a%20W3C).html

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_arm64_Windows%2010.0.19041%2fSystem.Diagnostics.Perf_Activity.ActivityAllocations(idFormat%3a%20Hierarchical).html

There was also #67938 (which I doubt might be related).

@tarekgh
Copy link
Member

tarekgh commented Jun 8, 2022

I see that #67920 added a layer of Enumerator() and not sure if that is causing it, but the regression is consistent on windows/ubuntu. Did you verify by locally running the benchmarks if it was not with the Activity related change?

#67920 is a code is not exercised at all in the perf test scenario, so I am positive this shouldn't be related to the regression we are seeing.

There was also #67938 (which I doubt might be related).

This could be related; the only part that can affect this test is the change:

image

CurrentChanged usually is null, so the overhead added there is only the assignment and the null check for CurrentChanged. I am not sure if this is enough change can cause such regression but I agree it is related. I'll try to measure that.

@kunalspathak
Copy link
Member

Thanks! I will assign it to you then.

@TIHan
Copy link
Contributor

TIHan commented Jun 8, 2022

@tarekgh I think we found the cause, basically this case of x % 2 uses the csneg operation where before it had a more optimal form than that; this is just specifically when the divisor is a constant 2.

@tarekgh
Copy link
Member

tarekgh commented Jun 8, 2022

I'll assign the issue back to @TIHan then. Thanks for finding the root cause.

@tarekgh tarekgh assigned TIHan and unassigned tarekgh Jun 8, 2022
@TIHan
Copy link
Contributor

TIHan commented Jun 9, 2022

Will resolve this regression in #68885

@TIHan TIHan mentioned this issue Jun 10, 2022
1 task
@TIHan
Copy link
Contributor

TIHan commented Jun 16, 2022

@tarekgh Apologies for circling around. I spent some time and did a much deeper investigation into this. It turns out it is not caused by the x % 2 changes we've made. In fact, any mod changes wouldn't impact this benchmark because the expression, x % 2 == 0, does not benefit from the recent mod optimizations.

I'm assigning back to you.

@TIHan TIHan assigned tarekgh and unassigned TIHan Jun 16, 2022
@tarekgh
Copy link
Member

tarekgh commented Jun 17, 2022

I have done some measurements locally on my machine to see the cost of the change #67938, I got the following:

With #67938 change

[2022/06/16 18:19:17][INFO] |              Method |     idFormat |     Mean |   Error |  StdDev |   Median |      Min |      Max |  Gen 0 | Allocated |
[2022/06/16 18:19:17][INFO] |-------------------- |------------- |---------:|--------:|--------:|---------:|---------:|---------:|-------:|----------:|
[2022/06/16 18:19:17][INFO] | ActivityAllocations | Hierarchical | 157.8 ns | 6.46 ns | 7.44 ns | 156.6 ns | 147.6 ns | 175.6 ns | 0.0420 |     352 B |
[2022/06/16 18:19:17][INFO] | ActivityAllocations |          W3C | 229.9 ns | 8.18 ns | 9.42 ns | 234.5 ns | 214.1 ns | 239.6 ns | 0.0489 |     416 B |

Without #67938 change

[2022/06/16 18:22:44][INFO] |              Method |     idFormat |     Mean |   Error |   StdDev |   Median |      Min |      Max |  Gen 0 | Allocated |
[2022/06/16 18:22:44][INFO] |-------------------- |------------- |---------:|--------:|---------:|---------:|---------:|---------:|-------:|----------:|
[2022/06/16 18:22:44][INFO] | ActivityAllocations | Hierarchical | 146.0 ns | 4.41 ns |  4.90 ns | 147.7 ns | 136.5 ns | 154.4 ns | 0.0419 |     352 B |
[2022/06/16 18:22:44][INFO] | ActivityAllocations |          W3C | 224.3 ns | 9.19 ns | 10.58 ns | 221.9 ns | 211.3 ns | 242.0 ns | 0.0492 |     416 B |

I am seeing the Hierarchical scenario mentioned in the issue regressed by 15% while in my measurement is shows around 8%. Also, W3C scenario in this issue showed it is 7% regression while my measurements show it is around 2%.

By that, my change could be contributing here but I don't think it is the only culprit in this regression. Also, considering we need the features we have added here, it is acceptable to have this regression especially the test is not really testing the real scenario but only the part that initializes the empty Activity object. So, this regression wouldn't be noticeable in the real scenario of using Activity class.

I'll leave it to anyone else who wants to dig more to see if there is anything contributing to this, or if we are ok to resolve this issue.

@tarekgh tarekgh removed their assignment Jun 17, 2022
@tarekgh
Copy link
Member

tarekgh commented Jun 17, 2022

One last thing I just notices, I am seeing this issue marked with Ubuntu 18.04. Although I did the measurement on my machine which has Windows OS but if my change is the cause, it should show on all OS's and all architectures as the change is not OS or architecture dependent.

@kunalspathak
Copy link
Member

this issue marked with Ubuntu 18.04.

And this is on arm64, did you check on windows/arm64?

Also, considering we need the features we have added here, it is acceptable to have this regression

Sounds good, and I am sure you verified that we didn't introduce any regression in normal path.

especially the test is not really testing the real scenario but only the part that initializes the empty Activity object.

Yes, we have seen that in past in other benchmarks and fix it by adding relevant test in dotnet/performance. e.g. dotnet/performance#2479. If you don't mind fixing the benchmark, that will be great.

@tarekgh
Copy link
Member

tarekgh commented Jun 17, 2022

And this is on arm64, did you check on windows/arm64?

No, I didn't try arm. But I am wondering now, wouldn't this be more related to JIT then? I cannot think of anything in our changes that can cause such regression in specific architecture.

Yes, we have seen that in past in other benchmarks and fix it by adding relevant test in dotnet/performance. e.g. dotnet/performance#2479. If you don't mind fixing the benchmark, that will be great.

I would keep the test as it is for now as it looks sensitive to the JIT changes. So, I am seeing the value to keep it. We can consider adding more tests later for real usage scenarios with Activity.

@JulieLeeMSFT
Copy link
Member

@TIHan PTAL.

@TIHan
Copy link
Contributor

TIHan commented Jul 5, 2022

@JulieLeeMSFT This regression is occurring in x64 as well, so it's not related to changes in the JIT for ARM64. The regression was also said to be acceptable:

Also, considering we need the features we have added here, it is acceptable to have this regression

Closing. Anyone is free to re-open this if they deem it critical.

@TIHan TIHan closed this as completed Jul 5, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Aug 5, 2022
@jeffhandley jeffhandley added arch-arm64 runtime-coreclr specific to the CoreCLR runtime os-linux Linux OS (any supported distro) and removed arm64 labels Dec 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI os-linux Linux OS (any supported distro) runtime-coreclr specific to the CoreCLR runtime
Projects
None yet
Development

No branches or pull requests

6 participants