[PERF] .NET6 Performance Regression #5385
Reproduction for akkadotnet#5385
@Aaronontheweb I would suggest getting some more heuristics on the threads, i.e. set up multisampling (reading the number of TP threads should be cheap overall) and figure out the average threads, max threads, and the thread count exceeded 95% of the time. I can hack up a snippet to help if you'd like. If I had to guess, I'd wager it's an issue with the changes to the ThreadPool hill-climbing algorithm in the core runtime, MAYBE some interplay with ThreadLocal if we are creating and destroying threads more frequently (not sure TBH, would need to dig). It may or may not be worth it to try running this with the actors running under the dedicated thread pool (DTP) wherever possible in both environments; perf may be lower in both cases, but that would at least help isolate whether it -is- the .NET ThreadPool itself or something else creeping in.
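For illustration, a minimal sketch of that kind of sampling, assuming .NET Core 3.0+ where `ThreadPool.ThreadCount` is available (the buffer is pre-allocated up front so the monitor itself doesn't allocate while sampling):

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadPoolSampler
{
    // Samples ThreadPool.ThreadCount at a fixed interval into a pre-allocated buffer,
    // then reports the average, maximum, and 95th-percentile thread counts.
    public static async Task<(double Average, int Max, int P95)> SampleAsync(
        TimeSpan duration, TimeSpan interval, CancellationToken token = default)
    {
        // Pre-allocate so the monitor does not trigger GCs of its own.
        var samples = new int[Math.Max(1, (int)(duration.Ticks / interval.Ticks))];
        var count = 0;

        while (count < samples.Length && !token.IsCancellationRequested)
        {
            samples[count++] = ThreadPool.ThreadCount; // .NET Core 3.0+ only
            try { await Task.Delay(interval, token); }
            catch (TaskCanceledException) { break; }
        }

        var sorted = samples.Take(count).OrderBy(x => x).ToArray();
        if (sorted.Length == 0) return (0, 0, 0);

        var p95Index = Math.Min(sorted.Length - 1, (int)(sorted.Length * 0.95));
        return (sorted.Average(), sorted[sorted.Length - 1], sorted[p95Index]);
    }
}
```

Kicking this off on its own task alongside the benchmark would give the average/max/p95 numbers described above.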
That would be very helpful!
That's a good idea. This should be easy to configure. @Zetanova, have you noticed anything like this on .NET 6 with the
Here you go: https://gist.github.com/to11mtm/1c3f5137a207d59d5f3e61bb198aeeae. Note I haven't exactly -tested- this with the way I'm abusing CTS, and it's not -quite- thread-safe (i.e. you'd need to make changes to be able to observe values while it is monitoring). The only thing you might want to tweak is adding a method to pre-allocate the array and minimize the chance of a GC happening:
There are plenty of other opportunities for cleanup/niceness there, but it's a good start if you want to get fancier (i.e. track when the timer fires noticeably off-interval relative to the intended rate, which is an indication of complete system overpressure since we are on a dedicated thread in this loop).
Ah thanks, so do I just launch that in the background along with the benchmark to have it gather samples concurrently in its own thread?
Sorry, I don't have experience with .NET 6 yet. For a sequential workload the thread numbers are too high.

Akka.Remote:
Akka.Actor:

This makes a theoretical max total for your system of 11-13 threads for each node, if more threads/tasks than that theoretical max total are scheduled.

That's why I made https://github.com/Zetanova/Akka.Experimental.ChannelTaskScheduler

The actor system workload has a unique task-scheduling property: the execution of ActorCells can be delayed indefinitely.

To use
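For context, this is roughly how an ActorSystem opts its dispatchers into a channel-based executor via HOCON. The keys below follow the `channel-executor` documented for later Akka.NET releases; treat this as a sketch, since the exact configuration for the experimental package linked above may differ:

```csharp
using System.Threading.Tasks;
using Akka.Actor;
using Akka.Configuration;

public static class ChannelExecutorBootstrap
{
    public static async Task RunAsync()
    {
        // Sketch: route the default and internal dispatchers through the channel
        // executor so actor work is scheduled via System.Threading.Channels instead
        // of competing directly for raw ThreadPool threads.
        var config = ConfigurationFactory.ParseString(@"
            akka.actor.default-dispatcher {
                executor = channel-executor
                channel-executor.priority = normal
            }
            akka.actor.internal-dispatcher {
                executor = channel-executor
                channel-executor.priority = high
            }");

        var system = ActorSystem.Create("Benchmark", config);

        // ... run the workload under test here ...

        await system.Terminate();
    }
}
```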
@Aaronontheweb
I'll give that a shot @Zetanova and report back on here.
Filed a CoreCLR issue here: dotnet/runtime#62967
This is my finding from trying to run
It looks like the spec failure is actually inherent in the HashedWheelTimerScheduler and only appears when it is resource-starved; the new net6.0 thread pool is more resource-hungry than the previous thread pool implementation.
Related PR: #5441
I could fix the test itself by priming it, see:
@Aaronontheweb is there any update on this? We've been holding off upgrading to .NET 6 due to this issue, but with our current runtime coming to EOL we are starting to run out of time a bit.
@KieranBond in all of the large-scale testing we've been doing on https://github.com/petabridge/AkkaDotNet.LargeNetworkTests I haven't seen any issues even in a 200+ node cluster, but we're running using the

I'm running a comparison today against .NET Core App 3.1, now that akkadotnet/Akka.Management#563 is fixed - that's what stopped me from working on this last night on my Twitch stream, per https://twitter.com/Aaronontheweb/status/1522024859571822595
@KieranBond just completed my experiment today, and I am somewhat baffled by the results. Going to repost my experiment notes here. This is all using the

### .NET Core 3.1

We decided to re-run some of our experiments on .NET Core 3.1, to account for differences in the .NET

### Experiment 9: No DistributedPubSub, Akka.Persistence Sharding, 200 Nodes, .NET Core 3.1

The cluster remained stable at 200 nodes:

Processing roughly 5000 msg/s. Total thread count was elevated compared to .NET 6, with roughly 71 reported threads per process rather than the 66 observed in .NET 6:

Also interestingly, CPU utilization and memory usage were both significantly lower on .NET Core 3.1. Memory usage is normally around 80% in .NET 6 and CPU utilization around 14.7%. The memory usage is the most shocking difference here - a difference of about 20GB worth of usage across the cluster.

I'm going to retest with
Might have been premature with my comments here regarding .NET Core 3.1 / .NET 6 - I have a bunch more data that I'm going to publish from running several more iterations of this experiment on the same AKS cluster.
cc @kouvel as an FYI
Some benchmarks this morning for throughput - one running with the normal Akka.NET v1.4 defaults (dedicated thread pool)

All benchmarks were run using https://github.com/akkadotnet/akka.net/tree/dev/src/benchmark/RemotePingPong

### Akka.NET V1.4 Defaults (Dedicated Thread Pool)

#### .NET 6

OSVersion: Microsoft Windows NT 10.0.19044.0
Num clients, Total [msg], Msgs/sec, Total [ms], Start Threads, End Threads

#### .NET Core 3.1

OSVersion: Microsoft Windows NT 6.2.9200.0
Num clients, Total [msg], Msgs/sec, Total [ms], Start Threads, End Threads

In both cases, when we hit the client count = 20,25 iterations the thread count drops quite a bit (I'm assuming due to hill-climbing) - but in the case of .NET Core 3.1 there's really no performance loss, whereas with .NET 6 throughput drops from ~300k msg/s to ~200k msg/s and stays there for a period of time. We've been able to reproduce this regularly. Worth noting: I'm running all of these benchmarks on a Gen 1 Ryzen machine, in case that makes any difference.

### Akka.NET V1.5 Proposed Defaults (System.Threading.Channels over .NET
Thanks for getting back so quickly and investigating some more @Aaronontheweb.
@KieranBond so here's the good news, I think - the perf loss stemming from .NET 6 happens when the .NET
I don't think you should have any major performance issues running .NET 6 in a long-lived application. I ran .NET 6 for hours inside a not-extremely-busy but continuously busy large Akka.NET cluster without seeing any changes in the thread count in either direction. On a 16 vCPU machine, the thread count settled at around 66 once the cluster grew beyond ~40 nodes and stayed that way throughout the entire deployment, which eventually reached 200 nodes.
Another question - these experiments you're performing are obviously testing very high throughput, but do you think the results carry over to less busy systems? i.e. is this performance issue worth worrying about if your system is not maximizing Akka throughput and is only hitting around the 30k msg/s mark?
Think you've just answered my question as I asked it!
@KieranBond glad to help! While I think there is a real issue with the .NET ThreadPool here, I'm going to close this issue because:
Adding some additional benchmarks to Akka.NET to attempt to measure this. The most interesting reading so far is here: #6127 (comment)
New theory regarding this: it might actually be GC changes introduced in .NET 6, rather than the thread pool, that are the root cause of the issue here. Started some experiments with server-mode GC disabled to see if I can reproduce the perf drop - I can, and it's even more noticeable between .NET Core 3.1 and .NET 6. Uses the latest v1.5 bits.

### .NET Core 3.1

OSVersion: Microsoft Windows NT 6.2.9200.0
Num clients, Total [msg], Msgs/sec, Total [ms], Start Threads, End Threads

### .NET 6

OSVersion: Microsoft Windows NT 10.0.19044.0
Num clients, Total [msg], Msgs/sec, Total [ms], Start Threads, End Threads
Changed the benchmark's `Start` method to force a full `GC.Collect()` before and after each run:

```csharp
private static async Task Start(uint timesToRun)
{
    for (var i = 0; i < timesToRun; i++)
    {
        var redCount = 0;
        var bestThroughput = 0L;
        foreach (var throughput in GetClientSettings())
        {
            GC.Collect(); // before we start
            var result1 = await Benchmark(throughput, repeat, bestThroughput, redCount);
            bestThroughput = result1.Item2;
            redCount = result1.Item3;
            GC.Collect(); // after we finish for good measure
        }
    }
    Console.ForegroundColor = ConsoleColor.Gray;
    Console.WriteLine("Done..");
}
```

And I kept server and concurrent GC both disabled across both sets of benchmarks:

```xml
<PropertyGroup>
    <ServerGarbageCollection>false</ServerGarbageCollection>
    <ConcurrentGarbageCollection>false</ConcurrentGarbageCollection>
</PropertyGroup>
```

### .NET Core 3.1

OSVersion: Microsoft Windows NT 6.2.9200.0
Num clients, Total [msg], Msgs/sec, Total [ms], Start Threads, End Threads

### .NET 6

OSVersion: Microsoft Windows NT 10.0.19044.0
Num clients, Total [msg], Msgs/sec, Total [ms], Start Threads, End Threads

I still see a perf drop around clients=20, but it's not nearly as pronounced.
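As a sanity check on runs like these, the effective GC mode can be verified at runtime; a small sketch using the standard `System.Runtime.GCSettings` API:

```csharp
using System;
using System.Runtime;

// Prints which GC configuration the process actually started with, so benchmark runs
// can be confirmed to be comparing workstation vs. server GC as intended.
Console.WriteLine($"Server GC: {GCSettings.IsServerGC}");
Console.WriteLine($"Latency mode: {GCSettings.LatencyMode}");
```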
@Aaronontheweb One of the reasons I wanted to remove the unique 'useless' instance of
@Zetanova oh yeah, you can definitely submit that for v1.5 - the old DI is no longer supported and has already been removed.
Going to reopen this because actual end-users have been reporting issues here over the past month.
Any improvement in .NET 7, now that it's out?
@ismaelhamed I actually have an update on this for .NET 6! I'll publish it after my morning meeting.
So I believe this issue was identified and resolved in May of this year (2022): dotnet/runtime#68881 - the fix was released in .NET runtime 6.0.6.

The original bug in .NET 6 basically caused thread pause /

After working with the CoreCLR team and producing some detailed GC / ThreadPool / CPU sampling metrics in PerfView, that issue was identified as the likely cause of the reproducible .NET 6 performance drop. Once I upgraded my local environment to the latest versions of the .NET 6 SDK & runtime, the performance regression was gone:

### Fix

Upgrade your .NET 6 runtimes to at least 6.0.6 - preferably go all the way to the latest version (6.0.11).
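For anyone verifying the fix locally, the runtime version a process is actually running on can be checked with standard APIs; a minimal sketch, where the 6.0.6 threshold comes from the fix referenced above:

```csharp
using System;
using System.Runtime.InteropServices;

// Reports the exact runtime the process is using, so deployments can confirm they
// are on .NET 6.0.6 or later (where the regression described above is fixed).
Console.WriteLine(RuntimeInformation.FrameworkDescription); // e.g. ".NET 6.0.11"
Console.WriteLine(Environment.Version);                     // e.g. "6.0.11"

var patched = Environment.Version >= new Version(6, 0, 6);
Console.WriteLine($"Runtime includes the 6.0.6 fix: {patched}");
```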
Version Information
Version of Akka.NET? v1.4.28
Which Akka.NET Modules? Akka, Akka.Remote
Describe the performance issue
Our `RemotePingPong` benchmark has been the standard used for roughly 7 years to measure throughput passing over a single Akka.Remote connection between two `ActorSystem` instances. It's a crucial benchmark because it measures the biggest bottleneck in Akka.Remote networks: the end-to-end response time over a single connection.

Over the lifespan of .NET Core since 2017, we've seen steady improvements in the benchmark numbers each time a new version of the .NET runtime is released, usually as a result of improvements to the underlying threading / concurrency / IO primitives introduced into the runtime itself.
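For readers unfamiliar with the benchmark, the pattern it exercises is a simple remote request/response loop; a minimal sketch of that kind of echo actor (not the actual RemotePingPong code) looks like this:

```csharp
using Akka.Actor;

// A trivial echo actor: every message is sent straight back to the sender.
// RemotePingPong hammers this style of request/response over a single
// Akka.Remote connection and measures end-to-end throughput.
public class EchoActor : ReceiveActor
{
    public EchoActor()
    {
        ReceiveAny(message => Sender.Tell(message));
    }
}
```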
With the release of .NET 6, however, we've noticed that, while overall throughput by some measures remains higher than on .NET 5 for those same reasons, there are steady, reproducible, long-lasting drops in total throughput that occur only on .NET 6.
Data and Specs
Here are the RemotePingPong numbers from my local development machine, a Gen 1 8-core Ryzen, on .NET Core 3.1:
Edit: updated the .NET Core 3.1 benchmark numbers to include the settings from #5386
And here are the equivalent numbers for this same benchmark on .NET 6:
I've been able to reproduce this consistently - a sustained drop in throughput that lasts for roughly 30s. We've also noticed this in the Akka.NET test suite since merging #5373 - the number of failures in the test suite has grown and has started to include tests that historically have not been racy. We've also observed this separately in the Phobos repository, which we also upgraded to use the .NET 6 SDK.
There is definitely something amiss here with how Akka.NET runs on top of .NET 6.
Expected behavior
A consistent level of performance across all benchmarks.
Actual behavior
Intermittent lag, declines in throughput, and unexplained novel race conditions.
Environment
.NET 6, Windows
Additional context
There is some speculation from other members of the Akka.NET team that the issue could be related to some of the .NET ThreadPool and thread injection changes made in .NET 6:
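One cheap way to probe that hypothesis is to pin the thread pool's minimum worker count and see whether the throughput dip at clients=20-25 disappears; a sketch using the standard `ThreadPool` APIs, where the 2x-cores value is an arbitrary experiment setting rather than a recommendation:

```csharp
using System;
using System.Threading;

// Raise the minimum worker-thread count so the .NET 6 thread injection / hill-climbing
// logic has less room to shrink the pool during bursty benchmark phases.
ThreadPool.GetMinThreads(out var workerMin, out var ioMin);
ThreadPool.GetMaxThreads(out var workerMax, out var ioMax);
Console.WriteLine($"Before: min workers={workerMin}, min IOCP={ioMin}, max workers={workerMax}");

// Arbitrary experiment value: keep at least 2x logical cores warm.
var desired = Math.Min(workerMax, Environment.ProcessorCount * 2);
if (ThreadPool.SetMinThreads(desired, ioMin))
    Console.WriteLine($"Pinned minimum worker threads to {desired}");
```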