Exception throw/catch impactfully slower when debugger attached #47617
Tagging subscribers to this area: @tommcdon

Repro:

```csharp
using System;
using System.Diagnostics;

while (true)
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < 100; i++)
    {
        try { throw new Exception(); } catch { }
    }
    Console.WriteLine(sw.Elapsed);
}
```

When I run that with Ctrl-F5, I get output like:

When I run that with F5, I get output like:

That's a 4500x slowdown, with every exception throw/catch consuming ~20ms. While ideally well-behaved apps wouldn't throw lots of exceptions, this has proven to be a significant cause of slowdown for real apps under development, with a noticeable impact on developer inner-loop performance, especially if the exceptions occur at app startup.

cc: @noahfalk, @mikem8361, @gregg-miskelly
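A finite variant of the repro can make the per-exception cost, and how it scales with the number of exceptions, easier to compare between Ctrl-F5 and F5 runs. This is a minimal sketch, not part of the original report: `MeasureMsPerThrow` is a hypothetical helper, and the warm-up call is an assumption meant to keep JIT and first-throw costs out of the measurement.

```csharp
using System;
using System.Diagnostics;

// Print which mode we're in so Ctrl-F5 and F5 output is easy to tell apart.
Console.WriteLine($"Debugger attached: {Debugger.IsAttached}");

// Warm-up (assumption): keep JIT and first-throw costs out of the numbers below.
MeasureMsPerThrow(10);

// If the cost is per exception, the average should stay roughly flat as the count grows.
double msPerThrow100 = MeasureMsPerThrow(100);
double msPerThrow1000 = MeasureMsPerThrow(1000);
Console.WriteLine($"100 throws:  {msPerThrow100:F4} ms per throw/catch");
Console.WriteLine($"1000 throws: {msPerThrow1000:F4} ms per throw/catch");

// Hypothetical helper: throws and catches `count` exceptions and returns the
// average elapsed time per throw/catch, in milliseconds.
static double MeasureMsPerThrow(int count)
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < count; i++)
    {
        try { throw new Exception(); } catch { }
    }
    return sw.Elapsed.TotalMilliseconds / count;
}
```

At the ~20ms per exception reported here, the 1000-throw measurement alone would take on the order of 20 seconds under F5.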
Work has been done in the past to reduce the overhead of exceptions while running under the debugger, for example: https://github.com/dotnet/diagnostics/blob/0d78fe5fa58c88d4524f7698f9a1abbeba31f33a/src/inc/cordebug.idl#L3291. Just out of curiosity, is the impact measurably smaller if the exception is thrown and caught in non-user code with Just My Code enabled?
Visual Studio logs the exception to the output window.
I can say yes.
On my machine, with JMC on:
Two thoughts:
Sure... doing what? The picture doesn't change all that much if, for example, I have a bunch of thread pool threads just looping over sleeps:

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// One background thread per core, each repeatedly sleeping for 1 ms.
for (int i = 0; i < Environment.ProcessorCount; i++)
{
    Task.Run(() =>
    {
        while (true) Thread.Sleep(1);
    });
}

// Same measurement loop as the original repro.
while (true)
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < 100; i++)
    {
        try { throw new Exception(); } catch { }
    }
    Console.WriteLine(sw.Elapsed);
}
```

but if I make those threads sit in tight loops:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// One background thread per core, each spinning in a tight loop.
for (int i = 0; i < Environment.ProcessorCount; i++)
{
    Task.Run(() =>
    {
        while (true) { }
    });
}

// Same measurement loop as the original repro.
while (true)
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < 100; i++)
    {
        try { throw new Exception(); } catch { }
    }
    Console.WriteLine(sw.Elapsed);
}
```

the gap actually grows, going from ~0.0007s to ~7.3s, so roughly a 10,000x slowdown.
Part of the problem I've seen in some apps is that the number of exceptions is significantly amplified by deeper "call stacks", and more specifically when async is involved, because the exception ends up getting caught and rethrown at every level of the chain. For example, if I change the app to be:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

while (true)
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < 100; i++)
    {
        try { await Run(10); } catch { }
    }
    Console.WriteLine(sw.Elapsed);
}

// Each level awaits the next, so the exception thrown at depth 0 is
// caught and rethrown by every async method up the chain.
static async Task Run(int depth)
{
    if (depth > 0)
    {
        await Run(depth - 1);
    }
    throw new Exception();
}
```

now Ctrl-F5 shows numbers like ~0.01s and F5 shows numbers like ~28.7s. But even with deeper non-async call stacks, the gap is still huge. I changed it to this:

```csharp
using System;
using System.Diagnostics;

while (true)
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < 100; i++)
    {
        try { Run(10); } catch { }
    }
    Console.WriteLine(sw.Elapsed);
}

// A single exception thrown at depth 0 unwinds through every Run frame.
static void Run(int depth)
{
    if (depth > 0)
    {
        Run(depth - 1);
    }
    throw new Exception();
}
```

and now Ctrl-F5 shows numbers like ~0.001s and F5 shows numbers like ~2.05s, so down to a ~2000x difference. The gap does drop further the deeper the frames go, though: at 100 frames, the difference drops to ~400x.
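The amplification in the async case can be observed without a debugger by counting first-chance exceptions in-process through `AppDomain.CurrentDomain.FirstChanceException`, which is raised for every throw and rethrow in managed code. This sketch reuses the two `Run` shapes from the examples above (renamed `RunSync` and `RunAsync` here, hypothetically, so both fit in one file) and measures a single call of each at depth 10. We'd expect exactly one first-chance exception for the synchronous chain and roughly one per level for the async chain; reading that as the reason the async repro is so much slower under F5 is an inference, not something measured in this thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

const int Depth = 10;

// Count every first-chance exception in the process: one per throw or rethrow.
int firstChance = 0;
AppDomain.CurrentDomain.FirstChanceException += (sender, e) => Interlocked.Increment(ref firstChance);

// Synchronous chain: a single throw unwinds through all the frames.
int before = firstChance;
try { RunSync(Depth); } catch { }
int syncThrows = firstChance - before;

// Async chain: each awaiting level rethrows the faulted task's exception.
before = firstChance;
try { await RunAsync(Depth); } catch { }
int asyncThrows = firstChance - before;

Console.WriteLine($"sync,  depth {Depth}: {syncThrows} first-chance exception(s)");
Console.WriteLine($"async, depth {Depth}: {asyncThrows} first-chance exception(s)");

static void RunSync(int depth)
{
    if (depth > 0)
    {
        RunSync(depth - 1);
    }
    throw new Exception();
}

static async Task RunAsync(int depth)
{
    if (depth > 0)
    {
        await RunAsync(depth - 1);
    }
    throw new Exception();
}
```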
RE: Background threads - from GC investigations and/or telemetry do you have a sense for what suspend times for 'typical' apps look like? If so, I might try to mimic that time in whatever app we will use as we start to dig into the numbers.
To make sure I understand: you are saying that it takes significantly more time as call stack size increases (~2.05s in your last example vs. ~1.9s in your first example), but the multiplier actually falls (~2000x vs. ~4500x) since the Ctrl-F5 time goes up as well. Correct?
Yup
I don't have a repro that has as impressive a penalty as 4000x, but I do have a real-world case. It's a champion scenario for the Azure EventHubs Processor SDK that has multi-threading and network calls. In total, the startup process throws ~180 exceptions (which are in fact 32 distinct exceptions, multiplied by layers of async state machines rethrowing), and with JMC off the startup time is:
Without debugger: ~7.5 sec
What was surprising to me is how large the impact of debugging is compared to actual networking. The app does 112 blob requests during startup, plus various TCP connections to the EventHubs service, so throwing an exception in an async method costs as much as making multiple real service calls over the network. I'm happy to provide the app with all connection strings ready, or to record a profile.
I spent a little time with a profiler to assess where we are spending our time. I profiled Steve's example that uses depth-10 call stacks without async:
On my particular machine (Windows, .NET Core 3.1.11, VS 16.8.4), Ctrl-F5 takes ~1.5ms per iteration of the outer while loop and F5 takes ~1.5s, so about a 1000x overhead for me. During the ~15ms spent handling each exception, there are 3 events dispatched from the runtime to the debugger:
This shows the breakdown of the ~15ms window for one exception. Each event logically traverses 3 components:
My take is there are two different approaches we could follow at this point:
Personally, my preference is that if we are going to spend time on this, we should look into option (2), the major changes. I worry that we don't have much runway left to optimize incrementally, given how easy it is for apps to start throwing yet more exceptions. I also expect that building in-proc extensibility points opens the door to a bunch of other useful diagnostic scenarios. For example, in the debugger space it could contribute towards other high-performance events like conditional breakpoints, or towards more sophisticated analysis of rethrown exceptions for async. Outside debugging, another example is a dump collector that can capture dumps on first-chance exceptions. Priority-wise, the runtime diagnostics team already has months of committed feature work for .NET 6 and a hearty backlog of bugs. I'd guess this would land in the grey area where we might be able to get it done within the release timeframe, but it is far enough down the list that we shouldn't be making any promises at this point. Thoughts on all of this? In particular, @gregg-miskelly, I'd be curious to hear what you think about the feasibility of, and interest in, doing a meatier change here.
I would agree with you that there is appeal in approach 2. On the other hand, that makes the change way, way more expensive. So I think it is worth at least thinking about how much we could improve without going that far. Here are the inefficiencies that I see in the current system:
If there's no reason I'm missing that we need these things: if we took out the stack walks entirely and reduced the number of stopping events from three to two, can you guess from the trace roughly what 'x' we would be at? Is the main thing left the synchronization needed to send stopping events?
Just to confirm: you are saying it would be OK if we either defined a new exception callback that didn't provide the ICorDebugFrame argument, or kept the current callback and returned null for that parameter? If you use that parameter for anything, we probably still have to do a stackwalk in DBI.
It is quite a bit more than I was anticipating we'd be able to cut. I'm a little surprised we hadn't already cut those in the past : ) Rough guess, maybe a 70% savings if this all held up? If we still need the stackwalk but can drop the first-chance notification, that might be a 50% savings. One nice part is that an API that disables debugger exception events is part of what we'd need to build to make the in-proc filtering case useful anyway.
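As a rough check on what those percentages would mean, the profile numbers from this thread (about 1.5ms per 100 exceptions under Ctrl-F5, about 1.5s under F5) can be turned into projected slowdown factors. This is a back-of-the-envelope sketch that assumes the savings apply only to the debugger-attributable part of each exception's cost; the 70% and 50% figures are the rough guesses above, not measurements, and the constant names are illustrative.

```csharp
using System;

// Per-exception costs derived from the profile: ~1.5 ms per 100 exceptions
// under Ctrl-F5 and ~1.5 s per 100 exceptions under F5.
const double BaseMs = 1.5 / 100;     // ~0.015 ms per exception without the debugger
const double DebugMs = 1500.0 / 100; // ~15 ms per exception with the debugger attached

// Assumption: the savings apply only to the debugger-attributable part of the cost.
double debuggerOverheadMs = DebugMs - BaseMs;
Console.WriteLine($"Today: ~{DebugMs / BaseMs:F0}x");

// The 70% and 50% figures are the rough guesses from the comment, not measurements.
double[] savingsGuesses = { 0.70, 0.50 };
double[] projectedFactors = new double[savingsGuesses.Length];
for (int i = 0; i < savingsGuesses.Length; i++)
{
    double projectedMs = BaseMs + debuggerOverheadMs * (1 - savingsGuesses[i]);
    projectedFactors[i] = projectedMs / BaseMs;
    Console.WriteLine($"{savingsGuesses[i]:P0} savings: ~{projectedMs:F2} ms per exception, ~{projectedFactors[i]:F0}x");
}
```

With these inputs the projection lands at roughly 300x for the 70% guess and roughly 500x for the 50% guess, compared with ~1000x today.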
I'll take it 😄
First, let me try to be precise about what the VS debugger would need:
Assuming that that basic level of decoding can be done without a stack walk, adding a new callback interface is fine with me.
Thanks Gregg! It's possible to get the frame information you need without a stackwalk, but it would take a good bit more refactoring. Effectively, we'd be building a new set of parallel exception event handlers that capture each piece of information as the runtime does exception dispatch and then route it all the way up to a new public ICorDebug API. I'm going to propose that if we are going for cheap and simple, we should just add an API that lets you turn off the first-chance exception handler, and then assess whether it is worth trying to optimize away the stackwalks as round 2.
Just to be clear, we need two items for 'round 1':
I have opened VS Task#1274770 to track the VS/vsdbg side of consuming this.
Are you saying that despite the presence of the NonUserCode attribute in the assembly, VS would not have previously called ICorDebugFunction2::SetJmcStatus to mark it as non-user code? If VS had previously marked it as non-user, then the runtime should never generate a USER_FIRST_CHANCE event in that frame. And thanks for mentioning it again; somehow I missed the multiple USER_FIRST_CHANCE requirement and felt silly when I re-read what you wrote above : )
Correct. VS doesn't attempt to scan through a module during module load and preemptively call SetJmcStatus. We do it as we encounter those methods.
Reconsider for .NET 9 per #92856
And this does scale with exception count: if I increase the repro to 1000 exceptions, it takes ~0.004s without the debugger and ~18s with it.