-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Question] GC pause time is too large for 60 fps #65850
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
Tagging subscribers to this area: @dotnet/gc Issue DetailsWe are investigating GC pause time in order to potentially achieve 60 fps, but we observe long GC EE suspend pauses. For example, on next benchmark max GC pause on arm64(rpi4) can get as high as 55 ms with default GC setup (i.e. default latency mode, default max gen0/gen1 size). This is a huge pause, because 1 frame in 60 fps mode should fit in ~16 ms. Some details:
Additionally, we've found the next measurements on coreclr/mono and results are pretty much the same: https://gist.github.com/jechter/2730225240163a806fcc15c44c5ac2d6. Are there any plans to support GC mode with limited pause time? using System;
using System.Threading;
using System.Collections.Generic;
using System.Diagnostics.Tracing;
namespace gctest2
{
class Test
{
static void Main()
{
GCEventListener.Start();
Test t = new Test();
t.mGCTest();
}
struct Log
{
public String log;
}
private static char[] base_string = "GC TEST TEST TEST TEST TEST TEST TEST TEST TEST TEST TEST TEST COUNT".ToCharArray();
public void mGCTest()
{
int count = 0;
int max = 102400;
List<Log> list = new List<Log>();
for (int index1 = 0; index1 < 2000; ++index1)
{
for (int index2 = 0; index2 < 1000; ++index2)
{
String sd = $"GC TEST TEST TEST TEST TEST TEST TEST TEST TEST TEST TEST TEST COUNT {count} / {max}";
Log log = new Log();
log.log = sd;
list.Add(log);
if (list.Count == max)
{
while (list.Count > 0)
{
list.RemoveAt(0);
}
}
count++;
}
}
}
}
public sealed class GCEventListener : EventListener
{
private static GCEventListener instance = null;
private long timeGCStart = 0;
private bool verbose = false;
private GCEventListener()
{
Console.WriteLine("GCEventListener Created");
}
public static void Start(bool verbose = false)
{
if(instance == null)
instance = new GCEventListener();
}
// Called whenever an EventSource is created.
protected override void OnEventSourceCreated(EventSource eventSource)
{
// Watch for the .NET runtime EventSource and enable all of its events.
if (eventSource.Name.Equals("Microsoft-Windows-DotNETRuntime"))
{
EnableEvents(eventSource, EventLevel.Informational, (EventKeywords)0x1);
}
}
private void PrintEventData(EventWrittenEventArgs eventData)
{
Console.WriteLine($"ThreadID = {eventData.OSThreadId} ID = {eventData.EventId} Name = {eventData.EventName} Tick = {eventData.TimeStamp.Ticks/10.0/1000.0}");
for (var i = 0; i < eventData.Payload.Count; i++)
{
string payloadString = eventData.Payload[i] != null ? eventData.Payload[i].ToString() : string.Empty;
Console.WriteLine($"\tName = \"{eventData.PayloadNames[i]}\" Value = \"{payloadString}\"");
}
Console.WriteLine("\n");
}
// Called whenever an event is written.
protected override void OnEventWritten(EventWrittenEventArgs eventData)
{
PrintEventData(eventData);
if (eventData.EventName.Contains("Concurrent"))
{
PrintEventData(eventData);
}
else if (eventData.EventName.Contains("GCStart"))
{
timeGCStart = eventData.TimeStamp.Ticks;
//PrintEventData(eventData);
}
else if (eventData.EventName.Contains("GCEnd"))
{
long timeGCEnd = eventData.TimeStamp.Ticks;
long gcIndex = long.Parse(eventData.Payload[0].ToString());
Console.WriteLine("GC#{0} took {1:f3}ms for generation {2}", gcIndex, (double) (timeGCEnd - timeGCStart)/10.0/1000.0, eventData.Payload[1]);
//PrintEventData(eventData);
}
}
}
}
|
The current CoreCLR GC does not have guaranteed upper bound on the pause times. The pause times can the significant if the application creates patterns unfriendly to generational GC like long linked list or a lot of Gen2 -> Gen0 references. Have you analyzed where the time is spent in the GC for this microbenchmark? Looking at the microbenchmark code, I think large number of Gen2 -> Gen0 references may be contributing be the significant pause times.
I am not aware of plans to build a GC mode with max pause time that is guaranteed even when the application uses patterns that are unfriendly to GC. It would likely need to be a very different GC from what we have today. I think we would be happy to accept a GC implementation like that into dotnet/runtime if somebody builds it. |
weird, I'd expect smaller gen0 to lead to more frequent GCs but smaller pauses overall (disclaimer: not a gc expert) |
That's true only for programs that behave according to the simplifying assumptions that the generational GC designs are based on. The example above violates these simplifying assumptions. It ends up with a large Gen2 array that has to be scanned by Gen0 GC every time since it may contain Gen0 back references. Here is a simpler example that amplifies the problem. It has Gen0 GC pause times around 0.3 seconds on my desktop machine. Changing the Gen0 budged does not have affect on the GC pause times.
BTW: It is not unusual to see these kind of anti-patterns in "high-performance" code that tries to cache and reuse data structures to "save" the GC work. |
before we provide soft real time guarantee (which is a lot of work as you can imagine), I was actually thinking of provide a mode for our GC with no generations just for workloads with no generational behavior at all. so every GC is a full GC. I hacked up a version to do this tonight 😄 with @jkotas's code I get this the pause time in single gen is actually very tiny, but there's one BGC that has that max pause which made the avg pause 8.1ms instead of < 1ms - what I have right now is very hacky. but if anyone is interested, I can see if I could get some time in the not too distant future make it reasonable quality to become an experimental feature (ie, can be turned on with a config). |
GC pause time should be related to the CPU power. On PC processor, getting 600fps with SubstainedLowLatency is achievable. |
Thanks for all the answers! Benchmark that I was testing is a sample GC stress test benchmark and I agree that it may not represent optimized behavior in terms of GC, but we wanted to see how GC behaves in this mode too.
I've only tested different values of
This is true that average GC pause reduces significantly (to ~1-2ms with
@Maoni0 This looks very interesting, can you share more details on how to make this hack myself or maybe even patch, so that I can experiment with it? |
@gbalykov I need to do a bit more work on it - if I can get some time to work on this next week I'll give you a commit to try. |
@gbalykov also for the occasional long pause you are seeing right now, I would suggest 2 things - does it show that the Promote MB is also high for those GCS? if so could you please try the generation aware analysis that we introduced in .NET 5? it would help us figure out what's causing the more promoted bytes. if the Promoted is the same yet the GC time is longer, you could also capture a trace with CPU samples with perfview if you are on windows. I have a commandline in mem-doc (search for |
On the benchmark that I've shared above |
Also going to link #48937 here, where we've experienced Gen1 collection times greater than the 16ms window of 60fps, which is a bare minimum for us. In my testing there, we didn't see what I would consider a "large amount" of promotion: https://user-images.githubusercontent.com/1329837/118099109-c0f93a00-b40f-11eb-80ae-db3ccf4a39a0.png as what's shown in this issue. I've forgotten how many times I've read through @Maoni0 's GC doc all to come to the conclusion that we're not doing anything out of the ordinary.
I would also be very interested and am willing to test this even off the |
+1 The lack of a low-latency GC option makes C#/F# sub-optimal for low-latency projects, eg high-frequency-trading. |
@Maoni0 did you have a chance to work on GC with no generations? |
I did, but only a little bit :( I got it to kind of run but clearly there were bugs. I'm hoping to get more time to work on this now that we finished .net 7. this seems like a good project for the holidays... |
+1 I also need a consistent 20ms processing and was doing a research on if it is achievable with c# or not. |
@Maoni0 hi, did you have a chance to continue your work on GC with no generations? |
Adding another voice, I am currently developing a latency sensitive application and I'm also interested in the above discussion of a generationless GC as a potential option. |
Not experienced with the inner workings of GCs and can't talk to the complexities to such an approach, but I would like a way to have more control over the GC without requiring hard guarantees. I am building a UI platform (not a game engine where a dropped frame is the end of the world) and would like to have ways to potentially help avoid large GC pauses if I can instead perform smaller ones more often. For example, a call to I am aware that this is a difficult problem to solve and, in many ways, goes against the point of the GC to try make it do things it doesn't want to do "optimally" from the GCs perspective. However, I think providing more primitives, even if it's easy to misuse them, that allow a developer in a high-performance scenario to dictate information to the garbage collector in hopes of avoiding a large GC pause would be appreciated. |
Hi, @Maoni0, did you have a chance to continue your work on generationless GC? |
We are investigating GC pause time in order to potentially achieve 60 fps, but we observe long GC EE suspend pauses.
For example, on next benchmark max GC pause on arm64(rpi4) can get as high as 55 ms with default GC setup (i.e. default latency mode, default max gen0/gen1 size). This is a huge pause, because 1 frame in 60 fps mode should fit in ~16 ms.
Some details:
GCRestartEEEnd
andGCSuspendEEBegin
eventsSustainedLowLatency
mode doesn't improve max GC pause timeLowLatency
mode reduces max GC pause time, but max RSS increases signigicantly (on benchmark above ~10x increase in max RSS)COMPlus_GCGen0MaxBudget
andCOMPlus_GCGen1MaxBudget
do not improve max GC pause timeAdditionally, we've found the next measurements on coreclr/mono and results are pretty much the same: https://gist.github.com/jechter/2730225240163a806fcc15c44c5ac2d6.
Are there any plans to support GC mode with limited pause time?
cc @alpencolt @jkotas
The text was updated successfully, but these errors were encountered: