Reduce allocations when async methods yield #13105

stephentoub · 2017-07-28T23:13:27Z

The first time a Task-based async method yields, today there are four allocations:

The Task returned from the method
The state machine object boxed to the heap
An Action delegate that'll be passed to awaiters
A MoveNextRunner that stores the state machine and the ExecutionContext, and has the method that the Action actually references

For a simple async method, e.g.

static async Task DoWorkAsync()
{
    await Task.Yield();
}

when it yields the first time, we allocate four objects equaling 232 bytes (64-bit).
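One rough way to observe this cost is to compare the calling thread's allocation counter before and after the call that reaches the first yield. The sketch below assumes a runtime that exposes `GC.GetAllocatedBytesForCurrentThread`; the `AllocationProbe` and `MeasureFirstYield` names are made up for illustration. The figure it reports also includes whatever the awaiter allocates to schedule the continuation, so it will not necessarily match the byte counts quoted here.

```C#
using System;
using System.Threading.Tasks;

public static class AllocationProbe
{
    static async Task DoWorkAsync()
    {
        await Task.Yield();
    }

    // Returns the bytes allocated on the calling thread while DoWorkAsync
    // runs up to its first yield point (hypothetical helper).
    public static long MeasureFirstYield()
    {
        // Warm-up so JIT compilation and one-time initialization are not counted.
        DoWorkAsync().GetAwaiter().GetResult();

        long before = GC.GetAllocatedBytesForCurrentThread();
        Task t = DoWorkAsync(); // control returns here as soon as the method yields
        long after = GC.GetAllocatedBytesForCurrentThread();

        t.GetAwaiter().GetResult(); // let the continuation finish
        return after - before;
    }
}
```

Running `AllocationProbe.MeasureFirstYield()` on builds with and without the change gives a before/after comparison for this one method.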

This PR changes the scheme to use fewer allocations and less memory. With the new version, there are only two allocations:

A type derived from Task
An Action delegate that'll be passed to awaiters

This doesn't obviate the need for the state machine, but rather than boxing the object normally, we simply store the state machine onto the Task-derived type, which itself implements IAsyncStateMachine. Further, the captured ExecutionContext is stored onto that same object, rather than requiring a separate MoveNextRunner to be allocated, and the delegate can point to that Task-derived type. With this new scheme and that same example from earlier, rather than costing 4 allocations and 232 bytes, it costs 2 allocations and 176 bytes, so 50% fewer allocations and roughly 24% less allocated memory.
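The shape of that idea can be sketched outside the runtime. This is only an illustration, not the code in this PR: `StateMachineBox<TStateMachine>` and `CountingStateMachine` are hypothetical names, and the box here is a plain class rather than a Task-derived type, since Task's parameterless constructor is not public. What it shows is one object holding the strongly typed state machine, the captured ExecutionContext, and a single cached Action.

```C#
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical stand-in for the Task-derived type described above.
public sealed class StateMachineBox<TStateMachine> where TStateMachine : IAsyncStateMachine
{
    private static readonly ContextCallback s_callback =
        s => ((StateMachineBox<TStateMachine>)s).StateMachine.MoveNext();

    public TStateMachine StateMachine; // stored strongly typed, so no separate boxed copy
    public ExecutionContext Context;   // captured context lives on the same object
    private Action _moveNextAction;    // allocated once, then reused for every await

    public Action MoveNextAction
    {
        get { return _moveNextAction ?? (_moveNextAction = MoveNext); }
    }

    public void MoveNext()
    {
        if (Context == null)
        {
            StateMachine.MoveNext();
        }
        else
        {
            // Assumes a runtime where a captured context may be passed to Run more than once.
            ExecutionContext.Run(Context, s_callback, this);
        }
    }
}

// Trivial state machine used only to exercise the box.
public struct CountingStateMachine : IAsyncStateMachine
{
    public int Steps;
    public void MoveNext() { Steps++; }
    public void SetStateMachine(IAsyncStateMachine stateMachine) { }
}
```

Because the delegate targets the box itself, changing `Context` between awaits does not require a new delegate or a new runner object.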

It also helps further in another common case. Previously the Task and state machine object would only be allocated once, but the Action and MoveNextRunner would be allocated and then could only be reused for subsequent awaits if the current ExecutionContext was the default. If, however, the current ExecutionContext was not the default, every await would end up allocating another Action and MoveNextRunner, for 2 allocations and 56 bytes on each await. With the new design, those are eliminated, such that even if a non-default ExecutionContext is in play, and even if it changes in between awaits, the original allocations are still used.
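For a concrete picture of the non-default case, here is a minimal sketch of an async method whose ExecutionContext is non-default and changes between awaits, using AsyncLocal. The `NonDefaultContextDemo` and `RequestId` names are assumptions made for the example.

```C#
using System;
using System.Threading;
using System.Threading.Tasks;

public static class NonDefaultContextDemo
{
    static readonly AsyncLocal<string> RequestId = new AsyncLocal<string>();

    public static async Task<string> RunAsync()
    {
        RequestId.Value = "request-1"; // the current ExecutionContext is no longer the default
        await Task.Yield();

        RequestId.Value = "request-2"; // the context changes in between awaits
        await Task.Yield();

        return RequestId.Value;        // the value set before the last await flows across it
    }
}
```

Per the description above, each of these awaits previously cost an extra Action and MoveNextRunner; with the new design both awaits reuse the allocations made at the first yield.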

There's also a small debugging benefit to this change: the resulting Task object now also contains the state machine data, which means if you have a reference to the Task, you can easily see the state associated with the async method in the debugger. Previously you would need to use a tool like sos to find the async state machine object that referenced the relevant task.

One hopefully minor downside to the change is that the Task object returned from an async method is now larger than it used to be, with all of the state machine's state on it. Generally this won't matter, as you await a Task and then drop it, so the extra memory pressure doesn't exist for longer than it used to. However, if you happen to hold on to that task for a prolonged period of time, you'll now be keeping alive a larger object than you previously were.

There is also a very corner case change in behavior, which shouldn't break any real code, but does actually break one corefx test; there's an AsyncValueTaskMethodBuilder test I wrote, as part of trying to get to 100% code coverage, that explicitly passes the wrong state machine object to the builder's SetStateMachine method, and this change causes one of its asserts to fail (in an expected manner).

cc: @kouvel, @benaadams, @davidfowl, @ericeil, @MadsTorgersen

stephentoub · 2017-07-28T23:18:56Z

src/mscorlib/src/System/Runtime/CompilerServices/AsyncMethodBuilder.cs

@@ -575,10 +446,10 @@ public void SetResult(TResult result)
        {
            // Get the currently stored task, which will be non-null if get_Task has already been accessed.


I've gone back and forth on whether to explicitly zero out the StateMachine field here (and in SetException). On the one hand, it would potentially release any references held by the state machine object earlier than they'd otherwise be released. On the other hand, it would have a cost of zero'ing out the memory, and in most cases, the task object itself is about to get dropped, so it wouldn't actually help in most cases. Open to other opinions.

If you zero it out, that could impede any "interesting" state captured by the state machine in the debugger, right? Is that worth keeping around? Esp if an Exception happens, couldn't the state machine state help?

It could, potentially.

benaadams · 2017-07-28T23:46:15Z

With the new design, those are eliminated, such that even if a non-default ExecutionContext is in play

You keep weakening my AsyncLocal is evil generalization 😶

stephentoub · 2017-07-28T23:53:48Z

You keep weakening my AsyncLocal is evil generalization

Mwahahaha

benaadams · 2017-07-29T19:49:39Z

As an aside...

// Get the currently stored task, which will be non-null if get_Task has already been accessed.

Fire-and-forget functions allocate if you assign the returned Task.
If you don't, the compiler gets grumpy.
Using async void is a way round that, but not a good one.
#pragma works but is also ugly.

Just a gap worth thinking about.
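A minimal sketch of the three options listed above, as they look inside an async caller (where the compiler reports CS4014 for an un-awaited call). `FireAndForgetOptions`, `ThingAsync`, and the `Completed` counter are hypothetical names used only for this example.

```C#
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FireAndForgetOptions
{
    public static int Completed;

    static async Task ThingAsync()
    {
        await Task.Yield();
        Interlocked.Increment(ref Completed);
    }

    public static async Task CallerAsync()
    {
        // Option 1: assign the returned Task, which silences the warning.
        Task ignored = ThingAsync();

        // Option 2: suppress the warning with a pragma.
#pragma warning disable CS4014
        ThingAsync();
#pragma warning restore CS4014

        // Option 3: go through an async void wrapper.
        FireAndForget();

        await Task.Yield();
    }

    // async void hides the Task from the caller; exceptions thrown here cannot be observed by it.
    static async void FireAndForget()
    {
        await ThingAsync();
    }
}
```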

e.g. maybe a lighter Task handle; that doesn't allocate until a property is inspected?

So if you did

var task0 = ThingAsync();
var task1 = ThingAsync();
var task2 = ThingAsync();
var task3 = ThingAsync();
var task4 = ThingAsync();
var task5 = ThingAsync();

// WhenAll inspects and allocates;
// however there may have been time for some of them to be regular pre-completed Tasks
await Task.WhenAll(task0, task1, task2, task3, task4, task5);

Extra indirection might slow things down though. Just a thought...

stephentoub · 2017-07-31T23:11:17Z

@gregg-miskelly, I've been validating debugger behavior with these changes. I think I've fixed the issues I've found, with one exception. It looks like the debugger's async step-out might not be implemented as it was designed to be. The builders expose a SetNotificationForWaitCompletion method that the debugger is meant to use as part of the step-out implementation, but from observed behavior it looks like the debugger might be using "builder.Task.SetNotificationForWaitCompletion(bool)" instead of "builder.SetNotificationForWaitCompletion(bool)". Can you confirm? And if so, can we fix the debugger to use the method on the builder instead? If we don't make that change, this PR will break some step-out situations, and to fix that, I'd need to add non-trivial allocation overhead for a common non-debugger scenario.

redknightlois · 2017-08-01T23:15:53Z

Believe me when I tell you that we are VERY EAGER to get our hands on this. 5 out of 10 of our top allocators are async machinery related.
https://twitter.com/federicolois/status/892488991957819392

ayende · 2017-08-02T19:35:45Z

👍

stephentoub · 2017-08-18T15:30:09Z

@gregg-miskelly, thanks for the offline conversation. I fixed the async call stack issue by renaming back the helper type to the known AsyncMethodBuilderCore name the debugger cares about. Other than the change the debugger should make from using builder.Task.SetNotificationForWaitCompletion to builder.SetNotificationForWaitCompletion (without which stepping out of an async method before it reaches its first yield point won't work), I think that addresses debugging concerns, but please let me know if I've missed anything.

karelz · 2017-08-28T17:22:13Z

What is status of this PR? Are we blocked on @gregg-miskelly review?

stephentoub · 2017-08-28T17:25:03Z

What is status of this PR? Are we blocked on @gregg-miskelly review?

Yes, Gregg has been busy on some important work and I'm waiting for verification from him that allowing the debugger to prefer builder.SetNotificationForWaitCompletion() instead of builder.Task.SetNotificationForWaitCompletion() isn't a problem.

stephentoub · 2017-09-14T05:19:52Z

@gregg-miskelly, have you been able to double-check this?

gregg-miskelly · 2017-09-14T05:22:52Z

@r-ramesh could you take a look since I keep failing to get to this?

benaadams · 2017-09-14T18:54:06Z

src/mscorlib/src/System/Runtime/CompilerServices/AsyncMethodBuilder.cs

@@ -721,10 +699,6 @@ internal void SetNotificationForWaitCompletion(bool enabled)
        [MethodImpl(MethodImplOptions.AggressiveInlining)] // method looks long, but for a given TResult it results in a relatively small amount of asm
        private Task<TResult> GetTaskForResult(TResult result)


Needs to change to internal static to not clash with #13907

nvm, merge will catch it

fixed the conflict

The first time a Task-based method yields, today there are four allocations:
- The Task returned from the method
- The state machine object boxed to the heap
- An Action delegate that'll be passed to awaiters
- A MoveNextRunner that stores the state machine and the ExecutionContext, and has the method that the Action actually references

For a simple async method, e.g.
```C#
static async Task DoWorkAsync()
{
    await Task.Yield();
}
```
when it yields the first time, we allocate four objects equaling 232 bytes (64-bit).

This PR changes the scheme to use fewer allocations and less memory. With the new version, there are only two allocations:
- A type derived from Task
- An Action delegate that'll be passed to awaiters

This doesn't obviate the need for the state machine, but rather than boxing the object normally, we simply store the state machine onto the Task-derived type, with the state machine strongly-typed in a property on the type. Further, the captured ExecutionContext is stored onto that same object, rather than requiring a separate MoveNextRunner to be allocated, and the delegate can point to that Task-derived type. This also makes the builder types thinner, and since the builders are stored in the state machine, that in turn makes the allocation smaller. With this new scheme and that same example from earlier, rather than costing 4 allocations and 232 bytes, it costs 2 allocations and 176 bytes.

It also helps further in another common case. Previously the Task and state machine object would only be allocated once, but the Action and MoveNextRunner would be allocated and then could only be reused for subsequent awaits if the current ExecutionContext was the default. If, however, the current ExecutionContext was not the default, every await would end up allocating another Action and MoveNextRunner, for 2 allocations and 56 bytes on each await. With the new design, those are eliminated, such that even if a non-default ExecutionContext is in play, and even if it changes in between awaits, the original allocations are still used.

There's also a small debugging benefit to this change: the resulting Task object now also contains the state machine data, which means if you have a reference to the Task, you can easily see the state associated with the async method in the debugger. Previously you would need to use a tool like sos to find the async state machine object that referenced the relevant task.

One hopefully minor downside to the change is that the Task object returned from an async method is now larger than it used to be, with all of the state machine's state on it. Generally this won't matter, as you await a Task and then drop it, so the extra memory pressure doesn't exist for longer than it used to. However, if you happen to hold on to that task for a prolonged period of time, you'll now be keeping alive a larger object than you previously were, including any objects referenced by lifted "local" variables in the async method.

There is also a very corner case change in behavior: we no longer call SetStateMachine on the builder object. This was always infrastructure code and never meant to be used by end-user code directly. The implementation in .NET Native already doesn't call it.

benaadams · 2017-09-15T05:31:49Z

I'm very interested in this change, as this is a major area of allocations for us; and it's far more allocations for anyone that uses AsyncLocal

Alas I can't get dotMemory to record when I build coreclr from source atm, which I think is a different issue that's being picked up elsewhere.

As an alternative, using the dotTrace timeline, everything is dwarfed by the ETW allocations

7.07%   coreclr.dll  •  254 MB
  7.07%   WKS::gc_heap::fire_etw_allocation_event  •  254 MB  •  coreclr.dll.WKS::gc_heap::fire_etw_allocation_event
    7.07%   CoTemplate_qqhxpzqp  •  254 MB  •  coreclr.dll.CoTemplate_qqhxpzqp
      7.07%   EtwCallout  •  254 MB  •  coreclr.dll.EtwCallout
        7.07%   ETW::SamplingLog::SendStackTrace  •  254 MB  •  coreclr.dll.ETW::SamplingLog::SendStackTrace

So... bearing that in mind...

benaadams · 2017-09-15T05:36:48Z

Apologies for the format and uninterpreted results; will have something better in the morning.

coreclr.dll allocations are the ETW allocations

For roughly the same time span (75 secs) the before allocation tree from System.Net.WebSockets.ManagedWebSocket+ReceiveAsyncPrivate looks like the following

100%   MoveNext  •  4,050 MB  •  System.Net.WebSockets.ManagedWebSocket+<ReceiveAsyncPrivate>d__61.MoveNext()
  98.7%   SetExistingTaskResult  •  3,998 MB  •  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(!0)
    98.7%   TrySetResult  •  3,998 MB  •  System.Threading.Tasks.Task`1.TrySetResult(!0)
      98.7%   RunContinuations  •  3,998 MB  •  System.Threading.Tasks.Task.RunContinuations(Object)
        98.7%   RunOrScheduleAction  •  3,998 MB  •  System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(Action, Boolean, ref Task)
          98.7%   Run  •  3,998 MB  •  System.Threading.ExecutionContext.Run(ExecutionContext, ContextCallback, Object)
            98.7%   MoveNext  •  3,998 MB  •  AoA.Gaia.Network.PlayerWebsocket+<CompleteReceiveTaskAsync>d__56.MoveNext()
              98.7%   SetExistingTaskResult  •  3,998 MB  •  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(!0)
                98.7%   TrySetResult  •  3,998 MB  •  System.Threading.Tasks.Task`1.TrySetResult(!0)
                  98.7%   RunContinuations  •  3,998 MB  •  System.Threading.Tasks.Task.RunContinuations(Object)
                    98.7%   RunOrScheduleAction  •  3,998 MB  •  System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(Action, Boolean, ref Task)
                      98.7%   Run  •  3,998 MB  •  System.Threading.ExecutionContext.Run(ExecutionContext, ContextCallback, Object)
                        98.7%   MoveNext  •  3,998 MB  •  AoA.Gaia.Network.PlayerWebsocket+<ReceiveProcessAsync>d__53.MoveNext()
                          49.9%   ProcessReceive  •  2,020 MB  •  AoA.Gaia.Network.PlayerWebsocket.ProcessReceive(ArraySegment)
                            49.9%   PingInput  •  2,020 MB  •  AoA.Gaia.Network.PlayerWebsocket.PingInput(InboundUpdateType, Double, Int32, ArraySegment)
                              49.9%   Run  •  2,020 MB  •  System.Threading.ExecutionContext.Run(ExecutionContext, ContextCallback, Object)
                                49.9%   MoveNext  •  2,020 MB  •  AoA.Gaia.Network.PlayerWebsocket+<SendAsync>d__74.MoveNext()
                                  49.9%   CompleteSendAsync  •  2,020 MB  •  AoA.Gaia.Network.PlayerWebsocket.CompleteSendAsync(WebSocketMessageType)
                                    49.9%   CompleteSend  •  2,020 MB  •  AoA.Gaia.Network.PlayerWebsocket.CompleteSend(WebSocketMessageType)
                                      49.9%   SendAsync  •  2,020 MB  •  System.Net.WebSockets.ManagedWebSocket.SendAsync(ArraySegment, WebSocketMessageType, Boolean, CancellationToken)
                                        49.9%   SendFrameAsync  •  2,020 MB  •  System.Net.WebSockets.ManagedWebSocket.SendFrameAsync(MessageOpcode, Boolean, ArraySegment, CancellationToken)
                                          49.9%   SendFrameLockAcquiredNonCancelableAsync  •  2,020 MB  •  System.Net.WebSockets.ManagedWebSocket.SendFrameLockAcquiredNonCancelableAsync(MessageOpcode, Boolean, ArraySegment)
                                            49.9%   WriteAsync  •  2,020 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.FrameDuplexStream.WriteAsync(Byte[], Int32, Int32, CancellationToken)
                                              49.9%   WriteAsync  •  2,020 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.FrameResponseStream.WriteAsync(Byte[], Int32, Int32, CancellationToken)
                                                49.9%   WriteAsync  •  2,020 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Frame.WriteAsync(ArraySegment, CancellationToken)
                                                  49.9%   WriteAsync  •  2,020 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.OutputProducer.WriteAsync(ArraySegment, CancellationToken, Boolean)
                                                    30.1%   FlushAsync  •  1,220 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.OutputProducer.FlushAsync(WritableBuffer, CancellationToken)
                                                      30.1%   FlushAsync  •  1,220 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.FlushAsync(CancellationToken)
                                                        30.1%   Run  •  1,220 MB  •  System.Threading.ExecutionContext.Run(ExecutionContext, ContextCallback, Object)
                                                          30.1%   MoveNext  •  1,220 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Adapter.Internal.AdaptedPipeline+<WriteOutputAsync>d__12.MoveNext()
                                                            30.1%   WriteAsync  •  1,220 MB  •  System.IO.Stream.WriteAsync(Byte[], Int32, Int32, CancellationToken)
                                                              4.78%   WriteAsync  •  194 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Adapter.Internal.RawStream.WriteAsync(Byte[], Int32, Int32, CancellationToken)
                                                                4.78%   MoveNext  •  194 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Adapter.Internal.RawStream+<WriteAsync>d__19.MoveNext()
                                                                  4.78%   Write  •  194 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.WritableBufferExtensions.Write(WritableBuffer, ReadOnlySpan)
                                                                    4.78%   Ensure  •  194 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.Ensure(Int32)
                                                                      2.68%   coreclr.dll  •  108 MB
                                                                      2.04%   AllocateWriteHeadUnsynchronized  •  82 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.AllocateWriteHeadUnsynchronized(Int32)
                                                                        2.03%   coreclr.dll  •  82 MB
                                                                        0.00%   Lease  •  0.1 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.MemoryPool.Lease()
                                                                          0.00%   AllocateSlab  •  0.1 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.MemoryPool.AllocateSlab()
                                                                            0.00%   JIT_NewArr1  •  0.1 MB  •  coreclr.dll.JIT_NewArr1
                                                                      0.06%   Lease  •  2.6 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.MemoryPool.Lease()
                                                                        0.06%   AllocateSlab  •  2.6 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.MemoryPool.AllocateSlab()
                                                                          0.06%   JIT_NewArr1  •  2.5 MB  •  coreclr.dll.JIT_NewArr1
                                                                          0.00%   coreclr.dll  •  0.1 MB
                                                    19.8%   Alloc  •  800 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.Alloc(Int32)
                                                      19.8%   AllocateWriteHeadUnsynchronized  •  800 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.AllocateWriteHeadUnsynchronized(Int32)
                                                        19.8%   coreclr.dll  •  800 MB
                          33.9%   InitiateReceive  •  1,374 MB  •  AoA.Gaia.Network.PlayerWebsocket.InitiateReceive()
                            33.9%   ReceiveAsync  •  1,374 MB  •  System.Net.WebSockets.ManagedWebSocket.ReceiveAsync(ArraySegment, CancellationToken)
                              33.9%   ReceiveAsyncPrivate  •  1,374 MB  •  System.Net.WebSockets.ManagedWebSocket.ReceiveAsyncPrivate(ArraySegment, CancellationToken)
                                33.9%   MoveNext  •  1,374 MB  •  System.Net.WebSockets.ManagedWebSocket+<ReceiveAsyncPrivate>d__61.MoveNext()
                                ► 23.9%   EnsureBufferContainsAsync  •  969 MB  •  System.Net.WebSockets.ManagedWebSocket.EnsureBufferContainsAsync(Int32, CancellationToken, Boolean)
                                ► 5.00%   coreclr.dll  •  203 MB
                                  2.99%   GetCompletionAction  •  121 MB  •  System.Runtime.CompilerServices.AsyncMethodBuilderCore.GetCompletionAction(Task, ref MoveNextRunner)
                                  2.02%   InitializeTask  •  82 MB  •  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.InitializeTask()
                                  ► 2.02%   coreclr.dll  •  82 MB
                          14.9%   SetResult  •  604 MB  •  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetResult(Task)
                            14.9%   SetExistingTaskResult  •  604 MB  •  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(!0)
                              14.9%   TrySetResult  •  604 MB  •  System.Threading.Tasks.Task`1.TrySetResult(!0)
                                14.9%   RunContinuations  •  604 MB  •  System.Threading.Tasks.Task.RunContinuations(Object)
                                  14.9%   RunOrScheduleAction  •  604 MB  •  System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(Action, Boolean, ref Task)
                                    14.9%   Run  •  604 MB  •  System.Threading.ExecutionContext.Run(ExecutionContext, ContextCallback, Object)
                                      14.9%   MoveNext  •  604 MB  •  AoA.Gaia.Network.PlayerWebsocket+<ReceiveAsync>d__52.MoveNext()
                                        14.9%   ReceiveProcessAsync  •  604 MB  •  AoA.Gaia.Network.PlayerWebsocket.ReceiveProcessAsync()
                                          14.9%   MoveNext  •  604 MB  •  AoA.Gaia.Network.PlayerWebsocket+<ReceiveProcessAsync>d__53.MoveNext()
                                          ► 7.59%   CompleteReceiveTaskAsync  •  307 MB  •  AoA.Gaia.Network.PlayerWebsocket.CompleteReceiveTaskAsync(PooledBuffer)
                                            2.85%   GetCompletionAction  •  116 MB  •  System.Runtime.CompilerServices.AsyncMethodBuilderCore.GetCompletionAction(Task, ref MoveNextRunner)
                                            ► 2.85%   coreclr.dll  •  116 MB
                                            2.57%   coreclr.dll  •  104 MB
                                            1.90%   InitializeTask  •  77 MB  •  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.InitializeTask()
                                            ► 1.90%   coreclr.dll  •  77 MB
► 1.23%   coreclr.dll  •  50 MB
  0.06%   EnsureBufferContainsAsync  •  2.4 MB  •  System.Net.WebSockets.ManagedWebSocket.EnsureBufferContainsAsync(Int32, CancellationToken, Boolean)
    0.06%   MoveNext  •  2.4 MB  •  System.Net.WebSockets.ManagedWebSocket+<EnsureBufferContainsAsync>d__70.MoveNext()
    ► 0.03%   ReadAsync  •  1.1 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.FrameDuplexStream.ReadAsync(Byte[], Int32, Int32, CancellationToken)
    ► 0.02%   coreclr.dll  •  0.6 MB
      0.01%   InitializeTask  •  0.4 MB  •  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.InitializeTask()
      0.01%   GetCompletionAction  •  0.3 MB  •  System.Runtime.CompilerServices.AsyncMethodBuilderCore.GetCompletionAction(Task, ref MoveNextRunner)
      ► 0.01%   coreclr.dll  •  0.3 MB

benaadams · 2017-09-15T05:38:31Z

For roughly the same time span (75 secs) the after allocation tree from System.Net.WebSockets.ManagedWebSocket+ReceiveAsyncPrivate looks like the following

100%   MoveNext  •  3,596 MB  •  System.Net.WebSockets.ManagedWebSocket+<ReceiveAsyncPrivate>d__61.MoveNext()
  98.8%   SetExistingTaskResult  •  3,554 MB  •  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(!0)
    98.8%   TrySetResult  •  3,554 MB  •  System.Threading.Tasks.Task`1.TrySetResult(!0)
      98.8%   RunContinuations  •  3,554 MB  •  System.Threading.Tasks.Task.RunContinuations(Object)
        98.8%   RunOrScheduleAction  •  3,554 MB  •  System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(Action, Boolean, ref Task)
          98.8%   Run  •  3,554 MB  •  System.Threading.ExecutionContext.Run(ExecutionContext, ContextCallback, Object)
            98.8%   MoveNext  •  3,554 MB  •  AoA.Gaia.Network.PlayerWebsocket+<CompleteReceiveTaskAsync>d__56.MoveNext()
              98.8%   SetExistingTaskResult  •  3,554 MB  •  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(!0)
                98.8%   MoveNext  •  3,554 MB  •  AoA.Gaia.Network.PlayerWebsocket+<ReceiveProcessAsync>d__53.MoveNext()
                  52.8%   ProcessReceive  •  1,899 MB  •  AoA.Gaia.Network.PlayerWebsocket.ProcessReceive(ArraySegment)
                    52.8%   PingInput  •  1,899 MB  •  AoA.Gaia.Network.PlayerWebsocket.PingInput(InboundUpdateType, Double, Int32, ArraySegment)
                      52.8%   Run  •  1,899 MB  •  System.Threading.ExecutionContext.Run(ExecutionContext, ContextCallback, Object)
                        52.8%   MoveNext  •  1,899 MB  •  AoA.Gaia.Network.PlayerWebsocket+<SendAsync>d__74.MoveNext()
                          52.8%   CompleteSendAsync  •  1,899 MB  •  AoA.Gaia.Network.PlayerWebsocket.CompleteSendAsync(WebSocketMessageType)
                            52.8%   CompleteSend  •  1,899 MB  •  AoA.Gaia.Network.PlayerWebsocket.CompleteSend(WebSocketMessageType)
                              52.8%   SendAsync  •  1,899 MB  •  System.Net.WebSockets.ManagedWebSocket.SendAsync(ArraySegment, WebSocketMessageType, Boolean, CancellationToken)
                                52.8%   SendFrameAsync  •  1,899 MB  •  System.Net.WebSockets.ManagedWebSocket.SendFrameAsync(MessageOpcode, Boolean, ArraySegment, CancellationToken)
                                  52.8%   SendFrameLockAcquiredNonCancelableAsync  •  1,899 MB  •  System.Net.WebSockets.ManagedWebSocket.SendFrameLockAcquiredNonCancelableAsync(MessageOpcode, Boolean, ArraySegment)
                                    52.8%   WriteAsync  •  1,899 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.FrameDuplexStream.WriteAsync(Byte[], Int32, Int32, CancellationToken)
                                      52.8%   WriteAsync  •  1,899 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.FrameResponseStream.WriteAsync(Byte[], Int32, Int32, CancellationToken)
                                        52.8%   WriteAsync  •  1,899 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Frame.WriteAsync(ArraySegment, CancellationToken)
                                          52.8%   WriteAsync  •  1,899 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.OutputProducer.WriteAsync(ArraySegment, CancellationToken, Boolean)
                                            33.7%   FlushAsync  •  1,213 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.OutputProducer.FlushAsync(WritableBuffer, CancellationToken)
                                              33.7%   FlushAsync  •  1,213 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.FlushAsync(CancellationToken)
                                                33.7%   Run  •  1,213 MB  •  System.Threading.ExecutionContext.Run(ExecutionContext, ContextCallback, Object)
                                                  33.7%   MoveNext  •  1,213 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Adapter.Internal.AdaptedPipeline+<WriteOutputAsync>d__12.MoveNext()
                                                    33.7%   WriteAsync  •  1,213 MB  •  System.IO.Stream.WriteAsync(Byte[], Int32, Int32, CancellationToken)
                                                      4.88%   WriteAsync  •  176 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Adapter.Internal.RawStream.WriteAsync(Byte[], Int32, Int32, CancellationToken)
                                                        4.88%   MoveNext  •  176 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Adapter.Internal.RawStream+<WriteAsync>d__19.MoveNext()
                                                          4.88%   Write  •  176 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.WritableBufferExtensions.Write(WritableBuffer, ReadOnlySpan)
                                                            4.88%   Ensure  •  176 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.Ensure(Int32)
                                                              2.61%   coreclr.dll  •  94 MB
                                                              2.27%   AllocateWriteHeadUnsynchronized  •  82 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.AllocateWriteHeadUnsynchronized(Int32)
                                                              ► 2.27%   coreclr.dll  •  82 MB
                                            19.1%   Alloc  •  687 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.Alloc(Int32)
                                              19.1%   AllocateWriteHeadUnsynchronized  •  687 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.AllocateWriteHeadUnsynchronized(Int32)
                                                19.1%   coreclr.dll  •  687 MB
                  32.3%   InitiateReceive  •  1,161 MB  •  AoA.Gaia.Network.PlayerWebsocket.InitiateReceive()
                    32.3%   ReceiveAsync  •  1,161 MB  •  System.Net.WebSockets.ManagedWebSocket.ReceiveAsync(ArraySegment, CancellationToken)
                      32.3%   ReceiveAsyncPrivate  •  1,161 MB  •  System.Net.WebSockets.ManagedWebSocket.ReceiveAsyncPrivate(ArraySegment, CancellationToken)
                        32.3%   MoveNext  •  1,161 MB  •  System.Net.WebSockets.ManagedWebSocket+<ReceiveAsyncPrivate>d__61.MoveNext()
                          22.5%   EnsureBufferContainsAsync  •  808 MB  •  System.Net.WebSockets.ManagedWebSocket.EnsureBufferContainsAsync(Int32, CancellationToken, Boolean)
                            22.5%   MoveNext  •  808 MB  •  System.Net.WebSockets.ManagedWebSocket+<EnsureBufferContainsAsync>d__70.MoveNext()
                              14.6%   ReadAsync  •  526 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.FrameDuplexStream.ReadAsync(Byte[], Int32, Int32, CancellationToken)
                                14.6%   ReadAsync  •  526 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.FrameRequestStream.ReadAsync(Byte[], Int32, Int32, CancellationToken)
                                  14.6%   ReadAsyncInternal  •  526 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.FrameRequestStream.ReadAsyncInternal(Byte[], Int32, Int32, CancellationToken)
                                    14.6%   MoveNext  •  526 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.FrameRequestStream+<ReadAsyncInternal>d__21.MoveNext()
                                    ► 7.57%   coreclr.dll  •  272 MB
                                      7.07%   ReadAsync  •  254 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.MessageBody.ReadAsync(ArraySegment, CancellationToken)
                                        7.07%   MoveNext  •  254 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.MessageBody+<ReadAsync>d__22.MoveNext()
                                        ► 7.07%   coreclr.dll  •  254 MB
                            ► 7.82%   coreclr.dll  •  281 MB
                        ► 9.83%   coreclr.dll  •  354 MB
                  13.7%   SetExistingTaskResult  •  493 MB  •  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(!0)
                    13.7%   MoveNext  •  493 MB  •  AoA.Gaia.Network.PlayerWebsocket+<ReceiveAsync>d__52.MoveNext()
                      13.7%   ReceiveProcessAsync  •  493 MB  •  AoA.Gaia.Network.PlayerWebsocket.ReceiveProcessAsync()
                        13.7%   MoveNext  •  493 MB  •  AoA.Gaia.Network.PlayerWebsocket+<ReceiveProcessAsync>d__53.MoveNext()
                        ► 7.60%   CompleteReceiveTaskAsync  •  273 MB  •  AoA.Gaia.Network.PlayerWebsocket.CompleteReceiveTaskAsync(PooledBuffer)
                        ► 6.11%   coreclr.dll  •  220 MB
► 1.12%   coreclr.dll  •  40 MB
  0.06%   EnsureBufferContainsAsync  •  2.1 MB  •  System.Net.WebSockets.ManagedWebSocket.EnsureBufferContainsAsync(Int32, CancellationToken, Boolean)
    0.06%   MoveNext  •  2.1 MB  •  System.Net.WebSockets.ManagedWebSocket+<EnsureBufferContainsAsync>d__70.MoveNext()
    ► 0.04%   ReadAsync  •  1.6 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.FrameDuplexStream.ReadAsync(Byte[], Int32, Int32, CancellationToken)
    ► 0.01%   coreclr.dll  •  0.5 MB

benaadams · 2017-09-15T05:43:58Z

It's down 500 MB, but that might just reflect what was going on at the time.

But I thought I'd give a preview in case @stephentoub can see whether the paths are doing what he expected. Will look deeper in the morning.

benaadams · 2017-09-15T05:52:39Z

I'll look at what happens if AsyncLocal is used, as that moves everything off the default context, where allocations currently skyrocket.

benaadams · 2017-09-15T06:02:52Z

If, however, the current ExecutionContext was not the default, every await would end up allocating another Action and MoveNextRunner, for 2 allocations and 56 bytes on each await. With the new design, those are eliminated, such that even if a non-default ExecutionContext is in play, and even if it changes in between awaits, the original allocations are still used.

For reference to the AsyncLocal and non-default context

Without AsyncLocal

With AsyncLocal

You can see that, prior to this change, the allocations of Action and MoveNextRunner increase disproportionately.
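As a minimal sketch of why AsyncLocal matters in this comparison: once an AsyncLocal<T> value is set, the ambient ExecutionContext is no longer the default one, and it has to be captured and flowed across every await that yields. The class and field names below (AsyncLocalDemo, RequestId) are hypothetical and only illustrate the behavior; they are not taken from the profiled application.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class AsyncLocalDemo
{
    // Hypothetical ambient value; any AsyncLocal<T> with a value set has the same effect.
    static readonly AsyncLocal<string> RequestId = new AsyncLocal<string>();

    public static async Task<string> ReadAfterYieldAsync()
    {
        // Task.Yield forces the method to complete asynchronously,
        // so the continuation runs with the captured ExecutionContext.
        await Task.Yield();
        return RequestId.Value;
    }

    public static async Task<string> RunAsync()
    {
        // Setting the value moves this flow off the default ExecutionContext.
        RequestId.Value = "req-42";
        return await ReadAfterYieldAsync();
    }

    static async Task Main()
    {
        // The value set before the await is still visible after the yield.
        Console.WriteLine(await RunAsync());
    }
}
```

The value survives the yield because the context is flowed with the continuation. That flowing is the work which, per the PR description, previously cost an extra Action and MoveNextRunner on each await in a non-default context.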

stephentoub · 2017-09-15T11:32:05Z

It's down 500 MB, but that might just reflect what was going on at the time.

I can't tell from the shared output: how many objects were contributing to that? I'd guesstimate you would see savings of 50-60 bytes per async call that completes asynchronously and uses the default context (and potentially much, much more in a non-default context).
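The 50-60 byte estimate lines up with the figures in the PR description (232 bytes before, 176 bytes after, for the 64-bit Task.Yield example). A rough back-of-the-envelope sketch follows; treating the observed drop as 500 decimal megabytes and attributing all of it to this change are both assumptions made only for illustration, and the names are hypothetical.

```csharp
using System;

static class SavingsEstimate
{
    // Figures quoted in the PR description for the simple 64-bit example.
    public const int BytesBefore = 232;
    public const int BytesAfter = 176;

    // Bytes saved the first time one async method yields (default context).
    public static int SavedPerCall => BytesBefore - BytesAfter;

    // Number of yielding async calls needed to account for a given drop.
    public static long CallsNeeded(long totalSavedBytes) => totalSavedBytes / SavedPerCall;

    static void Main()
    {
        // Assumption: "500MB" read as decimal megabytes, all due to this change.
        long observedDrop = 500L * 1000 * 1000;
        Console.WriteLine($"Saved per yielding call: {SavedPerCall} bytes");
        Console.WriteLine($"Calls needed for the observed drop: {CallsNeeded(observedDrop):N0}");
    }
}
```

Under those assumptions the drop would correspond to several million asynchronously completing calls, which is why the object counts matter for interpreting the profile.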

stephentoub · 2017-09-21T13:07:21Z

@r-ramesh, @gregg-miskelly, I would like to merge this. Can I go ahead and do so?

r-ramesh · 2017-09-21T17:35:55Z

@stephentoub Sorry for the delay, let me take a quick look at the debugger implementation. I will keep you updated.

r-ramesh · 2017-09-21T20:32:59Z

Looks good to me.

stephentoub · 2017-09-22T01:49:02Z

Thanks, @r-ramesh.

stephentoub added the * NO MERGE * and tenet-performance labels Jul 28, 2017

dnfclas added the cla-already-signed label Jul 28, 2017

stephentoub commented Jul 28, 2017

stephentoub force-pushed the asyncmem branch from 4cbbab4 to b989469 Compare July 31, 2017 02:01

stephentoub added and removed the * NO MERGE * label Jul 31, 2017

stephentoub force-pushed the asyncmem branch from b989469 to a33b2b1 Compare July 31, 2017 23:07

stephentoub removed the * NO MERGE * label Aug 2, 2017

kouvel approved these changes Aug 8, 2017

stephentoub mentioned this pull request Aug 14, 2017

avoid async overhead in ReadNextLineAsync when possible dotnet/corefx#23210

Closed

adityamandaleeka mentioned this pull request Aug 17, 2017

Delete bad test from AsyncValueTaskMethodBuilderTests dotnet/corefx#23356

Merged

stephentoub force-pushed the asyncmem branch from a33b2b1 to c3e6831 Compare August 18, 2017 14:28

stephentoub mentioned this pull request Aug 18, 2017

Remove overhead from AsyncValueTaskMethodBuilder.Create #13390

Merged

karelz added the area-System.Runtime label Aug 28, 2017

karelz assigned stephentoub Aug 28, 2017

stephentoub force-pushed the asyncmem branch from c3e6831 to 63d1932 Compare September 10, 2017 11:44

benaadams reviewed Sep 14, 2017

stephentoub force-pushed the asyncmem branch from 63d1932 to d56eff4 Compare September 14, 2017 21:46

stephentoub merged commit 8c33e99 into dotnet:master Sep 22, 2017

stephentoub deleted the asyncmem branch September 22, 2017 01:49

stephentoub mentioned this pull request Sep 25, 2017

Avoid async method delegate allocation #14178

Merged

stephentoub added the netfx-port-consider label Oct 17, 2017

danielmarbach mentioned this pull request Dec 7, 2017

Avoid Task.Run and Factory.StartNew with async Azure/azure-storage-net#581

Merged

stephentoub mentioned this pull request May 29, 2018

Add sos DumpAsync command #18160

Merged

davidfowl mentioned this pull request May 30, 2018

HttpUpgradeStream doesn't have overloads for Memory<byte> aspnet/KestrelHttpServer#2620

Closed

benaadams mentioned this pull request Aug 23, 2019

Make "async ValueTask/ValueTask<T>" methods ammortized allocation-free #26310

Merged

kriskalish mentioned this pull request Jan 31, 2020

Async Code Leaks in 2.1 Runtime, but Not 2.0 Runtime dotnet/runtime#11189

Closed

		@@ -575,10 +446,10 @@ public void SetResult(TResult result)
		{
		// Get the currently stored task, which will be non-null if get_Task has already been accessed.

		@@ -721,10 +699,6 @@ internal void SetNotificationForWaitCompletion(bool enabled)
		[MethodImpl(MethodImplOptions.AggressiveInlining)] // method looks long, but for a given TResult it results in a relatively small amount of asm
		private Task<TResult> GetTaskForResult(TResult result)
