This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Reduce allocations when async methods yield #13105

Merged 1 commit on Sep 22, 2017

@stephentoub stephentoub commented Jul 28, 2017

The first time a Task-based async method yields, today there are four allocations:

  • The Task returned from the method
  • The state machine object boxed to the heap
  • An Action delegate that'll be passed to awaiters
  • A MoveNextRunner that stores the state machine and the ExecutionContext, and has the method that the Action actually references

For a simple async method, e.g.

static async Task DoWorkAsync()
{
    await Task.Yield();
}

when it yields the first time, we allocate four objects equaling 232 bytes (64-bit).

This PR changes the scheme to use fewer allocations and less memory. With the new version, there are only two allocations:

  • A type derived from Task
  • An Action delegate that'll be passed to awaiters

This doesn't obviate the need for the state machine, but rather than boxing the object normally, we simply store the state machine onto the Task-derived type, which itself implements IAsyncStateMachine. Further, the captured ExecutionContext is stored onto that same object, rather than requiring a separate MoveNextRunner to be allocated, and the delegate can point to that Task-derived type. With this new scheme and that same example from earlier, rather than costing 4 allocations and 232 bytes, it costs 2 allocations and 176 bytes, so 50% fewer allocations and roughly 24% less allocated memory (56 of 232 bytes).

It also helps further in another common case. Previously the Task and state machine object would only be allocated once, but the Action and MoveNextRunner would be allocated and then could only be reused for subsequent awaits if the current ExecutionContext was the default. If, however, the current ExecutionContext was not the default, every await would end up allocating another Action and MoveNextRunner, for 2 allocations and 56 bytes on each await. With the new design, those are eliminated, such that even if a non-default ExecutionContext is in play, and even if it changes in between awaits, the original allocations are still used.

There's also a small debugging benefit to this change: the resulting Task object now also contains the state machine data, which means if you have a reference to the Task, you can easily see the state associated with the async method in the debugger. Previously you would need to use a tool like sos to find the async state machine object that referenced the relevant task.

One hopefully minor downside to the change is that the Task object returned from an async method is now larger than it used to be, with all of the state machine's state on it. Generally this won't matter, as you await a Task and then drop it, so the extra memory pressure doesn't exist for longer than it used to. However, if you happen to hold on to that task for a prolonged period of time, you'll now be keeping alive a larger object than you previously were.

There is also a corner-case change in behavior, which shouldn't break any real code but does break one corefx test: there's an AsyncValueTaskMethodBuilder test I wrote, as part of trying to get to 100% code coverage, that explicitly passes the wrong state machine object to the builder's SetStateMachine method, and this change causes one of its asserts to fail (in an expected manner).

cc: @kouvel, @benaadams, @davidfowl, @ericeil, @MadsTorgersen

@stephentoub stephentoub added the "* NO MERGE *" (the PR is not ready for merge yet; see discussion for detailed reasons) and "tenet-performance" (performance-related issue) labels Jul 28, 2017
@@ -575,10 +446,10 @@ public void SetResult(TResult result)
{
// Get the currently stored task, which will be non-null if get_Task has already been accessed.
Member Author

@stephentoub stephentoub Jul 28, 2017

I've gone back and forth on whether to explicitly zero out the StateMachine field here (and in SetException). On the one hand, it would potentially release any references held by the state machine object earlier than they'd otherwise be released. On the other hand, it would have the cost of zeroing out the memory, and in most cases the task object itself is about to get dropped, so it wouldn't actually help. Open to other opinions.

If you zero it out, that could hide any "interesting" state captured by the state machine from the debugger, right? Is that worth keeping around? Especially if an exception happens, couldn't the state machine's state help?

Member Author

It could, potentially.

@benaadams
Member

With the new design, those are eliminated, such that even if a non-default ExecutionContext is in play

You keep weakening my "AsyncLocal is evil" generalization 😶

@stephentoub
Member Author

You keep weakening my "AsyncLocal is evil" generalization

Mwahahaha

@benaadams
Member

benaadams commented Jul 29, 2017

As an aside...

// Get the currently stored task, which will be non-null if get_Task has already been accessed.

Fire-and-forget functions allocate if you assign the returned Task.
If you don't, the compiler gets grumpy.
Making the method async void is a way around that, but not a good one.
#pragma works but is also ugly.

Just a gap worth thinking about.

e.g. maybe a lighter Task handle; that doesn't allocate until a property is inspected?

So if you did

var task0 = ThingAsync();
var task1 = ThingAsync();
var task2 = ThingAsync();
var task3 = ThingAsync();
var task4 = ThingAsync();
var task5 = ThingAsync();

// WhenAll inspects and allocates;
// however, by then some of them may have had time to complete and could be regular pre-completed Tasks
await Task.WhenAll(task0, task1, task2, task3, task4, task5);

Extra indirection might slow things down though. Just a thought...
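
The lazily allocating handle suggested above is not an existing API. The existing `ValueTask<TResult>` covers only part of the idea: an async method that returns it and completes synchronously does not need to allocate a Task. A minimal sketch, where `ThingAsync` and its `shouldYield` parameter are hypothetical and exist only to exercise the synchronous path:

```C#
using System.Threading.Tasks;

static class ValueTaskDemo
{
    // Hypothetical method; 'shouldYield' only selects the sync or async path.
    public static async ValueTask<int> ThingAsync(bool shouldYield)
    {
        if (shouldYield)
        {
            await Task.Yield();
        }
        return 1;
    }

    // Returns true when the call finished synchronously with the expected result.
    public static bool CompletesSynchronously()
    {
        ValueTask<int> pending = ThingAsync(shouldYield: false);
        return pending.IsCompletedSuccessfully && pending.Result == 1;
    }
}
```

This does not help once the method actually yields, and it is a different design from a handle that defers allocation until a property is inspected.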

@stephentoub stephentoub added and removed the "* NO MERGE *" label Jul 31, 2017
@stephentoub
Member Author

@gregg-miskelly, I've been validating debugger behavior with these changes. I think I've fixed the issues I've found, with one exception. It looks like the debugger's async step-out might not be implemented as it was designed to be. The builders expose a SetNotificationForWaitCompletion method that the debugger is meant to use as part of the step-out implementation, but from observed behavior it looks like the debugger might be using "builder.Task.SetNotificationForWaitCompletion(bool)" instead of "builder.SetNotificationForWaitCompletion(bool)". Can you confirm? And if so, can we fix the debugger to use the method on the builder instead? If we don't make that change, this PR will break some step-out situations, and to fix that, I'd need to add non-trivial allocation overhead for a common non-debugger scenario.

@redknightlois

Believe me when I tell you that we are VERY EAGER to get our hands on this. 5 out of 10 of our top allocators are async machinery related.
https://twitter.com/federicolois/status/892488991957819392

@ayende

ayende commented Aug 2, 2017

👍

@stephentoub stephentoub removed the "* NO MERGE *" label Aug 2, 2017
@stephentoub
Member Author

@gregg-miskelly, thanks for the offline conversation. I fixed the async call stack issue by renaming back the helper type to the known AsyncMethodBuilderCore name the debugger cares about. Other than the change the debugger should make from using builder.Task.SetNotificationForWaitCompletion to builder.SetNotificationForWaitCompletion (without which stepping out of an async method before it reaches its first yield point won't work), I think that addresses debugging concerns, but please let me know if I've missed anything.

@karelz
Member

karelz commented Aug 28, 2017

What is the status of this PR? Are we blocked on @gregg-miskelly's review?

@stephentoub
Member Author

What is the status of this PR? Are we blocked on @gregg-miskelly's review?

Yes, Gregg has been busy on some important work and I'm waiting for verification from him that allowing the debugger to prefer builder.SetNotificationForWaitCompletion() instead of builder.Task.SetNotificationForWaitCompletion() isn't a problem.

@stephentoub
Member Author

@gregg-miskelly, have you been able to double-check this?

@gregg-miskelly

@r-ramesh could you take a look since I keep failing to get to this?

@@ -721,10 +699,6 @@ internal void SetNotificationForWaitCompletion(bool enabled)
[MethodImpl(MethodImplOptions.AggressiveInlining)] // method looks long, but for a given TResult it results in a relatively small amount of asm
private Task<TResult> GetTaskForResult(TResult result)
Member

Needs to change to internal static to not clash with #13907

Member

nvm, merge will catch it

Member Author

fixed the conflict

The first time a Task-based async method yields, today there are four allocations:
- The Task returned from the method
- The state machine object boxed to the heap
- An Action delegate that'll be passed to awaiters
- A MoveNextRunner that stores the state machine and the ExecutionContext, and has the method that the Action actually references

For a simple async method, e.g.
```C#
static async Task DoWorkAsync()
{
    await Task.Yield();
}
```
when it yields the first time, we allocate four objects equaling 232 bytes (64-bit).

This PR changes the scheme to use fewer allocations and less memory.  With the new version, there are only two allocations:
- A type derived from Task
- An Action delegate that'll be passed to awaiters

This doesn't obviate the need for the state machine, but rather than boxing the object normally, we simply store the state machine onto the Task-derived type, with the state machine strongly-typed in a property on the type.  Further, the captured ExecutionContext is stored onto that same object, rather than requiring a separate MoveNextRunner to be allocated, and the delegate can point to that Task-derived type. This also makes the builder types thinner, and since the builders are stored in the state machine, that in turn makes the allocation smaller.  With this new scheme and that same example from earlier, rather than costing 4 allocations and 232 bytes, it costs 2 allocations and 176 bytes.
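
The shape of that scheme can be sketched outside the runtime. This is not the code from the PR: `StateMachineBox`, `CountingStateMachine`, and `BoxDemo` are hypothetical names, and the box here does not derive from Task because Task's parameterless constructor and completion methods are not public. The sketch assumes C# 9 or later with nullable reference types enabled, on .NET Core or later (where a captured ExecutionContext can be passed to `ExecutionContext.Run` more than once).

```C#
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical, simplified stand-in for the Task-derived type described above.
sealed class StateMachineBox<TStateMachine> where TStateMachine : IAsyncStateMachine
{
    // Stored strongly typed, so a struct state machine needs no separate box.
    public TStateMachine StateMachine;

    // The captured context lives on the same object instead of a separate runner.
    public ExecutionContext? Context;

    private Action? _moveNextAction;

    public StateMachineBox(TStateMachine stateMachine)
    {
        StateMachine = stateMachine;
    }

    // One delegate, created lazily and reused for every await.
    public Action MoveNextAction => _moveNextAction ??= new Action(MoveNext);

    public void MoveNext()
    {
        if (Context is null)
        {
            StateMachine.MoveNext();
        }
        else
        {
            // Static lambda: nothing is captured, the box is passed as state.
            ExecutionContext.Run(
                Context,
                static s => ((StateMachineBox<TStateMachine>)s!).StateMachine.MoveNext(),
                this);
        }
    }
}

// Hand-written state machine used only to make the sketch observable.
struct CountingStateMachine : IAsyncStateMachine
{
    public int Steps;
    public void MoveNext() => Steps++;
    public void SetStateMachine(IAsyncStateMachine stateMachine) { }
}

static class BoxDemo
{
    public static (int Steps, bool SameDelegate) Run()
    {
        var box = new StateMachineBox<CountingStateMachine>(new CountingStateMachine())
        {
            Context = ExecutionContext.Capture()
        };
        Action first = box.MoveNextAction;
        Action second = box.MoveNextAction;
        first();
        second();
        return (box.StateMachine.Steps, object.ReferenceEquals(first, second));
    }
}
```

In this sketch the only heap objects are the box and one Action, which mirrors the two allocations listed above; the byte counts quoted in the description apply to the real implementation, not to this stand-in.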

It also helps further in another common case.  Previously the Task and state machine object would only be allocated once, but the Action and MoveNextRunner would be allocated and then could only be reused for subsequent awaits if the current ExecutionContext was the default.  If, however, the current ExecutionContext was not the default, every await would end up allocating another Action and MoveNextRunner, for 2 allocations and 56 bytes on each await.  With the new design, those are eliminated, such that even if a non-default ExecutionContext is in play, and even if it changes in between awaits, the original allocations are still used.
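
For illustration, the kind of method that hits this case might look like the following. It only shows the scenario (a non-default ExecutionContext that changes between awaits) and does not measure allocations; `ContextDemo` and `s_requestId` are hypothetical names.

```C#
using System.Threading;
using System.Threading.Tasks;

static class ContextDemo
{
    // Hypothetical ambient value; setting it makes the current ExecutionContext non-default.
    private static readonly AsyncLocal<int> s_requestId = new AsyncLocal<int>();

    public static async Task<int> RunAsync()
    {
        s_requestId.Value = 42;                  // non-default context from here on
        await Task.Yield();                      // first yield
        int afterFirstAwait = s_requestId.Value; // the value flows across the await

        s_requestId.Value = 43;                  // the context changes between awaits
        await Task.Yield();                      // second yield, under a different context
        return afterFirstAwait + s_requestId.Value;
    }
}
```

Under the old scheme described above, each of the two awaits here would have allocated its own Action and MoveNextRunner.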

There's also a small debugging benefit to this change: the resulting Task object now also contains the state machine data, which means if you have a reference to the Task, you can easily see the state associated with the async method in the debugger.  Previously you would need to use a tool like sos to find the async state machine object that referenced the relevant task.

One hopefully minor downside to the change is that the Task object returned from an async method is now larger than it used to be, with all of the state machine's state on it.  Generally this won't matter, as you await a Task and then drop it, so the extra memory pressure doesn't exist for longer than it used to.  However, if you happen to hold on to that task for a prolonged period of time, you'll now be keeping alive a larger object than you previously were, including any objects referenced by lifted "local" variables in the async method.

There is also a corner-case change in behavior: we no longer call SetStateMachine on the builder object.  This was always infrastructure code and never meant to be used by end-user code directly.  The implementation in .NET Native already doesn't call it.

@benaadams
Member

benaadams commented Sep 15, 2017

I'm very interested in this change, as this is a major area of allocations for us, and it's far more allocations for anyone who uses AsyncLocal.

Alas, I can't get dotMemory to record when building coreclr from source at the moment, which I think is a different issue that's being picked up elsewhere.

As an alternative, using the dotTrace timeline, everything is dwarfed by the ETW allocations:

7.07%   coreclr.dll  •  254 MB
  7.07%   WKS::gc_heap::fire_etw_allocation_event  •  254 MB  •  coreclr.dll.WKS::gc_heap::fire_etw_allocation_event
    7.07%   CoTemplate_qqhxpzqp  •  254 MB  •  coreclr.dll.CoTemplate_qqhxpzqp
      7.07%   EtwCallout  •  254 MB  •  coreclr.dll.EtwCallout
        7.07%   ETW::SamplingLog::SendStackTrace  •  254 MB  •  coreclr.dll.ETW::SamplingLog::SendStackTrace

So... bearing that in mind...

@benaadams
Member

benaadams commented Sep 15, 2017

Apologies for the format and uninterpreted results; I will have something better in the morning.

The coreclr.dll allocations are the ETW allocations.

For roughly the same time span (75 secs), the "before" allocation tree from System.Net.WebSockets.ManagedWebSocket+ReceiveAsyncPrivate looks like the following:

100%   MoveNext  •  4,050 MB  •  System.Net.WebSockets.ManagedWebSocket+<ReceiveAsyncPrivate>d__61.MoveNext()
  98.7%   SetExistingTaskResult  •  3,998 MB  •  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(!0)
    98.7%   TrySetResult  •  3,998 MB  •  System.Threading.Tasks.Task`1.TrySetResult(!0)
      98.7%   RunContinuations  •  3,998 MB  •  System.Threading.Tasks.Task.RunContinuations(Object)
        98.7%   RunOrScheduleAction  •  3,998 MB  •  System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(Action, Boolean, ref Task)
          98.7%   Run  •  3,998 MB  •  System.Threading.ExecutionContext.Run(ExecutionContext, ContextCallback, Object)
            98.7%   MoveNext  •  3,998 MB  •  AoA.Gaia.Network.PlayerWebsocket+<CompleteReceiveTaskAsync>d__56.MoveNext()
              98.7%   SetExistingTaskResult  •  3,998 MB  •  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(!0)
                98.7%   TrySetResult  •  3,998 MB  •  System.Threading.Tasks.Task`1.TrySetResult(!0)
                  98.7%   RunContinuations  •  3,998 MB  •  System.Threading.Tasks.Task.RunContinuations(Object)
                    98.7%   RunOrScheduleAction  •  3,998 MB  •  System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(Action, Boolean, ref Task)
                      98.7%   Run  •  3,998 MB  •  System.Threading.ExecutionContext.Run(ExecutionContext, ContextCallback, Object)
                        98.7%   MoveNext  •  3,998 MB  •  AoA.Gaia.Network.PlayerWebsocket+<ReceiveProcessAsync>d__53.MoveNext()
                          49.9%   ProcessReceive  •  2,020 MB  •  AoA.Gaia.Network.PlayerWebsocket.ProcessReceive(ArraySegment)
                            49.9%   PingInput  •  2,020 MB  •  AoA.Gaia.Network.PlayerWebsocket.PingInput(InboundUpdateType, Double, Int32, ArraySegment)
                              49.9%   Run  •  2,020 MB  •  System.Threading.ExecutionContext.Run(ExecutionContext, ContextCallback, Object)
                                49.9%   MoveNext  •  2,020 MB  •  AoA.Gaia.Network.PlayerWebsocket+<SendAsync>d__74.MoveNext()
                                  49.9%   CompleteSendAsync  •  2,020 MB  •  AoA.Gaia.Network.PlayerWebsocket.CompleteSendAsync(WebSocketMessageType)
                                    49.9%   CompleteSend  •  2,020 MB  •  AoA.Gaia.Network.PlayerWebsocket.CompleteSend(WebSocketMessageType)
                                      49.9%   SendAsync  •  2,020 MB  •  System.Net.WebSockets.ManagedWebSocket.SendAsync(ArraySegment, WebSocketMessageType, Boolean, CancellationToken)
                                        49.9%   SendFrameAsync  •  2,020 MB  •  System.Net.WebSockets.ManagedWebSocket.SendFrameAsync(MessageOpcode, Boolean, ArraySegment, CancellationToken)
                                          49.9%   SendFrameLockAcquiredNonCancelableAsync  •  2,020 MB  •  System.Net.WebSockets.ManagedWebSocket.SendFrameLockAcquiredNonCancelableAsync(MessageOpcode, Boolean, ArraySegment)
                                            49.9%   WriteAsync  •  2,020 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.FrameDuplexStream.WriteAsync(Byte[], Int32, Int32, CancellationToken)
                                              49.9%   WriteAsync  •  2,020 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.FrameResponseStream.WriteAsync(Byte[], Int32, Int32, CancellationToken)
                                                49.9%   WriteAsync  •  2,020 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Frame.WriteAsync(ArraySegment, CancellationToken)
                                                  49.9%   WriteAsync  •  2,020 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.OutputProducer.WriteAsync(ArraySegment, CancellationToken, Boolean)
                                                    30.1%   FlushAsync  •  1,220 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.OutputProducer.FlushAsync(WritableBuffer, CancellationToken)
                                                      30.1%   FlushAsync  •  1,220 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.FlushAsync(CancellationToken)
                                                        30.1%   Run  •  1,220 MB  •  System.Threading.ExecutionContext.Run(ExecutionContext, ContextCallback, Object)
                                                          30.1%   MoveNext  •  1,220 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Adapter.Internal.AdaptedPipeline+<WriteOutputAsync>d__12.MoveNext()
                                                            30.1%   WriteAsync  •  1,220 MB  •  System.IO.Stream.WriteAsync(Byte[], Int32, Int32, CancellationToken)
                                                              4.78%   WriteAsync  •  194 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Adapter.Internal.RawStream.WriteAsync(Byte[], Int32, Int32, CancellationToken)
                                                                4.78%   MoveNext  •  194 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Adapter.Internal.RawStream+<WriteAsync>d__19.MoveNext()
                                                                  4.78%   Write  •  194 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.WritableBufferExtensions.Write(WritableBuffer, ReadOnlySpan)
                                                                    4.78%   Ensure  •  194 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.Ensure(Int32)
                                                                      2.68%   coreclr.dll  •  108 MB
                                                                      2.04%   AllocateWriteHeadUnsynchronized  •  82 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.AllocateWriteHeadUnsynchronized(Int32)
                                                                        2.03%   coreclr.dll  •  82 MB
                                                                        0.00%   Lease  •  0.1 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.MemoryPool.Lease()
                                                                          0.00%   AllocateSlab  •  0.1 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.MemoryPool.AllocateSlab()
                                                                            0.00%   JIT_NewArr1  •  0.1 MB  •  coreclr.dll.JIT_NewArr1
                                                                      0.06%   Lease  •  2.6 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.MemoryPool.Lease()
                                                                        0.06%   AllocateSlab  •  2.6 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.MemoryPool.AllocateSlab()
                                                                          0.06%   JIT_NewArr1  •  2.5 MB  •  coreclr.dll.JIT_NewArr1
                                                                          0.00%   coreclr.dll  •  0.1 MB
                                                    19.8%   Alloc  •  800 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.Alloc(Int32)
                                                      19.8%   AllocateWriteHeadUnsynchronized  •  800 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.AllocateWriteHeadUnsynchronized(Int32)
                                                        19.8%   coreclr.dll  •  800 MB
                          33.9%   InitiateReceive  •  1,374 MB  •  AoA.Gaia.Network.PlayerWebsocket.InitiateReceive()
                            33.9%   ReceiveAsync  •  1,374 MB  •  System.Net.WebSockets.ManagedWebSocket.ReceiveAsync(ArraySegment, CancellationToken)
                              33.9%   ReceiveAsyncPrivate  •  1,374 MB  •  System.Net.WebSockets.ManagedWebSocket.ReceiveAsyncPrivate(ArraySegment, CancellationToken)
                                33.9%   MoveNext  •  1,374 MB  •  System.Net.WebSockets.ManagedWebSocket+<ReceiveAsyncPrivate>d__61.MoveNext()
                                ► 23.9%   EnsureBufferContainsAsync  •  969 MB  •  System.Net.WebSockets.ManagedWebSocket.EnsureBufferContainsAsync(Int32, CancellationToken, Boolean)
                                ► 5.00%   coreclr.dll  •  203 MB
                                  2.99%   GetCompletionAction  •  121 MB  •  System.Runtime.CompilerServices.AsyncMethodBuilderCore.GetCompletionAction(Task, ref MoveNextRunner)
                                  2.02%   InitializeTask  •  82 MB  •  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.InitializeTask()
                                  ► 2.02%   coreclr.dll  •  82 MB
                          14.9%   SetResult  •  604 MB  •  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetResult(Task)
                            14.9%   SetExistingTaskResult  •  604 MB  •  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(!0)
                              14.9%   TrySetResult  •  604 MB  •  System.Threading.Tasks.Task`1.TrySetResult(!0)
                                14.9%   RunContinuations  •  604 MB  •  System.Threading.Tasks.Task.RunContinuations(Object)
                                  14.9%   RunOrScheduleAction  •  604 MB  •  System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(Action, Boolean, ref Task)
                                    14.9%   Run  •  604 MB  •  System.Threading.ExecutionContext.Run(ExecutionContext, ContextCallback, Object)
                                      14.9%   MoveNext  •  604 MB  •  AoA.Gaia.Network.PlayerWebsocket+<ReceiveAsync>d__52.MoveNext()
                                        14.9%   ReceiveProcessAsync  •  604 MB  •  AoA.Gaia.Network.PlayerWebsocket.ReceiveProcessAsync()
                                          14.9%   MoveNext  •  604 MB  •  AoA.Gaia.Network.PlayerWebsocket+<ReceiveProcessAsync>d__53.MoveNext()
                                          ► 7.59%   CompleteReceiveTaskAsync  •  307 MB  •  AoA.Gaia.Network.PlayerWebsocket.CompleteReceiveTaskAsync(PooledBuffer)
                                            2.85%   GetCompletionAction  •  116 MB  •  System.Runtime.CompilerServices.AsyncMethodBuilderCore.GetCompletionAction(Task, ref MoveNextRunner)
                                            ► 2.85%   coreclr.dll  •  116 MB
                                            2.57%   coreclr.dll  •  104 MB
                                            1.90%   InitializeTask  •  77 MB  •  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.InitializeTask()
                                            ► 1.90%   coreclr.dll  •  77 MB
► 1.23%   coreclr.dll  •  50 MB
  0.06%   EnsureBufferContainsAsync  •  2.4 MB  •  System.Net.WebSockets.ManagedWebSocket.EnsureBufferContainsAsync(Int32, CancellationToken, Boolean)
    0.06%   MoveNext  •  2.4 MB  •  System.Net.WebSockets.ManagedWebSocket+<EnsureBufferContainsAsync>d__70.MoveNext()
    ► 0.03%   ReadAsync  •  1.1 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.FrameDuplexStream.ReadAsync(Byte[], Int32, Int32, CancellationToken)
    ► 0.02%   coreclr.dll  •  0.6 MB
      0.01%   InitializeTask  •  0.4 MB  •  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.InitializeTask()
      0.01%   GetCompletionAction  •  0.3 MB  •  System.Runtime.CompilerServices.AsyncMethodBuilderCore.GetCompletionAction(Task, ref MoveNextRunner)
      ► 0.01%   coreclr.dll  •  0.3 MB

@benaadams
Member

benaadams commented Sep 15, 2017

For roughly the same time span (75 secs), the "after" allocation tree from System.Net.WebSockets.ManagedWebSocket+ReceiveAsyncPrivate looks like the following:

100%   MoveNext  •  3,596 MB  •  System.Net.WebSockets.ManagedWebSocket+<ReceiveAsyncPrivate>d__61.MoveNext()
  98.8%   SetExistingTaskResult  •  3,554 MB  •  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(!0)
    98.8%   TrySetResult  •  3,554 MB  •  System.Threading.Tasks.Task`1.TrySetResult(!0)
      98.8%   RunContinuations  •  3,554 MB  •  System.Threading.Tasks.Task.RunContinuations(Object)
        98.8%   RunOrScheduleAction  •  3,554 MB  •  System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(Action, Boolean, ref Task)
          98.8%   Run  •  3,554 MB  •  System.Threading.ExecutionContext.Run(ExecutionContext, ContextCallback, Object)
            98.8%   MoveNext  •  3,554 MB  •  AoA.Gaia.Network.PlayerWebsocket+<CompleteReceiveTaskAsync>d__56.MoveNext()
              98.8%   SetExistingTaskResult  •  3,554 MB  •  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(!0)
                98.8%   MoveNext  •  3,554 MB  •  AoA.Gaia.Network.PlayerWebsocket+<ReceiveProcessAsync>d__53.MoveNext()
                  52.8%   ProcessReceive  •  1,899 MB  •  AoA.Gaia.Network.PlayerWebsocket.ProcessReceive(ArraySegment)
                    52.8%   PingInput  •  1,899 MB  •  AoA.Gaia.Network.PlayerWebsocket.PingInput(InboundUpdateType, Double, Int32, ArraySegment)
                      52.8%   Run  •  1,899 MB  •  System.Threading.ExecutionContext.Run(ExecutionContext, ContextCallback, Object)
                        52.8%   MoveNext  •  1,899 MB  •  AoA.Gaia.Network.PlayerWebsocket+<SendAsync>d__74.MoveNext()
                          52.8%   CompleteSendAsync  •  1,899 MB  •  AoA.Gaia.Network.PlayerWebsocket.CompleteSendAsync(WebSocketMessageType)
                            52.8%   CompleteSend  •  1,899 MB  •  AoA.Gaia.Network.PlayerWebsocket.CompleteSend(WebSocketMessageType)
                              52.8%   SendAsync  •  1,899 MB  •  System.Net.WebSockets.ManagedWebSocket.SendAsync(ArraySegment, WebSocketMessageType, Boolean, CancellationToken)
                                52.8%   SendFrameAsync  •  1,899 MB  •  System.Net.WebSockets.ManagedWebSocket.SendFrameAsync(MessageOpcode, Boolean, ArraySegment, CancellationToken)
                                  52.8%   SendFrameLockAcquiredNonCancelableAsync  •  1,899 MB  •  System.Net.WebSockets.ManagedWebSocket.SendFrameLockAcquiredNonCancelableAsync(MessageOpcode, Boolean, ArraySegment)
                                    52.8%   WriteAsync  •  1,899 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.FrameDuplexStream.WriteAsync(Byte[], Int32, Int32, CancellationToken)
                                      52.8%   WriteAsync  •  1,899 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.FrameResponseStream.WriteAsync(Byte[], Int32, Int32, CancellationToken)
                                        52.8%   WriteAsync  •  1,899 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Frame.WriteAsync(ArraySegment, CancellationToken)
                                          52.8%   WriteAsync  •  1,899 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.OutputProducer.WriteAsync(ArraySegment, CancellationToken, Boolean)
                                            33.7%   FlushAsync  •  1,213 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.OutputProducer.FlushAsync(WritableBuffer, CancellationToken)
                                              33.7%   FlushAsync  •  1,213 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.FlushAsync(CancellationToken)
                                                33.7%   Run  •  1,213 MB  •  System.Threading.ExecutionContext.Run(ExecutionContext, ContextCallback, Object)
                                                  33.7%   MoveNext  •  1,213 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Adapter.Internal.AdaptedPipeline+<WriteOutputAsync>d__12.MoveNext()
                                                    33.7%   WriteAsync  •  1,213 MB  •  System.IO.Stream.WriteAsync(Byte[], Int32, Int32, CancellationToken)
                                                      4.88%   WriteAsync  •  176 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Adapter.Internal.RawStream.WriteAsync(Byte[], Int32, Int32, CancellationToken)
                                                        4.88%   MoveNext  •  176 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Adapter.Internal.RawStream+<WriteAsync>d__19.MoveNext()
                                                          4.88%   Write  •  176 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.WritableBufferExtensions.Write(WritableBuffer, ReadOnlySpan)
                                                            4.88%   Ensure  •  176 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.Ensure(Int32)
                                                              2.61%   coreclr.dll  •  94 MB
                                                              2.27%   AllocateWriteHeadUnsynchronized  •  82 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.AllocateWriteHeadUnsynchronized(Int32)
                                                              ► 2.27%   coreclr.dll  •  82 MB
                                            19.1%   Alloc  •  687 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.Alloc(Int32)
                                              19.1%   AllocateWriteHeadUnsynchronized  •  687 MB  •  Microsoft.AspNetCore.Server.Kestrel.Internal.System.IO.Pipelines.Pipe.AllocateWriteHeadUnsynchronized(Int32)
                                                19.1%   coreclr.dll  •  687 MB
                  32.3%   InitiateReceive  •  1,161 MB  •  AoA.Gaia.Network.PlayerWebsocket.InitiateReceive()
                    32.3%   ReceiveAsync  •  1,161 MB  •  System.Net.WebSockets.ManagedWebSocket.ReceiveAsync(ArraySegment, CancellationToken)
                      32.3%   ReceiveAsyncPrivate  •  1,161 MB  •  System.Net.WebSockets.ManagedWebSocket.ReceiveAsyncPrivate(ArraySegment, CancellationToken)
                        32.3%   MoveNext  •  1,161 MB  •  System.Net.WebSockets.ManagedWebSocket+<ReceiveAsyncPrivate>d__61.MoveNext()
                          22.5%   EnsureBufferContainsAsync  •  808 MB  •  System.Net.WebSockets.ManagedWebSocket.EnsureBufferContainsAsync(Int32, CancellationToken, Boolean)
                            22.5%   MoveNext  •  808 MB  •  System.Net.WebSockets.ManagedWebSocket+<EnsureBufferContainsAsync>d__70.MoveNext()
                              14.6%   ReadAsync  •  526 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.FrameDuplexStream.ReadAsync(Byte[], Int32, Int32, CancellationToken)
                                14.6%   ReadAsync  •  526 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.FrameRequestStream.ReadAsync(Byte[], Int32, Int32, CancellationToken)
                                  14.6%   ReadAsyncInternal  •  526 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.FrameRequestStream.ReadAsyncInternal(Byte[], Int32, Int32, CancellationToken)
                                    14.6%   MoveNext  •  526 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.FrameRequestStream+<ReadAsyncInternal>d__21.MoveNext()
                                    ► 7.57%   coreclr.dll  •  272 MB
                                      7.07%   ReadAsync  •  254 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.MessageBody.ReadAsync(ArraySegment, CancellationToken)
                                        7.07%   MoveNext  •  254 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.MessageBody+<ReadAsync>d__22.MoveNext()
                                        ► 7.07%   coreclr.dll  •  254 MB
                            ► 7.82%   coreclr.dll  •  281 MB
                        ► 9.83%   coreclr.dll  •  354 MB
                  13.7%   SetExistingTaskResult  •  493 MB  •  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(!0)
                    13.7%   MoveNext  •  493 MB  •  AoA.Gaia.Network.PlayerWebsocket+<ReceiveAsync>d__52.MoveNext()
                      13.7%   ReceiveProcessAsync  •  493 MB  •  AoA.Gaia.Network.PlayerWebsocket.ReceiveProcessAsync()
                        13.7%   MoveNext  •  493 MB  •  AoA.Gaia.Network.PlayerWebsocket+<ReceiveProcessAsync>d__53.MoveNext()
                        ► 7.60%   CompleteReceiveTaskAsync  •  273 MB  •  AoA.Gaia.Network.PlayerWebsocket.CompleteReceiveTaskAsync(PooledBuffer)
                        ► 6.11%   coreclr.dll  •  220 MB
► 1.12%   coreclr.dll  •  40 MB
  0.06%   EnsureBufferContainsAsync  •  2.1 MB  •  System.Net.WebSockets.ManagedWebSocket.EnsureBufferContainsAsync(Int32, CancellationToken, Boolean)
    0.06%   MoveNext  •  2.1 MB  •  System.Net.WebSockets.ManagedWebSocket+<EnsureBufferContainsAsync>d__70.MoveNext()
    ► 0.04%   ReadAsync  •  1.6 MB  •  Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.FrameDuplexStream.ReadAsync(Byte[], Int32, Int32, CancellationToken)
    ► 0.01%   coreclr.dll  •  0.5 MB

@benaadams
Member

It's down 500 MB, but that might just be what was going on at the time.

Space-Async

But I thought I'd give a preview in case @stephentoub can see whether the paths are doing what he expected. I'll look deeper in the morning.

@benaadams
Member

I'll look at what happens if AsyncLocal is used, as that moves everything off the default context, where allocations currently skyrocket.

@benaadams
Member

> If, however, the current ExecutionContext was not the default, every await would end up allocating another Action and MoveNextRunner, for 2 allocations and 56 bytes on each await. With the new design, those are eliminated, such that even if a non-default ExecutionContext is in play, and even if it changes in between awaits, the original allocations are still used.

For reference, here is the effect of AsyncLocal and a non-default context:

Without AsyncLocal

With AsyncLocal

You can see that, prior to this change, the allocations of Action and MoveNextRunner increase disproportionately.
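
As a minimal sketch of the scenario being measured (not the code from the profiled application), the following shows how setting an `AsyncLocal<T>` value moves an async method off the default ExecutionContext. The names `AsyncLocalDemo`, `s_requestId`, `DoWorkAsync`, and `RunAsync` are hypothetical, chosen only for illustration. Per the PR description, before this change each yielding await inside the second call would allocate a fresh Action and MoveNextRunner; the sketch does not measure that, it only sets up the condition.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncLocalDemo
{
    // Hypothetical ambient value. Assigning any AsyncLocal<T> value replaces
    // the current ExecutionContext with a non-default one.
    private static readonly AsyncLocal<int> s_requestId = new AsyncLocal<int>();

    private static async Task<int> DoWorkAsync(int awaits)
    {
        for (int i = 0; i < awaits; i++)
        {
            // Task.Yield always completes asynchronously, so every iteration
            // captures the current ExecutionContext for the continuation.
            await Task.Yield();
        }

        // The ambient value flows across all of the continuations above.
        return s_requestId.Value;
    }

    public static async Task<(int Before, int After)> RunAsync()
    {
        // Default context: no AsyncLocal value has been set yet.
        int before = await DoWorkAsync(3);

        // From here on the ExecutionContext is non-default.
        s_requestId.Value = 42;
        int after = await DoWorkAsync(3);

        return (before, after);
    }

    public static async Task Main()
    {
        var (before, after) = await RunAsync();
        Console.WriteLine($"{before} {after}");
    }
}
```

Running the same loop under an allocation profiler, with and without the `s_requestId.Value = 42` assignment, is one way to reproduce the kind of comparison described above.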

@stephentoub
Member Author

> It's down 500 MB, but that might just be what was going on at the time.

I can't tell from the shared output: how many objects were contributing to that? I'd guesstimate you would see savings of 50-60 bytes per async call that completes asynchronously and uses the default context (and potentially much, much more in a non-default context).
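
The per-call figure follows from the numbers in the PR description (232 bytes before, 176 bytes after, on 64-bit). The sketch below works through that arithmetic and, purely as a rough upper bound, shows how many asynchronously completing calls a 500 MB drop would correspond to. It assumes the whole drop came from this change and that MB means 10^6 bytes; neither is established by the profile. The names `SavingsEstimate` and `EstimatedCalls` are hypothetical.

```csharp
using System;

public static class SavingsEstimate
{
    // Figures from the PR description for the first yield of a simple
    // Task-returning async method on 64-bit.
    private const int BytesBefore = 232; // 4 allocations
    private const int BytesAfter = 176;  // 2 allocations

    public static int BytesSavedPerCall => BytesBefore - BytesAfter;

    // Assumption: every saved byte is attributable to this change, in the
    // default-context case. Integer division rounds down.
    public static long EstimatedCalls(long totalBytesSaved) =>
        totalBytesSaved / BytesSavedPerCall;

    public static void Main()
    {
        Console.WriteLine(BytesSavedPerCall);            // 56
        Console.WriteLine(EstimatedCalls(500_000_000L)); // 8928571
    }
}
```

In a non-default context the old scheme also paid 56 bytes on every await rather than once per call, so the same drop could correspond to far fewer calls.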

@stephentoub
Member Author

@r-ramesh, @gregg-miskelly, I would like to merge this. Can I go ahead and do so?

@r-ramesh

@stephentoub Sorry for the delay, let me take a quick look at the debugger implementation. I will keep you updated.

@r-ramesh

Looks good to me.

@stephentoub
Member Author

Thanks, @r-ramesh.
