Reduce memory usage by using the body directly instead of copying in BasicPublishAsync #1445
Hello, thanks for this contribution. Could you please provide a test application that clearly demonstrates the issue that this PR solves?
@GerardSmit do you have any benchmarking data that would help the maintainers (and users alike) reason about the scale of the improvements?
Would it be feasible to never copy, and have the user of the library make a copy if needed?
Probably! In my head I have a task to check out how other .NET client libraries work in this regard. I will add a task for the 7.0 release to do that.
Another nice-to-have would be support
@paulomorgado please feel free to add comments to #1446 so they aren't lost!
I'll see if I can create a test application / benchmark tomorrow.
This would mean the background channel (in the code below) should always be blocking, so that the buffer doesn't get modified by the application. In this case a …
Code: `rabbitmq-dotnet-client/projects/RabbitMQ.Client/client/impl/SocketFrameHandler.cs`, lines 299 to 311 in 3794b4c
As stated in the PR, I'll hold off on any changes until the research in #1446 is done.
The more I think about this PR, the more optimizations like this feel like version 8 changes, not version 7. Thoughts?
Probably, unless the changes can be reduced by requiring the user to copy the message if it needs to be isolated.
@lukebakken @michaelklishin the test application can be found here: https://github.com/GerardSmit/RabbitMQ.MemoryTest
Results (figures not preserved): Official client (copying); Fork (copying); Fork (non-copying).
Understandable. I'll keep the fork up to date with 7.0 in the meantime and publish it to NuGet, since without it we're getting a lot of OutOfMemoryExceptions in our application. We're running the application in a restricted container in Kubernetes, and at first we thought the cause was cgroups not being supported in .NET. They are supported as of .NET 9.0 (dotnet/runtime#93611), so we tried a daily build, without success. In the end we saw a lot of memory being allocated in the ArrayPool by RabbitMQ, hence this PR.
This test was disabled before I implemented TrackRentedBytesAsync correctly. It should pass now.
This complicates the tests: after the rented object is returned to the pool, its data is reset.
IMO the library should not make any copy and should simply use the data that it receives in … My view is that … That way each application can use this client in the way that best suits it.
I agree with most of this, except that it is never a good idea to leave async calls un-awaited.
In the interest of shipping v7 of this library this month, I have moved this PR to version 8. Thanks!
I'm hitting this problem: when publishing at high frequency, memory grows to many gigabytes, and it only starts to drop, slowly, if I pause the stream. What I don't quite understand is that the data I'm sending reaches the backend quickly, so is a cache not being cleaned out in my local C# publisher? Why is the memory still being held?
@SpiritDogstar please start a discussion here: https://github.com/rabbitmq/rabbitmq-dotnet-client/discussions
If you can share code, that would be ideal. Also important: are you publishing / consuming large messages? What is their average size?
Have you considered setting https://learn.microsoft.com/en-us/dotnet/core/runtime-config/garbage-collector#high-memory-percent? This would clear the TLS store of the pool too. That has been supported since .NET 6 (dotnet/runtime#56316).
Another alternative would be to expose a way to set the pool implementation instead of using `ArrayPool<byte>.Shared` directly.
Most libraries that pool buffers for calls against the underlying protocol apply some form of copying of the input. Changing the API surface of basic publish, and all the underlying mechanics that come with it, for scenarios that can be solved by, for example, assigning slightly more runtime memory to the pod so it is closer to the usage scenario, or by providing more opinionated default pool implementations, seems like quite a burden in terms of complexity to carry forward over the lifetime of the project.
I'm not trying to shoot this down per se. I want to give a different perspective that hasn't really been discussed here yet.
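A minimal sketch of the "expose the pool implementation" alternative, assuming a hypothetical `PooledFrameBuffers` type (not part of RabbitMQ.Client) that accepts an `ArrayPool<byte>` and only falls back to `ArrayPool<byte>.Shared` when none is supplied. `ArrayPool<byte>.Create(maxArrayLength, maxArraysPerBucket)` is the real BCL factory for a separate pool instance whose limits the application controls.

```csharp
using System.Buffers;

// Hypothetical type for illustration only; RabbitMQ.Client does not expose this.
public sealed class PooledFrameBuffers
{
    private readonly ArrayPool<byte> _pool;

    // Uses the application's pool if provided, otherwise the process-wide shared pool.
    public PooledFrameBuffers(ArrayPool<byte>? pool = null) => _pool = pool ?? ArrayPool<byte>.Shared;

    // The returned array may be longer than minimumLength.
    public byte[] Rent(int minimumLength) => _pool.Rent(minimumLength);

    public void Return(byte[] buffer) => _pool.Return(buffer);
}
```

An application could then pass something like `ArrayPool<byte>.Create(maxArrayLength: 1024 * 1024, maxArraysPerBucket: 8)` so that pooled memory is bounded by its own settings rather than by the shared pool's behaviour.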
Further to the point I wrote above, there is another approach that might significantly reduce complexity. We could have an overload that allows passing an IMemoryOwner<T> (https://learn.microsoft.com/en-us/dotnet/api/system.buffers.imemoryowner-1?view=net-8.0; see also https://learn.microsoft.com/en-us/dotnet/standard/memory-and-spans/memory-t-usage-guidelines). The memory owner could then be passed to the channel worker, which would release the owner once the channel has sent it. The calling code could use its own pooling, and the internals could bypass copying while leaving the rest as is. Also related: dotnet/aspnetcore#38153
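A rough sketch of that ownership-transfer idea, assuming hypothetical `OwnedBody` and `OwnedBodyWriter` types (not RabbitMQ.Client APIs) and a plain `Stream` standing in for the socket pipe. The caller hands over an `IMemoryOwner<byte>` plus the payload length; the background loop writes the bytes and then disposes the owner, which returns the buffer to whatever pool the caller rented it from.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical: the owned buffer plus how many of its bytes are payload
// (IMemoryOwner<T>.Memory may be longer than what was requested).
public readonly record struct OwnedBody(IMemoryOwner<byte> Owner, int Length);

public sealed class OwnedBodyWriter
{
    private readonly Channel<OwnedBody> _channel = Channel.CreateBounded<OwnedBody>(128);
    private readonly Stream _destination;

    public OwnedBodyWriter(Stream destination) => _destination = destination;

    // Ownership moves to the writer: the caller must not use or dispose the owner afterwards.
    public ValueTask EnqueueAsync(IMemoryOwner<byte> owner, int length) =>
        _channel.Writer.WriteAsync(new OwnedBody(owner, length));

    public void Complete() => _channel.Writer.Complete();

    // Background loop: write each body without copying, then release its owner.
    public async Task RunAsync()
    {
        await foreach (OwnedBody item in _channel.Reader.ReadAllAsync())
        {
            try
            {
                await _destination.WriteAsync(item.Owner.Memory.Slice(0, item.Length));
            }
            finally
            {
                item.Owner.Dispose();
            }
        }
    }
}
```

A caller using `MemoryPool<byte>.Shared.Rent(size)` (or its own pool) fills the memory, calls `EnqueueAsync`, and returns immediately without copying, because the buffer's lifetime now belongs to the writer.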
Proposed Changes
This PR adds the ability to use the body in `BasicPublishAsync` directly when sending it to RabbitMQ, instead of copying it to a temporary byte array.
Technical information
Currently, `BasicPublishAsync` rents an array from the `ArrayPool`, sized for the frame plus the body length (`rabbitmq-dotnet-client/projects/RabbitMQ.Client/client/impl/Frame.cs`, lines 173 to 178 in 3794b4c).
The body is then copied into this array (`rabbitmq-dotnet-client/projects/RabbitMQ.Client/client/impl/Frame.cs`, lines 183 to 188 in 3794b4c).
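A simplified sketch of the rent-then-copy pattern described above, not the actual `Frame.cs` code. `CopyingFrameBuilder` is a hypothetical name and `HeaderSize = 8` is a placeholder, not the real frame layout. It shows the two costs this PR targets: the rented array must hold header plus body, and the body bytes are duplicated into it.

```csharp
using System;
using System.Buffers;

// Hypothetical illustration of the current behaviour; not RabbitMQ.Client code.
public static class CopyingFrameBuilder
{
    public const int HeaderSize = 8; // placeholder value, for illustration only

    public static (byte[] Buffer, int Length) Build(ReadOnlySpan<byte> body)
    {
        int length = HeaderSize + body.Length;

        // The shared pool may return an array larger than requested.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(length);

        // Header bytes would be serialized here; zeroed as a stand-in.
        buffer.AsSpan(0, HeaderSize).Clear();

        // The body copy that this PR wants to avoid for large messages.
        body.CopyTo(buffer.AsSpan(HeaderSize));

        return (buffer, length);
    }
}
```

Once the frame has been written, the caller would hand the array back with `ArrayPool<byte>.Shared.Return(buffer)`; until then, the full header-plus-body array stays rented.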
In our application, we sometimes forward large bodies to RabbitMQ. The problem is that `ArrayPool<byte>.Shared` uses separate buckets for each thread, which causes many new byte arrays to be allocated.
In addition, `BasicPublishAsync` writes the rented array to a channel (`rabbitmq-dotnet-client/projects/RabbitMQ.Client/client/impl/SocketFrameHandler.cs`, lines 308 to 309 in 3794b4c).
The array is later written to the pipe (`rabbitmq-dotnet-client/projects/RabbitMQ.Client/client/impl/SocketFrameHandler.cs`, lines 333 to 342 in 3794b4c).
If RabbitMQ responds slowly, or we try to send a lot of data at once, the channel can fill up quickly, causing it to hold 128 rented arrays.
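A small sketch of that backlog, assuming (as stated above) a bounded channel with a capacity of 128; `ChannelBacklogDemo` is a hypothetical name. While nothing reads from the channel (the stand-in for a stalled socket writer), every queued item keeps its rented array alive, so at most 128 writes succeed.

```csharp
using System.Buffers;
using System.Threading.Channels;

// Hypothetical demo of rented buffers piling up in a bounded channel.
public static class ChannelBacklogDemo
{
    public static int FillWhileConsumerStalled(int attempts, int bodySize)
    {
        var channel = Channel.CreateBounded<byte[]>(new BoundedChannelOptions(128)
        {
            FullMode = BoundedChannelFullMode.Wait
        });

        int queued = 0;
        for (int i = 0; i < attempts; i++)
        {
            byte[] rented = ArrayPool<byte>.Shared.Rent(bodySize);
            if (channel.Writer.TryWrite(rented))
            {
                queued++; // this array now stays rented until the consumer catches up
            }
            else
            {
                // Channel is full; a real publisher would await WriteAsync instead.
                ArrayPool<byte>.Shared.Return(rented);
            }
        }

        // Drain and return everything so the demo itself does not hold pooled arrays.
        while (channel.Reader.TryRead(out byte[]? item))
        {
            ArrayPool<byte>.Shared.Return(item);
        }

        return queued;
    }
}
```

With 64 KiB bodies, a full channel pins at least 128 × 64 KiB = 8 MiB of arrays per connection, before counting the frame header overhead.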
New API
ConnectionFactory
In the factory there is a new property called `CopyBodyToMemoryThreshold`. When this value is set, any body larger than the provided value (in this example, 4096) is used directly instead of being copied to a new array.
If the body is smaller than the provided value, it is still copied to a new array.
When we use the data directly, we cannot send it in the background as we currently do. We have to wait until the bytes are written to the pipe; otherwise, the application could modify the buffer before it is sent. So, whenever the buffer is used directly, `BasicPublishAsync` waits until the bytes have been written to the pipe.
This solution is a middle ground: smaller bodies are copied (which allocates) and sent in the background, while larger bodies are written directly but the call blocks until the write completes.
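A minimal sketch of the two paths described above, assuming a hypothetical `ThresholdPublisher` (not RabbitMQ.Client code) that writes to a plain `Stream` in place of the pipe. Bodies up to the threshold are copied into a rented array and queued; larger bodies are written directly, and the call only completes once the write has finished, so the caller can safely reuse its buffer afterwards.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical sketch of the proposed threshold behaviour; names mirror the PR only loosely.
public sealed class ThresholdPublisher
{
    private readonly int _copyBodyToMemoryThreshold;
    private readonly Stream _pipe;
    private readonly Channel<(byte[] Buffer, int Length)> _background =
        Channel.CreateBounded<(byte[] Buffer, int Length)>(128);

    public ThresholdPublisher(Stream pipe, int copyBodyToMemoryThreshold = int.MaxValue)
    {
        _pipe = pipe;
        _copyBodyToMemoryThreshold = copyBodyToMemoryThreshold;
    }

    // Items waiting for a background writer (draining them is omitted here).
    public ChannelReader<(byte[] Buffer, int Length)> Pending => _background.Reader;

    public async ValueTask PublishAsync(ReadOnlyMemory<byte> body)
    {
        if (body.Length <= _copyBodyToMemoryThreshold)
        {
            // Small body: copy into a rented array, queue it, and return without waiting for I/O.
            byte[] rented = ArrayPool<byte>.Shared.Rent(body.Length);
            body.CopyTo(rented);
            await _background.Writer.WriteAsync((rented, body.Length));
        }
        else
        {
            // Large body: no copy, so the write must finish before the caller gets control back.
            await _pipe.WriteAsync(body);
        }
    }
}
```

With the default of `int.MaxValue`, every body takes the copying path, matching today's behaviour. A real implementation would also have to keep frames in order, since a direct write could overtake bodies still waiting in the channel; this sketch ignores that.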
The default value of `CopyBodyToMemoryThreshold` is `int.MaxValue`, so there is no breaking change.
IChannel
`BasicPublishAsync` has a new parameter, `bool? copyBody = null`. When the value is:
- `true`: the body is copied to a new array and sent in the background.
- `false`: the body is written directly, and the call blocks until it has been written.
- `null`: the body length is compared with the connection's `CopyBodyToMemoryThreshold` setting.
There are two new overloads that allow `ReadOnlySequence<byte>` to be used as the body. This way you can use `System.IO.Pipelines` without allocating a temporary `ReadOnlyMemory<byte>`.
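A sketch of the two `IChannel` additions under stated assumptions: `PublishOptionsSketch` and `BufferSegment` are hypothetical names, not the PR's code. `ShouldCopy` resolves the tri-state `copyBody` argument against the threshold (a body equal to the threshold is treated as "copy", since the description only says larger bodies are used directly). `WriteSequenceAsync` writes a multi-segment `ReadOnlySequence<byte>` one segment at a time, so no contiguous temporary buffer is needed.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helpers; not RabbitMQ.Client APIs.
public static class PublishOptionsSketch
{
    // true = always copy, false = never copy, null = defer to the connection threshold.
    public static bool ShouldCopy(bool? copyBody, long bodyLength, int copyBodyToMemoryThreshold) =>
        copyBody ?? bodyLength <= copyBodyToMemoryThreshold;

    // Writes each segment in turn instead of flattening the sequence into one array.
    public static async Task WriteSequenceAsync(Stream destination, ReadOnlySequence<byte> body)
    {
        foreach (ReadOnlyMemory<byte> segment in body)
        {
            await destination.WriteAsync(segment);
        }
    }
}

// Minimal linked segment, used only to build a multi-segment sequence for testing.
public sealed class BufferSegment : ReadOnlySequenceSegment<byte>
{
    public BufferSegment(ReadOnlyMemory<byte> memory) => Memory = memory;

    public BufferSegment Append(ReadOnlyMemory<byte> memory)
    {
        var next = new BufferSegment(memory) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }
}
```

In a `System.IO.Pipelines` consumer, the `ReadResult.Buffer` obtained from a `PipeReader` is already a `ReadOnlySequence<byte>`, which is why an overload accepting it avoids an extra copy.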
Types of Changes
What types of changes does your code introduce to this project?
Checklist
CONTRIBUTING.md document
Some tests timed out.
I'll wait until this API is approved / changes have been made.
Any dependent changes have been merged and published in related repositories
Further Comments