Remove superfluous encoder.Reset() #262
When using the Apache Pulsar Go client (https://github.com/apache/pulsar-client-go) configured to use zstd compression during message production in a high-throughput service, we see high memory allocation due to Encoder.Reset(). Encoders on the `e.encoders` channel are already reset; there is no need to reset an encoder after pulling it from the channel. Signed-off-by: Daniel Ferstay <dferstay@splunk.com>
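To illustrate the invariant the commit message relies on, here is a minimal sketch of a channel-backed pool in which every item is reset once on its way back in, so nothing needs resetting when it is taken out. The `encoder` and `pool` types and their methods are hypothetical stand-ins, not the library's actual code.

```go
package main

import "fmt"

// encoder is a hypothetical stand-in for pooled compressor state.
type encoder struct {
	resets int  // how many times reset has run
	dirty  bool // true after encode until the next reset
}

func (e *encoder) reset() {
	e.resets++
	e.dirty = false
}

func (e *encoder) encode(src []byte) int {
	e.dirty = true
	return len(src) // placeholder for real compression work
}

// pool hands out encoders through a buffered channel.
type pool struct{ encoders chan *encoder }

func newPool(n int) *pool {
	p := &pool{encoders: make(chan *encoder, n)}
	for i := 0; i < n; i++ {
		p.encoders <- &encoder{} // a fresh encoder is already clean
	}
	return p
}

// encodeAll borrows an encoder, uses it, and resets it once before
// returning it. Everything sitting in the channel is therefore clean,
// so no reset is needed right after taking one out.
func (p *pool) encodeAll(src []byte) int {
	enc := <-p.encoders
	defer func() {
		enc.reset()
		p.encoders <- enc
	}()
	return enc.encode(src)
}

func main() {
	p := newPool(1)
	p.encodeAll([]byte("a"))
	p.encodeAll([]byte("b"))
	enc := <-p.encoders
	fmt.Println("resets:", enc.resets, "dirty:", enc.dirty)
}
```

In this sketch two calls cause exactly two resets; adding a second reset on the take side would double that count without changing the state of the encoder.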
Very nice! Thanks for the contribution. I plan to make a release soon. Probably when #260 is done, which is making slow but steady progress.
@dferstay Out of curiosity - where did you observe the allocations (code line or type of alloc)? It shouldn't really do any allocations once it has 'warmed up', so I am a bit curious what parts are doing the allocations.
I will check the failures tomorrow, heading to bed now. It should work as you propose the change AFAICT. There might be a problem where it relies on the double Reset when re-using.
It appears to pick up phantom matches across Resets when only one is done.
We see allocations coming from the In our testing, the pulsar topic that we are writing to has eight partitions; this means there are:
All 8 BatchBuilder instances will use the same Zstd Encoder instance. The code that calls the encoder is here:
We are currently using the default encoder options, which means we are using a doubleFastEncoder with:
Increment buffer offset when encoding without history. Since it does not store history, we must offset `e.cur` to avoid false matches for next user. Fixes the need for double resets to clear history. Prepares for #262
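The idea of offsetting `e.cur` can be sketched with a toy match table. This is an illustration under simplifying assumptions, not the zstd encoder: the `matcher` type, its byte-indexed table, and the `advance` flag are all hypothetical. The table stores absolute positions (`cur` plus the index in the current input); if `cur` is not advanced after an encode that keeps no history, stale entries from the previous input appear to point into the new input.

```go
package main

import "fmt"

// matcher is a toy match finder, not the real zstd encoder.
type matcher struct {
	table [256]uint32 // last absolute position seen for each byte value
	cur   uint32      // base offset added to every stored index
}

// encodeNoHist counts table entries that look like valid back-references
// into src. When advance is true, cur is moved past src afterwards so the
// entries written here are out of range for the next caller.
func (m *matcher) encodeNoHist(src []byte, advance bool) int {
	hits := 0
	for i, b := range src {
		prev := m.table[b]
		// A candidate is accepted only if it lies inside the current
		// input and strictly before position i.
		if prev >= m.cur && prev-m.cur < uint32(i) {
			hits++
		}
		m.table[b] = m.cur + uint32(i)
	}
	if advance {
		m.cur += uint32(len(src))
	}
	return hits
}

func main() {
	// Neither input repeats a byte, so any hit on the second input
	// is a phantom match left over from the first.
	stale := &matcher{cur: 1}
	stale.encodeNoHist([]byte("abcd"), false)
	fmt.Println("without offset:", stale.encodeNoHist([]byte("dcba"), false))

	fixed := &matcher{cur: 1}
	fixed.encodeNoHist([]byte("abcd"), true)
	fmt.Println("with offset:", fixed.encodeNoHist([]byte("dcba"), true))
}
```

With the offset applied, every entry written during the first call is smaller than the new `cur` and is rejected by the range check, so no second reset is needed to clear the table.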
Correct, you did the math yourself. To support the default number of concurrent encodes it needs a history for each. I am guessing you are testing on a CPU with You can tweak the number of encoders available with Window size can be adjusted down without any penalty if your typical payload is smaller. I will look into the complexity of reducing the upfront allocations where we know the payload size when using EncodeAll. When it is less than a single block size (128K) we don't need to track the history at all.
I have updated #263 to only allocate history when it is needed, so single blocks < 128K will not have a history allocated (but will keep it if it already has been). I don't want to risk having to re-allocate history, so we either allocate what we want or nothing at all if we don't need it. I will do some fuzz testing before merging. This will give double Resets in some cases, but that itself is pretty harmless. To control maximum memory usage you can adjust the parameters described above.
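The "allocate what we want or nothing at all" policy described here could look roughly like the sketch below. The `encoder` type, the `ensureHistory` method, and the choice of the window size as the buffer capacity are hypothetical; treating a payload of exactly one block as needing no history is also an assumption about the boundary.

```go
package main

import "fmt"

const blockSize = 128 << 10 // single block size (128K) mentioned in the thread

// encoder is a hypothetical stand-in; hist stays nil until first needed.
type encoder struct {
	windowSize int
	hist       []byte
}

// ensureHistory reports whether a history buffer is available after the
// call. It allocates the full buffer the first time a payload spans more
// than one block, keeps an existing buffer, and never resizes it.
func (e *encoder) ensureHistory(payloadLen int) bool {
	if e.hist != nil {
		return true // already allocated: keep it
	}
	if payloadLen <= blockSize {
		return false // single block: no history needed
	}
	// Capacity is an assumption for illustration only.
	e.hist = make([]byte, 0, e.windowSize)
	return true
}

func main() {
	e := &encoder{windowSize: 1 << 20}
	fmt.Println(e.ensureHistory(4<<10), cap(e.hist))       // small payload
	fmt.Println(e.ensureHistory(blockSize+1), cap(e.hist)) // multi-block payload
	fmt.Println(e.ensureHistory(4<<10), cap(e.hist))       // history is kept
}
```

A caller that only ever sends small payloads never pays for the history buffer, while one that sends a single large payload pays for it once and keeps it.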
Increment buffer offset when encoding without history. Since it does not store history, we must offset `e.cur` to avoid false matches for next user. Fixes the need for double resets to clear history. Only allocate history when we need to encode more than 1 block. Replaces #262
Correct.
Excellent!
Thank you very much for the explanation; much appreciated.