Fix aggregator size estimation #155
Conversation
@zackwine FYI
The code looks alright to me, and it sounds like you have thoroughly tested it in your production environment to verify that it fixed the issue and didn't introduce any new regressions?
Yeah, to be really clear on what I did: I deployed the new plugin to a machine that was stuck on a record like this with debug logging enabled, and saw it get past the bad records and still compute correct sizes for everything else. I watched it for ~5m before opening the PR and didn't see any size mismatches.
Cool, we have already deployed the fix to our production environments, so we'll just let this bake for a bit 🙂
@jamiees2 I think it would be nice to have some positive and negative unit tests.
Signed-off-by: James Elias Sigurdarson <jamiees2@gmail.com>
…-for-fluent-bit into mainline
Added tests, and realized protowire, which is included in …
Description of changes:
The aggregator tries to keep track of the size of the serialised message so that it knows whether it should flush before adding a new record, and not overflow Kinesis' 1MB limit. The current implementation tracks this by adding seemingly arbitrary constants, which breaks when, e.g., an integer is passed that does not fit in a 1-byte varint.
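For context, here is a minimal Go sketch (using the protowire package mentioned below) of why a fixed 1-byte assumption breaks: a varint's encoded length depends on the value being encoded.

```go
package main

import (
	"fmt"

	"google.golang.org/protobuf/encoding/protowire"
)

func main() {
	// A varint's encoded length grows with the value: anything >= 128
	// needs more than one byte, so a hard-coded 1-byte assumption
	// under-counts the serialised size for larger integers.
	for _, v := range []uint64{1, 127, 128, 1 << 20, 1 << 40} {
		fmt.Printf("varint(%d) encodes to %d byte(s)\n", v, protowire.SizeVarint(v))
	}
}
```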
This caused issues for us in production, where Fluent Bit would get stuck on a record that Kinesis refused to accept: the plugin kept trying to submit a record larger than 1MB that it thought was going to be smaller than 1MB.
This changes the code to mirror protobuf's actual size calculation, tracking the sizes of varints, length-prefixed buffers, etc. This change fixed the bug in our production environment, and the plugin now correctly keeps the record data below 1MB.
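As a rough illustration of the approach (not the plugin's exact code), a per-record size estimate built on protowire might look like the sketch below. The field numbers and the `recordSize` helper are hypothetical placeholders, used only to show how the wire-format overhead of each record can be computed before deciding whether to flush.

```go
package main

import (
	"fmt"

	"google.golang.org/protobuf/encoding/protowire"
)

// recordSize estimates how many bytes one aggregated record entry adds to the
// parent message on the wire. Field numbers 1 (partition-key index), 2 (data),
// and 3 (the record inside the parent) are illustrative, not necessarily the
// numbers used by the plugin's .proto definition.
func recordSize(partitionKeyIndex uint64, data []byte) int {
	// Size of the record message body: tag + varint for the index,
	// tag + length-prefixed bytes for the payload.
	body := protowire.SizeTag(1) + protowire.SizeVarint(partitionKeyIndex) +
		protowire.SizeTag(2) + protowire.SizeBytes(len(data))
	// The record itself is a length-prefixed field of the parent message.
	return protowire.SizeTag(3) + protowire.SizeBytes(body)
}

func main() {
	const limit = 1 << 20 // illustrative 1MB cap
	payload := make([]byte, 50_000)

	running := 0
	for i := uint64(0); ; i++ {
		next := recordSize(i, payload)
		if running+next > limit {
			fmt.Printf("flush after %d records (%d bytes)\n", i, running)
			break
		}
		running += next
	}
}
```

The point of tracking sizes this way is that the running total matches what the serialised message will actually occupy, so the flush decision can be made safely before the limit is exceeded.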
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.