feat: improve error string for too large events #258
Merged
Which problem is this PR solving?
When libhoney encounters an event that is large enough that the Honeycomb API will reject it based on published limits, it preemptively refuses to send the event, knowing it will not be accepted. This error is logged so that the application author will realize that some telemetry is getting dropped.
event exceeds max event size of 100000 bytes, API will not accept this event
However, it is exceedingly difficult to track down which telemetry is getting dropped. The message gives no clue about where in the code the event was generated, which field is too large, or anything else.
Short description of the changes
This PR examines the too-large event for the industry-standard fields `name` and `service.name` (standardized by OpenTelemetry but also used by the beelines and other instrumentation). If those fields exist, it adds their values to the error message. In cases where the Honeycomb Beeline package is being used, this indicates which span in a large trace is the offending one, dramatically shortening the process of finding which added fields might be too large.
There's a delicate balance here between adding enough information to help the instrumentation author and not adding too much data from the (already too large) event. It wouldn't do if the notification that an event was too large was, itself, too large! Because of this danger, this PR does not add further information, such as a list of all fields, to the error message. Name and service name will at least point the application author to the right span; visual code inspection and other experiments can then help narrow down what might be exceeding the event's size limit.
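The idea described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the names `checkEventSize`, `oversizeMessage`, and `truncate` are hypothetical, the `key="value"` suffix format is an assumption, and the 64-character cap on each appended value is an extra precaution added for this sketch rather than something the PR states.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// maxEventSize mirrors the limit quoted in the error message above.
const maxEventSize = 100000

// truncate caps s at n runes. This cap is an assumption of the sketch,
// meant to keep the error message itself small.
func truncate(s string, n int) string {
	r := []rune(s)
	if len(r) <= n {
		return s
	}
	return string(r[:n]) + "..."
}

// oversizeMessage (hypothetical) builds the error string, appending only
// the "name" and "service.name" fields when they are present.
func oversizeMessage(fields map[string]interface{}) string {
	msg := fmt.Sprintf(
		"event exceeds max event size of %d bytes, API will not accept this event",
		maxEventSize)
	for _, key := range []string{"name", "service.name"} {
		if v, ok := fields[key]; ok {
			msg += fmt.Sprintf(" %s=%q", key, truncate(fmt.Sprint(v), 64))
		}
	}
	return msg
}

// checkEventSize (hypothetical) serializes the event and refuses it when
// the encoded size exceeds the limit, returning the enriched error.
func checkEventSize(fields map[string]interface{}) error {
	blob, err := json.Marshal(fields)
	if err != nil {
		return err
	}
	if len(blob) <= maxEventSize {
		return nil
	}
	return errors.New(oversizeMessage(fields))
}

func main() {
	event := map[string]interface{}{
		"name":         "fetch-user",
		"service.name": "api",
		"payload":      strings.Repeat("x", maxEventSize+1), // the oversized field
	}
	if err := checkEventSize(event); err != nil {
		fmt.Println(err)
	}
}
```

Only the two identifying fields are copied into the message, so the notification stays short no matter how large the rejected event is.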