Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve error string for too large events #258

Merged
merged 4 commits into from
Nov 22, 2024

Conversation

maplebed
Copy link
Contributor

Which problem is this PR solving?

When libhoney encounters an event that is large enough that the Honeycomb API will reject it based on published limits, it preemptively refuses to send the event, knowing it will not be accepted. This error is logged so that the application author will realize that some telemetry is getting dropped.

event exceeds max event size of 100000 bytes, API will not accept this event

However, it is exceedingly difficult to track down what telemetry it is that is getting dropped. There are no clues about where in the code this event might be generated, which field is too large, or anything else.

Short description of the changes

This PR examines the too-large event for the industry-standard fields name and service.name (standardized by Open Telemetry but also used by the beelines and other instrumentation). If those fields exist it will add their values to the error message. In the cases where the Honeycomb Beeline package is being used, this will indicate which span in a large trace is the offending span, dramatically shortening the process to find what fields might be added that are too large.

There's a delicate balance here of wanting to add enough information to help the instrumentation author while not adding too much data from the (already too large) event. It wouldn't do if the notification that an event was too large was, itself, too large! Because of this danger this PR does not add additional information such a list of all fields to the error message. Name and service name will at least point the application author to the right span, then visual code analysis and other experiments can help narrow down what might be exceeding the event's size.

@maplebed maplebed requested a review from a team as a code owner November 22, 2024 00:54
Copy link
Contributor

@MikeGoldsmith MikeGoldsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @maplebed. I made a small update to move the duplicated code into a utility func and quote the event and service names if found.

I forgot to commit these :|
@MikeGoldsmith MikeGoldsmith merged commit 0171d6b into main Nov 22, 2024
5 checks passed
@MikeGoldsmith MikeGoldsmith deleted the ben.large_event_error branch November 22, 2024 14:44
@MikeGoldsmith MikeGoldsmith self-assigned this Nov 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants