-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wavefront output should distinguish between retryable and non-retryable errors #8404
Wavefront output should distinguish between retryable and non-retryable errors #8404
Conversation
Hey @fishy . it's up to each output to decide what's retry-able and not. If an output gets an error trying to post a metric and it doesn't want it to be retried, it should log the error and continue, not returning an error from the |
@ssoroka Thanks for the feedback, Steven! That sounds reasonable, but the problems are that if we go that route we need to make sure that the output plugin has the actual working logger configured, or the attempt to log and swallow the error will go into a blackhole, or even cause panic. Looking at the plugin code, it does have the logger in the struct, but I'm not entirely sure whether or not it's configured (for example, it's not initialized during the init function: telegraf/plugins/outputs/wavefront/wavefront.go Lines 348 to 355 in ca04106
Log field in wavefront plugin?
|
if you're looking at |
07913a6
to
ace138e
Compare
Thanks, @ssoroka . I updated the PR to only check retryable inside wavefront plugin. Please take another look. (commit message and PR description also updated accordingly) |
return fmt.Errorf("Wavefront sending error: %v", err) | ||
} | ||
w.Log.Errorf("non-retryable error during Wavefront.Write: %v", err) | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might not want to return here, and instead finish writing the batches, as some of those metrics are probably still good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Done.
…le errors Currently we assume all wavefront send errors are retryable, and when an error happens during Write function we will reject the buffer and keep retrying the next tick. This means that when an actually non-retryable error happens, we'll just keep getting the same error on every tick, and never flush the buffer. One such error we encountered is "empty metric name" error. Add isRetryable function to detect non-retryable errors, and make it default to assume that all errors are retryable (so it matches the current behavior), but make it possible to mark certain errors as non-retryable. Currently only handled that "empty metric name" error as non-retryable. A support ticket has been filed against wavefront to provide a canonical way to distinguish between retryable and non-retryable errors. Signed-off-by: Yuxuan 'fishy' Wang <yuxuan.wang@reddit.com>
ace138e
to
12a7d53
Compare
Merged! thank you |
@ssoroka Thanks! I assume this will be included in 1.16.3 release? Do you know when will that release happen? |
Yep! I think we might have another one in a week or two. Until then it'll be in the nightly release. |
Currently we assume all wavefront send errors are retryable, and when an
error happens during Write function we will reject the buffer and keep
retrying the next tick. This means that when an actually non-retryable
error happens, we'll just keep getting the same error on every tick, and
never flush the buffer.
One such error we encountered is "empty metric name" error.
Add isRetryable function to detect non-retryable errors, and make it
default to assume that all errors are retryable (so it matches the
current behavior), but make it possible to mark certain errors as
non-retryable.
Currently only handled that "empty metric name" error as non-retryable.
A support ticket has been filed against wavefront to provide a canonical
way to distinguish between retryable and non-retryable errors.
Signed-off-by: Yuxuan 'fishy' Wang yuxuan.wang@reddit.com
Required for all PRs: