Global sampling service telemetry logging configuration options #4554
Comments
I think this is a reasonable request but I would like to understand better why we are logging too many messages per second.
This is a bit unexpected to me. We use exponential backoff in senders, so presumably after a few failed sends the rate of failures should slow down a lot. Is this not happening? What message is logged thousands of times a second? Also, the queued_retry component itself uses a sampled logger, precisely to avoid logging too much. Is the batch processor the guilty component?
On a quick pass I'm not seeing where the batch processor uses a sampling config, and right now it is flooding the logs when extended export failures occur:
There may be other issues with the exporter helpers as well, and I will open another issue if I determine what they are.
AFAIK, it doesn't. queued_retry does. I think it is a good idea to have a general sampling for all logs (at least configurable). Ideally I would prefer that components are given 2 loggers: one to use during startup which will not be sampled (or will have higher sampling thresholds) to make sure all critical messages during startup are printed, and another logger to be used after startup to make sure long-running processes don't generate huge volumes of logs.
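A rough sketch of the two-logger idea, using only the standard library. Everything named here (`logFunc`, `everyNth`, `componentLoggers`, `newComponentLoggers`) is hypothetical and not part of the collector or zap, and the sampling is a plain 1-in-N counter rather than zap's sampler. It is not safe for concurrent use.

```go
package main

import "fmt"

// logFunc stands in for a real logger method such as Warn or Info.
type logFunc func(msg string)

// everyNth wraps emit so that only the 1st, (n+1)th, (2n+1)th, ... call
// is passed through. Values of n below 1 are treated as 1 (no sampling).
func everyNth(n int, emit logFunc) logFunc {
	if n < 1 {
		n = 1
	}
	calls := 0
	return func(msg string) {
		if calls%n == 0 {
			emit(msg)
		}
		calls++
	}
}

// componentLoggers is a hypothetical pair handed to each component.
type componentLoggers struct {
	Startup logFunc // never sampled, so critical startup messages always print
	Runtime logFunc // sampled, for long-running hot paths
}

func newComponentLoggers(emit logFunc, runtimeEvery int) componentLoggers {
	return componentLoggers{
		Startup: emit,
		Runtime: everyNth(runtimeEvery, emit),
	}
}

func main() {
	printed := 0
	emit := func(msg string) {
		printed++
		fmt.Println(msg)
	}

	loggers := newComponentLoggers(emit, 100)
	loggers.Startup("component started") // illustrative message
	for i := 0; i < 1000; i++ {
		loggers.Runtime("export failed") // illustrative message
	}
	fmt.Println("lines printed:", printed)
}
```

The component decides which logger to call based on its lifecycle phase, so the sampling policy stays outside the component code.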
Fixes open-telemetry#4554 Signed-off-by: Bogdan <bogdandrutu@gmail.com>
Fixes #4554 Signed-off-by: Bogdan <bogdandrutu@gmail.com>
Is your feature request related to a problem? Please describe.
Some components use warn and info level statements for paths that can be hit frequently, like the batch processor when configured with a small sending queue:
opentelemetry-collector/processor/batchprocessor/batch_processor.go
Line 185 in 9d3a8a4
Describe the solution you'd like
The ServiceTelemetryLogs config should provide zap sampling initial and thereafter fields that are used when instantiating all component loggers, with defaults equivalent to disabling sampling.
Describe alternatives you've considered
Given how components use the zap logger, I'm not sure there is an alternative sampling approach within the collector process. The only other courses I see are setting an arbitrarily high log level and missing important statements, or lowering the level of valid warning/info statements (which has the same effect, or changes nothing if the configured log level is low enough).
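To make the requested initial and thereafter semantics concrete, here is a minimal standard-library sketch. It loosely follows how zap documents its sampler (keep the first N identical messages per interval, then every Mth), but `LogsSampling`, `defaultLogsSampling`, `keep`, and `countKept` are hypothetical names, and the default values are an assumption about what "equivalent to disabling" could mean.

```go
package main

import "fmt"

// LogsSampling is a hypothetical shape for the proposed fields. The field
// names mirror the issue text, not an existing collector type.
type LogsSampling struct {
	Initial    int // identical messages always kept at the start of each interval
	Thereafter int // after Initial, keep every Thereafter-th identical message
}

// defaultLogsSampling keeps every message, so sampling is effectively off.
// These defaults are an assumption of this sketch.
func defaultLogsSampling() LogsSampling {
	return LogsSampling{Initial: 1, Thereafter: 1}
}

// keep reports whether the n-th (1-based) identical message seen within
// one interval would be logged.
func (s LogsSampling) keep(n int) bool {
	if n <= s.Initial {
		return true
	}
	return s.Thereafter > 0 && (n-s.Initial)%s.Thereafter == 0
}

// countKept returns how many of total identical messages in one interval
// survive sampling.
func countKept(s LogsSampling, total int) int {
	kept := 0
	for n := 1; n <= total; n++ {
		if s.keep(n) {
			kept++
		}
	}
	return kept
}

func main() {
	// 5000 identical warnings within one interval.
	fmt.Println(countKept(defaultLogsSampling(), 5000))                      // 5000
	fmt.Println(countKept(LogsSampling{Initial: 10, Thereafter: 100}, 5000)) // 59
}
```

With the illustrative setting of 10 and 100, a burst of 5000 identical messages shrinks to 59 lines, while the default leaves behavior unchanged for users who do not opt in.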
Additional context
Would potentially resolve #1061