OpenTelemetry Collector does not gracefully shutdown, losing metrics on spot instance termination #33441
Comments
Pinging code owners for exporter/datadog: @mx-psi @dineshg13 @liustanley @songy23 @mackjmr @ankitpatel96. See Adding Labels via Comments if you do not have permissions to add labels yourself.
Hi, I'm seeing the same issue. Updating to v0.102 doesn't help; we are still losing metrics.
@songy23 Sorry for taking so long, but unfortunately upgrading didn't help.
@Rommmmm Does the collector not shut down gracefully at all, or is it being killed before it can shut down gracefully? The mention of …
It's not shutting down gracefully.
@Rommmmm Is it being killed or terminated? A process being killed is not a graceful shutdown scenario, AFAIK. My guess is that your …
This issue has been closed as inactive because it has been stale for 120 days with no activity.
Component(s)
datadogexporter
What happened?
Description
We are experiencing an issue with the OpenTelemetry Collector running in our Kubernetes cluster, which is managed by Karpenter. Our setup uses spot instances, and we've noticed that when Karpenter terminates these instances, the OpenTelemetry Collector does not seem to shut down gracefully. As a result, we lose metrics and traces that are presumably still being processed or exported.
Steps to Reproduce
Expected Result
The OpenTelemetry Collector should flush all pending metrics and traces before shutting down to ensure no data is lost during spot instance termination.
Actual Result
During a spot termination event triggered by Karpenter, the OpenTelemetry Collector shuts down without flushing all the data, causing loss of metrics and traces.
Collector version
0.95.0
Environment information
Environment
Kubernetes Version: 1.27
Karpenter Version: 0.35.2
Cloud Provider: AWS
OpenTelemetry Collector configuration
Log output
No response
Additional context
I noticed that the Kubernetes pod spec (for example, a Deployment's pod template) supports a terminationGracePeriodSeconds setting that gives workloads more time to shut down. However, this option does not seem to be exposed in the OpenTelemetry Collector Helm chart.
I would like to suggest the following enhancements: