Latest release breaks collector autoscaler #2018
Comments
Looking now
(The label is incorrect, but I'm running an otel-operator pod with version 0.82.0.)
@jaronoff97 Thanks for having a look! So after updating from 0.81.0 to 0.82.0, one of my otel-collector pods gets terminated even though my HPA shows 3 current and desired pods. Status of my opentelemetrycollector CR (note it says 2/2 replicas here even though the HPA desires 3): I don't understand why the HPA says 3 replicas are running when the OpenTelemetryCollector only displays 2 replicas.
This is indeed very, very odd... The only thing I can imagine happening is that the replicas: 2 is being set on the CRD, which is somehow overriding what you have set for the HPA. Are there any logs from the operator?
From a quick glance, I don't see anything between the releases that would be causing this. But I'm going to do some more testing on my clusters to check this.
When the operator scales down the otel-collector it logs the following:
Here are also my settings for the OpenTelemetryCollector (without the config):
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  mode: statefulset
  autoscaler:
    minReplicas: 3
    maxReplicas: 6
  resources:
    requests:
      cpu: 900m
      memory: 800Mi
    limits:
      memory: 2Gi
  targetAllocator:
    enabled: true
    allocationStrategy: consistent-hashing
    replicas: 2
    resources:
      requests:
        cpu: 150m
        memory: 200Mi
      limits:
        cpu: 1000m
        memory: 500Mi
    filterStrategy: relabel-config
    prometheusCR:
      enabled: true
Hmm, I've been looking into what could be causing this issue, but have not had any luck reproducing it with the provided config either. I'm wondering what images you're using for your I'm also wondering how exactly you upgrade from In the past, lingering resources from an old install have given me trouble when upgrading to a new version.
@moh-osman3 Thanks for having a look! Was finally able to track down the error and noticed that an error event occurs in the underlying StatefulSet: The ports in the Pod template look like the following: What could cause the port zero to be added here? I do use otlp and prometheus receivers.
EDIT: After investigating this, it seems like the zero port is being added only if I use a
EDIT 2: Just saw this is the same as in #2016 and has already been fixed by #2017. It would be nice to have a patch release for this fix. So closing this, thanks for your help!
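For anyone hitting a similar StatefulSet error, the zero port described above can be spotted with a small check over the Pod template's containers. This is only a sketch: the function name `find_invalid_ports` and the example container and port names are hypothetical, and the input is assumed to be the `containers` list already parsed into plain dictionaries (for example from `kubectl get statefulset -o json`). It relies on the fact that Kubernetes only accepts a `containerPort` between 1 and 65535.

```python
from typing import Any


def find_invalid_ports(containers: list[dict[str, Any]]) -> list[tuple[str, str, int]]:
    """Return (container name, port name, containerPort) for each out-of-range port."""
    invalid = []
    for container in containers:
        for port in container.get("ports", []):
            # A missing containerPort is treated as 0, which is also invalid.
            number = port.get("containerPort", 0)
            if not 1 <= number <= 65535:
                invalid.append((container.get("name", ""), port.get("name", ""), number))
    return invalid


# Hypothetical Pod template fragment; names and port numbers are illustrative only.
example_containers = [
    {
        "name": "otc-container",
        "ports": [
            {"name": "otlp-grpc", "containerPort": 4317},
            {"name": "unknown", "containerPort": 0},
        ],
    },
]

print(find_invalid_ports(example_containers))
```

An empty result means every declared port is in the valid range; any entry returned points at the container and port name that the StatefulSet controller would reject.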
After upgrading to the latest release 0.82.0, I have noticed that the operator scaled down my otel-collector to a replica number lower than the configured autoscaler.minReplicas. The underlying HPA keeps showing the minReplicas count as the desired and current replica count and also shows the following events:
My autoscaler is configured as follows and the actual number of pods is 2:
Any ideas what could cause this issue?
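The mismatch reported here (2 running pods against a configured minimum of 3) can be expressed as a simple check. The sketch below is illustrative, not part of the operator: `replicas_below_minimum` is a hypothetical helper, and it assumes the OpenTelemetryCollector resource has already been loaded into a dictionary and that the observed pod count is obtained separately.

```python
def replicas_below_minimum(collector: dict, observed_replicas: int) -> bool:
    """Return True when fewer pods are running than spec.autoscaler.minReplicas allows."""
    autoscaler = collector.get("spec", {}).get("autoscaler") or {}
    min_replicas = autoscaler.get("minReplicas")
    # Without a configured minimum there is nothing to violate.
    return min_replicas is not None and observed_replicas < min_replicas


# Values taken from this issue: minReplicas 3, maxReplicas 6, 2 pods actually running.
collector = {"spec": {"autoscaler": {"minReplicas": 3, "maxReplicas": 6}}}
print(replicas_below_minimum(collector, observed_replicas=2))
```

With the values from this report the check is true, which is the condition that should never hold while the HPA itself reports 3 current replicas.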