[receiver/k8scluster] Detecting API deprecation in k8s cluster receiver #27907

omrozowicz-splunk · 2023-10-23T10:31:25Z

Component(s)

receiver/k8scluster

Is your feature request related to a problem? Please describe.

The problem is that some k8s clusters (GKE) can be auto-upgraded only if none of the soon-to-be-removed APIs are being called. As stated in the GKE documentation:

If GKE detects usage of a deprecated feature or API, GKE pauses automatic upgrades to prevent your cluster from being upgraded into a broken state. Upgrades to the next Kubernetes minor version are paused, but GKE continues to deliver patch upgrades to the cluster on the current minor version

GKE resumes automatic upgrades only once there have been no calls to deprecated endpoints for 30 days, or when the current version reaches end-of-life.

We ran into this in a project that uses the contrib repo:
signalfx/splunk-otel-collector-chart#918
signalfx/splunk-otel-collector-chart#897

So if GKE users run an OTel collector on their k8s cluster and use the k8s cluster receiver, it might block the automatic upgrade feature for them. We had this problem when v2beta2/horizontalpodautoscaler and v1beta1/cronjob were enabled in the k8s cluster receiver (they were disabled in this PR: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/26516/files).

One way to fix this problem, which we already used for the beta APIs above, is to simply delete the removed kinds. That approach has one drawback: the receiver stops polling metrics for those kinds on older clusters. For example, users of k8s <1.21 can no longer gather any HPA or CronJob metrics.

Describe the solution you'd like

Right now we don't have any beta APIs in the versions we support, but in case this situation happens again we could provide one of these solutions:

1. Always use the newest API group: We can provide a few groups - usually, that would be two, one beta and a future one - but poll metrics only from the latest one. Only in case the latest one is not yet supported on the current cluster, we use the older one.
2. Provide functionality to disable certain API groups: Another option that would probably be easier to implement is providing a way to configure API groups that we want to exclude from polling. That might be something like:

k8scluster:
  excludeVersionKind:
    - cronjob/v1beta1

The drawback would be that the user has to know what to exclude, so in the case of GKE, they’d need to wait for an additional 30 days after disabling the soon-to-be-removed version.
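
To make the two options more concrete, here is a minimal sketch in Go of how the version choice could work. It is not the receiver's actual code: `selectVersion`, the `served` set, and the `excluded` set are hypothetical names, and the `kind/version` key format simply mirrors the `excludeVersionKind` example above. In a real receiver the `served` set would have to come from the cluster (for example via the Kubernetes discovery API) and `excluded` from the user's config.

```go
package main

import "fmt"

// selectVersion walks the candidate versions for a kind, ordered newest
// first, and returns the first one that the cluster serves and that the
// user has not excluded. It returns false if no version qualifies.
//
// All names are hypothetical; keys use the "kind/version" form from the
// excludeVersionKind example, e.g. "cronjob/v1beta1".
func selectVersion(kind string, preferred []string, served, excluded map[string]bool) (string, bool) {
	for _, version := range preferred {
		key := kind + "/" + version
		if excluded[key] {
			// Option 2: the user asked us never to call this version.
			continue
		}
		if served[key] {
			// Option 1: newest served version wins; older ones are skipped.
			return version, true
		}
	}
	return "", false
}

func main() {
	preferred := []string{"v1", "v1beta1"} // newest first

	// Older cluster that only serves the beta API.
	oldCluster := map[string]bool{"cronjob/v1beta1": true}
	// Newer cluster that serves both during the deprecation window.
	newCluster := map[string]bool{"cronjob/v1": true, "cronjob/v1beta1": true}
	// User-supplied exclusion list.
	excluded := map[string]bool{"cronjob/v1beta1": true}

	v, ok := selectVersion("cronjob", preferred, oldCluster, nil)
	fmt.Printf("%q %v\n", v, ok) // falls back to the beta version

	v, ok = selectVersion("cronjob", preferred, newCluster, nil)
	fmt.Printf("%q %v\n", v, ok) // uses v1 and never calls v1beta1

	v, ok = selectVersion("cronjob", preferred, oldCluster, excluded)
	fmt.Printf("%q %v\n", v, ok) // nothing left to poll for this kind
}
```

With the first option alone, a cluster that serves both versions would only ever see calls to `v1`, so the deprecated endpoint would not be hit. The exclusion list covers the case where the user wants to stop calling a version regardless of what the cluster serves.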

Describe alternatives you've considered

No response

Additional context

No response

github-actions · 2023-10-23T10:31:48Z

Pinging code owners:

receiver/k8scluster: @dmitryax @TylerHelmuth

See Adding Labels via Comments if you do not have permissions to add labels yourself.

jvoravong · 2023-10-25T13:54:37Z

The root issue here could potentially impact the auto-upgrade feature for any major Kubernetes distribution, not just GKE.

crobert-1 · 2023-11-13T22:40:31Z

Sounds like a valid enhancement to me. I'll have to defer to code owners to decide which option is the best going forward as I don't have much experience here.

github-actions · 2024-01-15T03:29:45Z

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

receiver/k8scluster: @dmitryax @TylerHelmuth @povilasv

See Adding Labels via Comments if you do not have permissions to add labels yourself.

povilasv · 2024-01-17T13:54:07Z

Hey, this makes sense. I like the second approach:

Always use the newest API group: We can provide a few groups - usually, that would be two, one beta and a future one - but poll metrics only from the latest one. Only in case the latest one is not yet supported on the current cluster, we use the older one.

But given that we don't have beta APIs anymore, let's not add this now, as it would be hard to test and write code for a problem we don't have right now.

github-actions · 2024-03-18T03:30:10Z

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

receiver/k8scluster: @dmitryax @TylerHelmuth @povilasv

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions · 2024-05-17T05:19:33Z

This issue has been closed as inactive because it has been stale for 120 days with no activity.

omrozowicz-splunk added the enhancement and needs triage labels Oct 23, 2023

github-actions bot added the receiver/k8scluster label Oct 23, 2023

crobert-1 removed the needs triage label Nov 13, 2023

github-actions bot added the Stale label Jan 15, 2024

crobert-1 removed the Stale label Jan 16, 2024

github-actions bot added the Stale label Mar 18, 2024

github-actions bot added the closed as inactive label May 17, 2024

github-actions bot closed this as not planned May 17, 2024
