[kube-prometheus-stack] API server metrics broken after upgrade #5274

Open
krankkkk opened this issue Feb 5, 2025 · 4 comments
Labels
bug Something isn't working

Comments


krankkkk commented Feb 5, 2025

Describe the bug

We upgraded kube-prometheus-stack from 62.6.0 to 68.4.5, and the API server availability metric (apiserver_request:availability30d) now reports completely implausible values.

What's your helm version?

Deployed via ArgoCD, which internally uses 3.15.4

What's your kubectl version?

Irrelevant

Which chart?

kube-prometheus-stack

What's the chart version?

68.4.5

What happened?

apiserver_request:availability30d went from about 99.999% to well above 100%. For example, apiserver_request:availability30d{verb="all"} is currently 1.6425321904704488 (about 164%) on one cluster and 2.225346243637766 (about 223%) on another.

In the Prometheus UI, the jump lines up exactly with the time we started the upgrade.

Image

If we roll back the upgrade, the metrics return to normal.
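Since the metric is a ratio where 1.0 means 100%, any value outside [0, 1] is a sign that the recording rule is broken. Below is a minimal sketch of an automated check, assuming the samples arrive in the JSON shape returned by a Prometheus instant query. The function name and the cluster labels are hypothetical; the two implausible values are the ones reported above.

```python
from typing import Any


def implausible_availability(response: dict[str, Any]) -> list[tuple[dict[str, str], float]]:
    """Return (labels, value) pairs whose availability ratio falls outside [0, 1]."""
    flagged = []
    for sample in response["data"]["result"]:
        # Instant-vector samples carry [timestamp, "value-as-string"].
        value = float(sample["value"][1])
        if not 0.0 <= value <= 1.0:
            flagged.append((sample["metric"], value))
    return flagged


# Hypothetical response: cluster names and timestamps are made up for illustration.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"cluster": "cluster-a", "verb": "all"},
             "value": [1738750000, "1.6425321904704488"]},
            {"metric": {"cluster": "cluster-b", "verb": "all"},
             "value": [1738750000, "2.225346243637766"]},
            {"metric": {"cluster": "cluster-c", "verb": "all"},
             "value": [1738750000, "0.99999"]},
        ],
    },
}

for labels, value in implausible_availability(sample_response):
    print(f"{labels['cluster']}: {value:.2%}")
```

With the sample data, only the two out-of-range series are reported; the healthy 0.99999 series passes.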

What you expected to happen?

No response

How to reproduce it?

No response

Enter the changed values of values.yaml?

No response

Enter the command that you execute and failing/misfunctioning.

No command necessary

Anything else we need to know?

The cluster version is v1.29.2.

krankkkk added the bug label Feb 5, 2025

OOub commented Feb 7, 2025

Same issue here.


KihyeokK commented Feb 7, 2025

Getting the same issue on version 68.1.1. After the upgrade, the API Server dashboard looks wrong, with values going well above 100%.

We are also getting KubeAPIErrorBudgetBurn alerts after the upgrade, just like in this issue. It seems that burn-rate queries such as apiserver_request:burnrate1d may be affected too.

abohatyrenko commented

Same here with the latest version of kube-prometheus-stack (69.2.0).

zeritti changed the title from "[prometheus-kube-stack] API server metrics broken after upgrade" to "[kube-prometheus-stack] API server metrics broken after upgrade" Feb 9, 2025
boettluSICKAG commented

Getting the same issue on the latest chart version. The recording rule apiserver_request:availability30d{verb="read"} is not working, apparently because the expression sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{le=~"30(\\.0)?",scope="cluster",verb=~"LIST|GET"}) returns no series. It looks like the le=30 bucket is missing.
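To illustrate why a missing bucket makes that selector come back empty, here is a minimal sketch of how the le matcher behaves. It assumes the effective regex is 30(\.0)? once the string escaping is removed, and it uses re.fullmatch because Prometheus anchors regex matchers at both ends. The helper name and both bucket lists are hypothetical and do not come from any real cluster.

```python
import re

# Matcher from the recording rule, with the string escaping removed.
LE_30 = re.compile(r"30(\.0)?")


def has_bucket(le_values: list[str], pattern: re.Pattern = LE_30) -> bool:
    """True if any le label value matches the fully anchored pattern."""
    return any(pattern.fullmatch(v) is not None for v in le_values)


# Hypothetical bucket boundaries, for illustration only.
with_30 = ["0.5", "1", "5", "30.0", "60", "+Inf"]
without_30 = ["0.5", "1", "5", "20", "45", "60", "+Inf"]

print(has_bucket(with_30))     # True
print(has_bucket(without_30))  # False
```

If no series carries an le value of 30 or 30.0, the selector matches nothing and the sum has no input, which would be consistent with the rule producing no usable result.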

5 participants