As usual, security is hard, and it gets tangled up in many different things.
After the update, everything seemed to work fine, but then a Prometheus alert came in:
Kubelet has disappeared from Prometheus target discovery.
Annotations:
- message: Kubelet has disappeared from Prometheus target discovery.
- runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletdown
Details:
- alertname = KubeletDown
- cluster = dev
- prometheus = monitoring/prometheus-operator-prometheus
- severity = critical
Looking into this in Prometheus:
The Prometheus service monitor is now getting a 401 Unauthorized when scraping the kubelet. The update broke part of the monitoring.
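A failing scrape like this can also be spotted without the UI. The following is a minimal sketch, assuming the Prometheus HTTP API is reachable (for example through a port-forward to `http://localhost:9090`, which is an assumption and not part of this setup). It reads `/api/v1/targets` and lists every target whose last scrape failed, along with the error. The function names are hypothetical.

```python
import json
import urllib.request


def fetch_active_targets(base_url):
    """Return the activeTargets list from the Prometheus /api/v1/targets endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)["data"]["activeTargets"]


def unhealthy_targets(active_targets):
    """Return (job, scrape URL, last error) for every target that is not 'up'."""
    return [
        (
            t.get("labels", {}).get("job", "?"),
            t.get("scrapeUrl", "?"),
            t.get("lastError", ""),
        )
        for t in active_targets
        if t.get("health") != "up"
    ]


# Usage (base URL is an assumption, e.g. a local port-forward):
#   for job, url, err in unhealthy_targets(fetch_active_targets("http://localhost:9090")):
#       print(job, url, err)
```

A kubelet target rejected with a 401 would show up here with `health` set to `down` and the HTTP status in `lastError`.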
We should be able to leverage Prometheus monitoring to track whether an update broke anything, or whether stats like CPU/memory usage change significantly after the update.
I suppose the testing will have to monitor this cluster for a bit (1 hour at the least?) and then tell us that this is something new.
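One way this check could look, as a rough sketch only: snapshot the firing alerts before the update, wait for the soak period, snapshot again, and report anything that was not firing before. It assumes the Prometheus HTTP API is reachable at some base URL (an assumption), and the helper names are hypothetical.

```python
import json
import urllib.request


def firing_names(alerts):
    """Return the set of alert names that are currently in the 'firing' state."""
    return {a["labels"]["alertname"] for a in alerts if a.get("state") == "firing"}


def fetch_firing_alerts(base_url):
    """Read /api/v1/alerts from Prometheus and return the firing alert names."""
    with urllib.request.urlopen(f"{base_url}/api/v1/alerts", timeout=10) as resp:
        return firing_names(json.load(resp)["data"]["alerts"])


def new_alerts(before, after):
    """Return alerts firing after the update that were not firing before it."""
    return sorted(after - before)


# Usage (base URL and soak time are assumptions):
#   before = fetch_firing_alerts("http://localhost:9090")
#   ... apply the update, then wait about an hour ...
#   after = fetch_firing_alerts("http://localhost:9090")
#   print(new_alerts(before, after))
```

Comparing against a pre-update snapshot keeps alerts that were already firing from being blamed on the update. The same before/after comparison could be extended to CPU and memory queries.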
We recently did an update in which we made some changes to the kubelet to make it more "secure".
Kops changes:
Looking into this in prometheus:
![Screenshot from 2019-11-19 11-43-22](https://user-images.githubusercontent.com/575972/69180266-d95df200-0ac1-11ea-8cd5-e61b5d82270c.png)