Kubernetes api server metrics: reload bearer token to avoid failing on stale token #10604

Closed
grosser opened this issue Jan 19, 2022 · 7 comments · Fixed by #11686 or DataDog/integrations-core#11915

grosser commented Jan 19, 2022

Describe what happened:
Our logs show that the Datadog agent is using an outdated service account token:

{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata", ... "stage":"ResponseComplete","requestURI":"/metrics","verb":"get","user" ... "authentication.k8s.io/stale-token":"seconds after warning threshold: 3337", ...
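A minimal sketch of how events like the one above could be picked out of an audit log, assuming one JSON event per line. The `annotations` and `user.username` fields come from the `audit.k8s.io/v1` Event schema; the function name `find_stale_token_events` is hypothetical and not part of any Datadog or Kubernetes tooling.

```python
import json

# Annotation key taken from the audit event quoted above.
STALE_KEY = "authentication.k8s.io/stale-token"


def find_stale_token_events(lines):
    """Yield (username, request_uri, annotation) for events flagged as stale.

    `lines` is any iterable of strings, one JSON audit event per line.
    Lines that are empty or not valid JSON objects are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict):
            continue
        note = (event.get("annotations") or {}).get(STALE_KEY)
        if note is None:
            continue
        username = (event.get("user") or {}).get("username", "<unknown>")
        yield username, event.get("requestURI"), note
```

Passing an open audit log file to `find_stale_token_events` would list which identities are still presenting stale tokens and on which request URIs.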

Describe what you expected:
The agent should reload its bearer token at least every 10 minutes, or when it expires.
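A rough sketch of the expected behavior, not the agent's actual implementation: re-read the token file once the cached copy is older than a maximum age. The class name `ReloadingToken` and its parameters are hypothetical; the default path is the conventional service account token mount inside a pod, and the sketch assumes the kubelet keeps that file up to date.

```python
import time
from pathlib import Path

# Conventional mount path for the service account token inside a pod.
DEFAULT_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


class ReloadingToken:
    """Cache a bearer token and re-read it from disk when it gets too old."""

    def __init__(self, path=DEFAULT_TOKEN_PATH, max_age_seconds=600,
                 clock=time.monotonic):
        self._path = Path(path)
        self._max_age = max_age_seconds
        self._clock = clock  # injectable so the behavior can be tested
        self._token = None
        self._loaded_at = None

    def get(self, force=False):
        """Return the token, reloading it if forced or older than max age."""
        now = self._clock()
        too_old = (self._loaded_at is None
                   or now - self._loaded_at >= self._max_age)
        if force or too_old:
            self._token = self._path.read_text().strip()
            self._loaded_at = now
        return self._token

    def auth_header(self, force=False):
        """Build the Authorization header from the current token."""
        return {"Authorization": f"Bearer {self.get(force=force)}"}
```

A client could call `auth_header()` before each request and, on a 401 response, retry once with `auth_header(force=True)` so a rotated token is picked up immediately.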

Steps to reproduce the issue:
Deploy the agent into a cluster that has BoundServiceAccountTokenVolume=true (the new default in 1.22+) and have it scrape the Kubernetes API server.

Additional environment details (Operating System, Cloud provider, etc):
Agent/7.32.3

Jell commented Jan 26, 2022

Seeing this issue on our side as well when running the agent on EKS 1.21

grosser commented Feb 18, 2022

found another one:

file":"pkg/autodiscovery/config_poller.go","line":"128","func":"collect","msg":"Unable to collect configurations from provider kubernetes: couldn't fetch \"podlist\": unexpected status code 401

Seems like that's somewhere in the Go code too.

@michael-barker

I also had this issue in EKS 1.21. Restarting the daemon set seems to have resolved it for now.

grosser commented Mar 23, 2022

Restarting should only resolve it for 1h.
Also strange that it would happen on 1.21, since the change is in 1.22 ... but maybe they opted in early.

@michael-barker

@grosser It's been 2 hours and the issue hasn't come back. The issue I had was the one you mentioned in your follow-up comment. I don't think I had the same problem as in your original post about the bearer token.

envybee commented Mar 24, 2022

We're seeing this in EKS 1.21 as well. I believe the BoundServiceAccountTokenVolume feature went into beta in 1.21 and is enabled by default there: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/

sashasimkin commented Apr 5, 2022

Same here, seeing this on EKS 1.21. This also seems to cause log collection failures for us.

Is there a workaround for the issue, other than restarting the DaemonSet every hour?

UPD: I'm seeing the same behavior in EKS 1.21, but the token TTL is 1yr, so it's not that critical until the transition period ends (see kubernetes/kubernetes#105654).
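One way to check the TTL mentioned above is to decode the token's JWT payload and compare its `iat` and `exp` claims. This is a minimal sketch for inspection only: it does not verify the signature, the function name `token_lifetime` is hypothetical, and it assumes a bound token that carries both claims.

```python
import base64
import json
import time


def token_lifetime(jwt, now=None):
    """Return (total_ttl_seconds, remaining_seconds) for a JWT.

    Reads the `iat` and `exp` claims from the payload segment.
    The signature is NOT verified.
    """
    payload_b64 = jwt.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    now = time.time() if now is None else now
    return claims["exp"] - claims["iat"], claims["exp"] - now
```

Running this against the contents of the mounted token file inside the agent pod would show whether the cluster issues the 1-hour token or the extended one.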
