-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prometheus [Input] plugin - Optimizing for bigger kubernetes clusters (500+ pods) when scraping thru 'monitor_kubernetes_pods' #8762
Prometheus [Input] plugin - Optimizing for bigger kubernetes clusters (500+ pods) when scraping thru 'monitor_kubernetes_pods' #8762
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤝 ✅ CLA has been signed. Thank you!
This pull request introduces 1 alert when merging 49336fa into d415d9f - view on LGTM.com new alerts:
|
This pull request introduces 1 alert when merging 79e0e71 into d415d9f - view on LGTM.com new alerts:
|
This pull request introduces 1 alert when merging cdc439c into d415d9f - view on LGTM.com new alerts:
|
@ssoroka - can you please look for this to be merged ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good. I'd like to see more tests around the cAdvisor code if possible, as it looks as if it will panic when run.
plugins/inputs/prometheus/README.md
Outdated
@@ -33,6 +33,10 @@ in Prometheus format. | |||
## - prometheus.io/path: If the metrics path is not /metrics, define it with this annotation. | |||
## - prometheus.io/port: If port is not 9102 use this annotation | |||
# monitor_kubernetes_pods = true | |||
## Get the list of pods to scrape from either: | |||
## - version 1 (default): the kubernetes watch api (cluster-wide) | |||
## - version 2: the local cadvisor api (node-wide); for scalability. Note that the environment variable NODE_IP must be set to the host IP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you feel about accepting node_ip as a string parameter? I find environment variables aren't great for discoverability. I'm not against it also setting the value from env NODE_IP by default in init()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this will be solving scalability, there will be a large number of nodes. For this new way, telegraf will need to be run on every node, so having to put a different node ip value for every telegraf config could be impractical.
Also, if the environment variable does not exist, then we would need as many kube api calls as there are during startup which could overload the kube api server at scale. I also have instructions in the readme for how to get the environment variable.
Please let me know what you think.
Hi @ssoroka, I apologize for the delay. I have addressed and/or responded to your comments. Could please take another look? Thank you! |
p.nodeIP = os.Getenv("NODE_IP") | ||
if p.nodeIP == "" { | ||
return errors.New("The environment variable NODE_IP is not set. Cannot get pod list for monitor_kubernetes_pods using node scrape scope") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider defaulting this to os.Getenv("NODE_IP") and allowing it to be set from config. ... actually, I think it's a requirement that all settings need to be configurable from config settings, though env variable defaults are fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added this as a config setting. If the config for this is empty or the specified ip is invalid, it then checks if the env var exists and is valid
Thanks for all the work here, much appreciated! |
Thanks @ssoroka, @vishiy and @gracewehner! |
… (500+ pods) when scraping thru 'monitor_kubernetes_pods' (influxdata#8762) (cherry picked from commit d7df2c5)
Required for all PRs:
PR for the issue #8705.
If monitor_kubernetes_pods = true and the new monitor_kubernetes_pods_version = 2 is specified, the pod list is queried locally on the node every 60s instead of using the watch api. This allows for scalability for large clusters so the pods to scrape are distributed among the nodes, which requires telegraf to be run as a daemonset. The same annotations to choose which pods to scrape are used and the optional filtering by namespace and selectors is also the same, with this filtering done afterwards. The feature is backwards compatible and is used if and only if monitor_kubernetes_pods_version = 2 is specified in the config.