
kube-state-metrics publishes on 8080 by default, but datadog-agent looks on 8081 #1523

Closed
benbc opened this issue Mar 26, 2018 · 5 comments · Fixed by DataDog/integrations-core#1308

Comments

@benbc

benbc commented Mar 26, 2018

I'm using the DataDog agent (v6.1.0) and kube-state-metrics (v1.2.0) on GKE (v1.8.9). I'm using the Kubernetes manifests provided by both projects with minimal modifications.

I see this error in my logs:

[ AGENT ] 2018-03-26 13:14:23 UTC | ERROR | (runner.go:276 in work) | Error running check kubernetes_state: [{"message": "HTTPConnectionPool(host='10.60.2.18', port=8081): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7efcc08619d0>: Failed to establish a new connection: [Errno 111] Connection refused',))", "traceback": "Traceback (most recent call last):\n File \"/opt/datadog-agent/bin/agent/dist/checks/__init__.py\", line 332, in run\n self.check(copy.deepcopy(self.instances[0]))\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/kubernetes_state/kubernetes_state.py\", line 196, in check\n self.process(endpoint, send_histograms_buckets=send_buckets, instance=instance)\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/checks/prometheus/mixins.py\", line 350, in process\n for metric in self.scrape_metrics(endpoint):\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/checks/prometheus/mixins.py\", line 314, in scrape_metrics\n response = self.poll(endpoint)\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/checks/prometheus/mixins.py\", line 467, in poll\n response = requests.get(endpoint, headers=headers, stream=True, timeout=1, cert=cert, verify=verify)\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/requests/api.py\", line 72, in get\n return request('get', url, params=params, **kwargs)\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/requests/api.py\", line 58, in request\n return session.request(method=method, url=url, **kwargs)\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/requests/sessions.py\", line 508, in request\n resp = self.send(prep, **send_kwargs)\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/requests/sessions.py\", line 618, in send\n r = adapter.send(request, **kwargs)\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/requests/adapters.py\", line 508, in send\n raise ConnectionError(e, request=request)\nConnectionError: HTTPConnectionPool(host='10.60.2.18', port=8081): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7efcc08619d0>: Failed to establish a new connection: [Errno 111] Connection refused',))\n"}]

It looks like the check is trying to scrape the /metrics URL from port 8081. However kube-state-metrics exposes that URL on port 8080 by default. (The IP address of the container is correct.)

I've spent a while reading your docs and spelunking in a couple of your codebases and I can't work out how that default gets set or how to override it.

So I suppose I have a couple of questions:

  1. How do I change the port that the check uses?
  2. Is there some reason for it not to use the default port by default?

Thanks
-Ben

@benbc
Author

benbc commented Mar 26, 2018

Sorry for spamming you today. :-)

mfpierre added a commit to DataDog/integrations-core that referenced this issue Mar 27, 2018
@mfpierre
Contributor

mfpierre commented Mar 27, 2018

Hi @benbc, thanks for the report. It seems that the KSM project recently added port 8081 to the list of exposed ports, and this is breaking the autodiscovery template: when a container exposes multiple ports, we take the highest port number.
I've opened a PR to fix the official AD template (DataDog/integrations-core#1308). In the meantime, you can either edit the AD template or try using annotations on the KSM pod.
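To make the failure mode concrete, here is a minimal Python sketch of how the port template variables are assumed to resolve: `%%port%%` picks the highest exposed port, as described above, while `%%port_N%%` (used in the annotation fix later in this thread) is assumed to pick the N-th port in ascending numeric order. The `resolve_port_variable` helper is hypothetical and is not the agent's actual code.

```python
import re
from typing import List


def resolve_port_variable(variable: str, exposed_ports: List[int]) -> int:
    """Hypothetical helper mimicking the autodiscovery port rules (assumed)."""
    ports = sorted(exposed_ports)  # ascending numeric order
    if not ports:
        raise ValueError("container exposes no ports")
    if variable == "%%port%%":
        return ports[-1]  # highest exposed port wins
    match = re.fullmatch(r"%%port_(\d+)%%", variable)
    if match:
        # Assumed rule: N-th port in ascending order (IndexError if out of range).
        return ports[int(match.group(1))]
    raise ValueError(f"unsupported template variable: {variable}")


# Per this thread, the KSM manifest now exposes both 8080 (serving /metrics) and 8081.
ksm_ports = [8080, 8081]
print(resolve_port_variable("%%port%%", ksm_ports))    # 8081: connection refused
print(resolve_port_variable("%%port_0%%", ksm_ports))  # 8080: where /metrics lives
```

Sorting before indexing makes the result independent of the order in which the manifest lists its ports, which is why a newly added higher port silently changes what `%%port%%` resolves to.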

@benbc
Author

benbc commented Mar 27, 2018

@mfpierre Thank you.

I'm sure I can use the annotation approach. The diff for your fix will help with that. Reading your docs, I was unsure which pod the annotation should go on (datadog-agent or kube-state-metrics), but your comment above makes that clear, so now I have everything I need.

What is the easiest way for me to trace your linked fix in https://github.com/DataDog/integrations-core to a published version of the datadog/datadog-agent Docker image?

mfpierre added a commit to DataDog/integrations-core that referenced this issue Mar 27, 2018
@mfpierre
Contributor

@benbc I just merged the PR; it should go out with the next agent release.

gmmeyer pushed a commit to DataDog/integrations-core that referenced this issue Mar 27, 2018
@pdecat
Contributor

pdecat commented Apr 4, 2018

Here is the proper way to fix this issue with annotations for Datadog agent version 5:

diff --git a/kube-state-metrics/kube-state-metrics-deployment.yaml b/kube-state-metrics/kube-state-metrics-deployment.yaml
index 2e8dc47..92417e4 100644
--- a/kube-state-metrics/kube-state-metrics-deployment.yaml
+++ b/kube-state-metrics/kube-state-metrics-deployment.yaml
@@ -14,6 +14,10 @@ spec:
     metadata:
       labels:
         k8s-app: kube-state-metrics
+      annotations:
+        service-discovery.datadoghq.com/kube-state-metrics.check_names: '["kubernetes_state"]'
+        service-discovery.datadoghq.com/kube-state-metrics.init_configs: '[{}]'
+        service-discovery.datadoghq.com/kube-state-metrics.instances: '[{"kube_state_url": "http://%%host%%:%%port_0%%/metrics"}]'
     spec:
       serviceAccountName: kube-state-metrics
       containers:

Datadog agent version 6 users should probably only need to replace service-discovery.datadoghq.com with ad.datadoghq.com.

Edit: note the use of %%port_0%% instead of a hard-coded port.
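The annotation values are JSON strings embedded in YAML, so generating them programmatically helps avoid quoting mistakes. The following Python sketch reproduces the three annotations from the diff above and swaps in the v6 prefix as suggested. The `PREFIXES` table and `ksm_annotations` helper are hypothetical, and the v6 prefix is taken from the comment above rather than verified here. The key segment after the slash is the container name (`kube-state-metrics` in the diff).

```python
import json
from typing import Dict

# Annotation prefixes per agent major version, as stated in this thread
# (v6 value is the suggested replacement, treated here as an assumption).
PREFIXES = {
    5: "service-discovery.datadoghq.com",
    6: "ad.datadoghq.com",
}


def ksm_annotations(agent_major: int, container: str = "kube-state-metrics") -> Dict[str, str]:
    """Hypothetical helper: build the pod annotations from the diff above."""
    key = f"{PREFIXES[agent_major]}/{container}"
    return {
        f"{key}.check_names": json.dumps(["kubernetes_state"]),
        f"{key}.init_configs": json.dumps([{}]),
        # %%port_0%% is assumed to select the lowest exposed port (8080),
        # rather than the highest one (8081) that %%port%% would pick.
        f"{key}.instances": json.dumps(
            [{"kube_state_url": "http://%%host%%:%%port_0%%/metrics"}]
        ),
    }


# Print YAML-style lines to paste under spec.template.metadata.annotations.
for name, value in ksm_annotations(6).items():
    print(f"{name}: '{value}'")
```

With the default `json.dumps` separators, the generated values match the strings in the diff character for character.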

3 participants