kubernetes input plugin #569

Closed
jchauncey opened this issue Jan 22, 2016 · 45 comments
Labels
help wanted Request for community participation, code, contribution

Comments

@jchauncey

It would be nice to see telegraf support kubernetes deployments. This means decorating metrics with appropriate pod labels (pod name, namespace, container name, etc.).

I'll try to take a stab at this soon, as I would like to use telegraf instead of cadvisor for collecting metrics on my k8s cluster.

@sparrc
Contributor

sparrc commented Jan 22, 2016

@jchauncey How are you monitoring the kubernetes deployment? I've recently overhauled the docker plugin to gather more metrics and allow specifying an endpoint; does this help? https://github.com/influxdata/telegraf/tree/master/plugins/inputs/docker

I'm not sure how kubernetes labels work, but the docker labels are applied as tags.
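
For reference, a minimal config for that plugin looks something like this (a sketch based on the overhauled docker input; check the plugin README for the exact option names in your version):

[[inputs.docker]]
  # Docker daemon endpoint; a TCP endpoint such as "tcp://1.2.3.4:2375" also works
  endpoint = "unix:///var/run/docker.sock"
  # an empty list means gather metrics from all running containers
  container_names = []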

@jchauncey
Author

You'll need to contact the kube API server to gather the information about each pod and the metrics it's consuming.

This would be so I could monitor the k8s deployment. My goal is to supplant cadvisor with telegraf to have a tighter integration with influx.

@jchauncey
Author

My thought right now is that when we collect metrics for a container, if k8s metrics are turned on, we reach out to the API server and fetch all the pod information. That would probably take place around here
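
For illustration, fetching pod information from inside the cluster could look like this (a sketch; it assumes the default service account token mount and the v1 API):

# Authenticate with the pod's service account token (mounted by kubernetes by default)
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
# List all pods; each entry carries pod name, namespace, labels, and container statuses
curl -s --header "Authorization: Bearer $TOKEN" --insecure \
  https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT/api/v1/pods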

@rvrignaud

Hi @jchauncey,

This is slightly out of scope for this issue, but for my k8s cluster in production I use both heapster running with the InfluxDB sink and telegraf using the prometheus input plugin to fetch kubelet metrics. This works pretty well. I'm happy to provide more information if you want.

@jchauncey
Author

Right now I'd like to not rely on heapster and just use the TICK stack as a pure installation on top of k8s. I feel this would ultimately be more powerful than how the cadvisor+heapster stack is being managed, especially since TICK will move faster than both cadvisor and heapster.

@sparrc sparrc changed the title from "Support for kubernetes" to "kubernetes input plugin" Jan 26, 2016
@jchauncey
Author

@sparrc is there a way to decorate data that has been generated in another plugin?

I'm trying to decide if polluting the docker plugin with k8s is the right way to go or not.

@sparrc
Contributor

sparrc commented Jan 26, 2016

I think making a separate kubernetes plugin would be better.

It's okay if there is some duplicated docker code. Or you could also break out the docker metric collection into a separate package and put it in internal/.
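
A hypothetical layout (a sketch; package names are just for illustration):

internal/docker/           # shared container-stat gathering code
plugins/inputs/docker/     # existing plugin, now a thin wrapper
plugins/inputs/kubernetes/ # new plugin: reuses internal/docker, adds pod decoration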

@jchauncey
Author

@sparrc is there a reason you used cont_ as the prefix and not container_?

@sparrc
Contributor

sparrc commented Jan 27, 2016

@jchauncey Not particularly... I'm not opposed to changing it to container_.

@jchauncey
Author

OK, I'll probably make that change as part of my refactor to support the k8s plugin.

jchauncey pushed a commit to jchauncey/telegraf that referenced this issue Jan 28, 2016
fixes influxdata#569
This commit also refactors the existing docker plugin so that the
kubernetes plugin can gather metrics from the docker system.
@jchauncey
Author

@sparrc can you do token replacements in the toml file? For example, if I do:

urls = ["http://$INFLUXDB_HOST:$INFLUXDB_PORT"]

will it do the right thing?

@sparrc
Contributor

sparrc commented Jan 28, 2016

I don't think that's part of the TOML spec, so probably not.
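
A common workaround is to render the file before starting the agent, e.g. with envsubst from GNU gettext (a sketch; the template path is arbitrary):

# Substitute $INFLUXDB_HOST / $INFLUXDB_PORT from the environment into the
# config, then point telegraf at the rendered file
envsubst < /etc/telegraf/telegraf.conf.tpl > /etc/telegraf/telegraf.conf
telegraf -config /etc/telegraf/telegraf.conf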

@jchauncey
Author

@sparrc So I've been trying to strip the kubernetes API client out of the main repo and reduce the number of dependencies; however, it doesn't seem to be helping. No matter what I try, I can't get the binary below 54 megs.

Outside of writing a brand-new client, I'm not sure what I can do to help.

@jchauncey
Author

Hah, it closed this issue because I merged my PR into the deis org. Anyway, for now I am going to maintain my own fork of telegraf with the k8s plugin, since I cannot seem to reduce the binary size. If you decide that you are OK with the binary size, I'll resubmit the PR.

@jchauncey
Author

So I found a Go project called goupx which reduced the binary size of telegraf (with the kubernetes deps) to 12M.

╰─± goupx telegraf
2016/02/16 11:18:37 {Class:ELFCLASS64 Data:ELFDATA2LSB Version:EV_CURRENT OSABI:ELFOSABI_NONE ABIVersion:0 ByteOrder:LittleEndian Type:ET_EXEC Machine:EM_X86_64 Entry:4588112}
2016/02/16 11:18:37 Hemming PT_LOAD section
2016/02/16 11:18:37 File fixed!
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2013
UPX 3.91        Markus Oberhumer, Laszlo Molnar & John Reiser   Sep 30th 2013

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
  55794160 ->  12086424   21.66%  linux/ElfAMD   telegraf
╰─± ls -alh | grep telegraf
-rwxr-xr-x   1 jonathanchauncey  staff    12M Feb 16 11:18 telegraf

@emmanuel

@rvrignaud I would love to hear about how you're using Heapster (with InfluxDB sink) together with Telegraf's Prometheus plugin for kubelet metrics. What does the basic data-flow and setup look like?

Thanks for any insight!

@emmanuel

Also, the Docker containers started by the kubelet contain Docker labels for several pieces of kubernetes metadata. See kubernetes/kubernetes#17234 for some details. It looks like the k8s container name, pod name, namespace, and some other (static) metadata are attached as labels on container startup. If that covers the desired tag values, this may be easier (and more efficient) than going out to the k8s apiserver for pod details.
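
For example, you can see what the kubelet attached with docker inspect (a sketch; the exact io.kubernetes.* label keys vary by kubernetes version):

# Print the labels on a kubelet-started container; keys such as
# io.kubernetes.pod.name and io.kubernetes.pod.namespace carry the pod metadata
docker inspect --format '{{ json .Config.Labels }}' <container-id>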

@sparrc
Contributor

sparrc commented Feb 20, 2016

There is also a contributor who is making improvements to the prometheus plugin to allow it to collect kubernetes metrics: #707

@jchauncey
Author

I don't think k8s currently applies pod labels to the docker label structure. There are a few items in the docker labels, but I don't think pod namespace or pod name are in that list. It definitely doesn't contain pod labels set via the manifest. That could change in future releases.

@feelobot

👍 definitely waiting for this to get merged in

@rvrignaud

Hi @emmanuel,

I'm not sure I understand what you're looking for, but here is what I have:
I'm currently running a single influxdb node (0.10.x) outside of kubernetes.

  • Heapster running with the influxdb sink (I actually have two heapster instances: one managed for me by GKE with the Google Cloud Monitoring backend, and this one). The configuration is pretty straightforward:
      containers:
      - name: heapster
        image: gcr.io/google_containers/heapster:v0.19.1
        resources:
          limits:
            memory: 550Mi
        command:
          - /heapster
          - --source=kubernetes:''
          - --sink=influxdb:http://influxdb:8086
          - --sink_frequency=30s
  • telegraf running with the prometheus input in a pod, with a script using the downward API to get all k8s nodes:
[[inputs.prometheus]]
urls = ["http://host1:10255/metrics", "http://host2:10255/metrics", "..."]
  • telegraf also runs a home-made Python script that uses the downward API to compute cluster-wide metrics

@titilambert
Contributor

@emmanuel @jchauncey I agree with @rvrignaud! I think the best thing to do is to use Heapster to get container metrics and use the Prometheus input plugin to get k8s (infrastructure) metrics from kube-apiserver/kube-scheduler/kube-controller-manager/kubelet.
This PR is a rewrite of heapster... I also tried to make a plugin specific to kubernetes (using the "/metrics" pages of each process) (#691), but it was better to just improve the Prometheus plugin and add a metric pass/drop feature to Telegraf.
BTW, cAdvisor is ALREADY embedded in the kubelet (you can access its webpage), so it's just easier to use it...
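
Concretely, that is just the prometheus input pointed at each component's /metrics page, something like (a sketch; the hostnames and the kubelet read-only port 10255 are taken from the examples above):

[[inputs.prometheus]]
  # kubelet (read-only port) and apiserver both expose prometheus-format metrics
  urls = ["http://node1:10255/metrics", "http://apiserver:8080/metrics"]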

@jchauncey
Author

So I agree completely, and I already have work planned starting this week to figure that out and get it documented.

@jchauncey
Author

So I have a branch that adds the necessary bits to the prometheus plugin to talk to a kubernetes cluster: https://github.com/jchauncey/telegraf/tree/prometheus

I am going to test this locally for a bit before I submit a PR, but I think it will work well for everyone.

@titilambert
Contributor

@jchauncey I made a PR for the prometheus plugin here: #707
Do you think we can merge them into one PR?

@jchauncey
Author

Sure, that's fine. Really, all we need is the ability to pass the bearer token in the HTTP request.
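
In config terms that is one extra option on the prometheus input, roughly (a sketch; the exact option name may differ in the merged PR):

[[inputs.prometheus]]
  urls = ["https://apiserver/metrics"]
  # file whose contents are sent as "Authorization: Bearer <token>" on each scrape
  bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"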

@titilambert
Contributor

@jchauncey Could you copy/paste an example of a Prometheus input plugin config for kubernetes?
Thanks!

@jchauncey
Author

Yup, working on that now =)

@jchauncey
Author

@titilambert https://gist.github.com/jchauncey/18f3615d035fdbda141f

That includes my changes to your PR to have prometheus talk to kubernetes. It also includes a manifest for how to start the daemon (the image provided is built from my PR).

I also have a special TOML Go template that I use to go from env vars to the values in the config.toml.
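
As a sketch, that template is just the config with Go template placeholders where env var values get rendered in at container start (the placeholder syntax depends on the rendering tool, so treat these names as illustrative):

[[outputs.influxdb]]
  urls = ["http://{{ .INFLUXDB_HOST }}:{{ .INFLUXDB_PORT }}"]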

2016/03/03 20:28:45 Starting Telegraf (version 0.10.4.1-23-g64f9330)
2016/03/03 20:28:45 Loaded outputs: influxdb
2016/03/03 20:28:45 Loaded inputs: system disk netstat swap mem cpu diskio net influxdb prometheus
2016/03/03 20:28:45 Tags enabled: host=deis-monitor-telegraf-ibbf6
2016/03/03 20:28:45 Agent Config: Interval:10s, Debug:false, Quiet:false, Hostname:"deis-monitor-telegraf-ibbf6", Flush Interval:10s
2016/03/03 20:28:50 Gathered metrics, (10s interval), from 10 inputs in 239.362268ms
2016/03/03 20:29:00 Gathered metrics, (10s interval), from 10 inputs in 306.524271ms
2016/03/03 20:29:02 Wrote 3124 metrics to output influxdb in 2.368319457s
2016/03/03 20:29:10 Gathered metrics, (10s interval), from 10 inputs in 273.739466ms
2016/03/03 20:29:10 Wrote 3093 metrics to output influxdb in 280.546376ms

@jchauncey
Author

My one problem right now is that, by running telegraf in a container, the hostname reported is the container's hostname, which is wrong. I would rather it be the node hostname (especially for host-level metrics).

@titilambert
Contributor

BTW, I didn't get why you need your patch. I can already get data from my kube apiserver: http://apiserver:8080/metrics

@jchauncey
Author

That doesn't work if your API server requires SSL.

@feelobot

feelobot commented Mar 3, 2016

Datadog has a similar issue, and I resolved it for us like so: DataDog/docker-dd-agent#67. However, this is only valid for AWS environments. I did, however, create another tool that uses kubectl and this very hostname to determine what node it is running on: https://github.com/Vungle/labelgun/blob/master/labelgun.go#L26. You could use the same logic.

@titilambert
Contributor

@jchauncey OK!
In fact, you want to run the container with the --net=host option?

@jchauncey
Author

No, I want to collect host-level metrics (cadvisor doesn't actually expose that many) by mounting in /proc and /sys. But reporting these metrics with the container's hostname doesn't really help me; I want the true hostname of the node this is running on.

@titilambert
Contributor

Does this container run on kubernetes, or just as a simple docker container?

@jchauncey
Author

It runs on kubernetes as a daemonset.

@titilambert
Contributor

OK!
So I can suggest building your Telegraf container from alpine/busybox.
Then, when your container starts, you can run a curl (with grep) command against http://apiserver:8080/api/v1/namespaces/MYNS/pods/$HOSTNAME to learn where the current container is running.
Then you can edit your Telegraf config file (with a sed command?) and start Telegraf.
To do this you need the namespace, but you can get it with the downward API: http://kubernetes.io/v1.1/docs/user-guide/downward-api.html.
What do you think?

@jchauncey
Author

@titilambert like this -

# Authenticate with the pod's service account token (mounted by k8s by default)
export TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
export POD_API_URL=https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT/api/v1/namespaces/$POD_NAMESPACE/pods/$HOSTNAME
# Pull nodeName out of the pod object and use it as telegraf's reported hostname
export AGENT_HOSTNAME=$(curl -s $POD_API_URL --header "Authorization: Bearer $TOKEN" --insecure | grep nodeName | cut -c 18- | tr -d '"')

That works and sets the correct hostname, so thanks for the suggestion.
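
For reference, $POD_NAMESPACE above comes in via the downward API, with a pod spec fragment like this (a sketch):

env:
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace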

@jchauncey
Author

@titilambert I am having pretty good success with your PR combined with my change for the bearer token. Let me know when your PR gets merged.

@titilambert
Contributor

@jchauncey maybe we should create a folder (in the telegraf repo) for Prometheus to store application examples (like kubernetes/etcd/...)

@rvrignaud

Hi @jchauncey and @titilambert ,

You shouldn't have to do that trick to get the right hostname if you use a configuration like:

apiVersion: v1
kind: ReplicationController
metadata:
  name: telegraf-system-1
spec:
  replicas: xxx
  template:
    metadata:
      labels:
        name: telegraf-system
        version: "1"
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - name: telegraf
        image: gcr.io/xxx/telegraf:872ca924421857706a3a96b4e9e50fe4ccb2290b
        resources:
          limits:
            memory: 200Mi
        ports:
        - containerPort: 1025
          hostPort: 1025
          name: fake-tcp
          protocol: TCP
        volumeMounts:
          - name: dev
            mountPath: /dev
          - name: run
            mountPath: /var/run/docker.sock
          - name: sys
            mountPath: /sys
      volumes:
        - name: dev
          hostPath:
              path: /dev
        - name: run
          hostPath:
              path: /var/run/docker.sock
        - name: sys
          hostPath:
              path: /sys

I think it is hostNetwork that allows the pod to see the real hostname, but that needs to be checked.

My 2 cents

@titilambert
Contributor

@rvrignaud Thanks!

@jchauncey
Author

That binds the pod to the host network interface, which most users don't want to do. That basically forgoes using the built-in network layer of k8s.

@sparrc sparrc added the "help wanted" label (Request for community participation, code, contribution) Mar 24, 2016
@jchauncey
Author

@sparrc Doing a bit of housekeeping on old issues I've opened, I found this one. We can probably close it now that we have an input plugin in master.
