kubernetes input plugin #569

Closed
jchauncey opened this issue Jan 22, 2016 · 45 comments
Labels
help wanted Request for community participation, code, contribution

Comments

@jchauncey

It would be nice to see telegraf support kubernetes deployments. This means decorating metrics with appropriate pod labels (pod name, namespace, container name, etc.).

I'll try to take a stab at this soon, as I would like to use telegraf instead of cadvisor for collecting metrics on my k8s cluster.

@sparrc
Contributor

sparrc commented Jan 22, 2016

@jchauncey How are you monitoring the kubernetes deployment? I've recently overhauled the docker plugin to gather more metrics and allow specifying an endpoint; does this help? https://github.com/influxdata/telegraf/tree/master/plugins/inputs/docker

I'm not sure how kubernetes labels work, but the docker labels are applied as tags.
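
For reference, a minimal config for that plugin looks something like this (a sketch based on the overhauled docker input; check the plugin README for the exact option names in your version):

[[inputs.docker]]
  # Docker daemon endpoint; a TCP endpoint such as "tcp://1.2.3.4:2375" also works
  endpoint = "unix:///var/run/docker.sock"
  # an empty list means gather metrics from all running containers
  container_names = []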

@jchauncey
Author

You'll need to contact the kube API server to gather the information about each pod and the metrics it's consuming.

This would be so I could monitor the k8s deployment. My goal is to supplant cadvisor with telegraf to have a tighter integration with influx.

@jchauncey
Author

My thought right now is that when we collect metrics for a container, if k8s metrics are turned on, we reach out to the API server and fetch all the pod information. That would probably take place around here
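
For illustration, fetching pod information from inside the cluster could look like this (a sketch; it assumes the default service account token mount and the v1 API):

# Authenticate with the pod's service account token (mounted by kubernetes by default)
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
# List all pods; each entry carries pod name, namespace, labels, and container statuses
curl -s --header "Authorization: Bearer $TOKEN" --insecure \
  https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT/api/v1/pods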

@rvrignaud

Hi @jchauncey,

This is slightly out of scope for this issue, but for my k8s cluster in production I use both heapster running with the InfluxDB sink and telegraf using the prometheus input plugin to fetch kubelet metrics. This works pretty well. I'm happy to provide more information if you want.

@jchauncey
Author

Right now I'd like to not rely on heapster and just use the TICK stack as a pure installation on top of k8s. I feel this would ultimately be more powerful than how the cadvisor+heapster stack is being managed, especially since TICK will move faster than both cadvisor and heapster.

@sparrc sparrc changed the title from "Support for kubernetes" to "kubernetes input plugin" Jan 26, 2016
@jchauncey
Author

@sparrc is there a way to decorate data that has been generated in another plugin?

I'm trying to decide if polluting the docker plugin with k8s is the right way to go or not.

@sparrc
Contributor

sparrc commented Jan 26, 2016

I think making a separate kubernetes plugin would be better.

It's okay if there is some duplicated docker code. Or you could also break out the docker metric collection into a separate package and put it in internal/.
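
A hypothetical layout (a sketch; package names are just for illustration):

internal/docker/           # shared container-stat gathering code
plugins/inputs/docker/     # existing plugin, now a thin wrapper
plugins/inputs/kubernetes/ # new plugin: reuses internal/docker, adds pod decoration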

@jchauncey
Author

@sparrc is there a reason you used cont_ as the prefix and not container_?

@sparrc
Contributor

sparrc commented Jan 27, 2016

@jchauncey Not particularly... I'm not opposed to changing it to container_.

@jchauncey
Author

OK, I'll probably make that change as part of my refactor to support the k8s plugin.

jchauncey pushed a commit to jchauncey/telegraf that referenced this issue Jan 28, 2016
fixes influxdata#569
This commit also refactors the existing docker plugin so that the
kubernetes plugin can gather metrics from the docker system.
@jchauncey
Author

@sparrc can you do token replacements in the toml file? For example, if I do:

urls = ["http://$INFLUXDB_HOST:$INFLUXDB_PORT"]

will it do the right thing?

@sparrc
Contributor

sparrc commented Jan 28, 2016

I don't think that's part of the TOML spec, so probably not.
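
A common workaround is to render the file before starting the agent, e.g. with envsubst from GNU gettext (a sketch; the template path is arbitrary):

# Substitute $INFLUXDB_HOST / $INFLUXDB_PORT from the environment into the
# config, then point telegraf at the rendered file
envsubst < /etc/telegraf/telegraf.conf.tpl > /etc/telegraf/telegraf.conf
telegraf -config /etc/telegraf/telegraf.conf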

@jchauncey
Author

@sparrc So I've been trying to strip the kubernetes API client out of the main repo and reduce the number of dependencies; however, it doesn't seem to be helping. No matter what I try, I can't get the binary below 54 megs.

Outside of writing a brand-new client, I'm not sure what I can do to help.

@jchauncey
Author

Hah, it closed this issue because I merged my PR into the deis org. Anyway, for now I am going to maintain my own fork of telegraf with the k8s plugin, since I cannot seem to reduce the binary size. If you decide that you are OK with the binary size, I'll resubmit the PR.

@jchauncey
Author

So I found a Go project called goupx which reduced the binary size of telegraf (with the kubernetes deps) to 12M.

╰─± goupx telegraf
2016/02/16 11:18:37 {Class:ELFCLASS64 Data:ELFDATA2LSB Version:EV_CURRENT OSABI:ELFOSABI_NONE ABIVersion:0 ByteOrder:LittleEndian Type:ET_EXEC Machine:EM_X86_64 Entry:4588112}
2016/02/16 11:18:37 Hemming PT_LOAD section
2016/02/16 11:18:37 File fixed!
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2013
UPX 3.91        Markus Oberhumer, Laszlo Molnar & John Reiser   Sep 30th 2013

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
  55794160 ->  12086424   21.66%  linux/ElfAMD   telegraf
╰─± ls -alh | grep telegraf
-rwxr-xr-x   1 jonathanchauncey  staff    12M Feb 16 11:18 telegraf

@emmanuel

@rvrignaud I would love to hear about how you're using Heapster (with InfluxDB sink) together with Telegraf's Prometheus plugin for kubelet metrics. What does the basic data-flow and setup look like?

Thanks for any insight!

@emmanuel

Also, the Docker containers started by the kubelet contain Docker labels for several pieces of kubernetes metadata. See kubernetes/kubernetes#17234 for some details. It looks like the k8s container name, pod name, namespace, and some other (static) metadata are attached as labels on container startup. If that covers the desired tag values, this may be easier (and more efficient) than going out to the k8s apiserver for pod details.
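
For example, you can see what the kubelet attached with docker inspect (a sketch; the exact io.kubernetes.* label keys vary by kubernetes version):

# Print the labels on a kubelet-started container; keys such as
# io.kubernetes.pod.name and io.kubernetes.pod.namespace carry the pod metadata
docker inspect --format '{{ json .Config.Labels }}' <container-id>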

@sparrc
Contributor

sparrc commented Feb 20, 2016

There is also a contributor who is making improvements to the prometheus plugin to allow it to collect kubernetes metrics: #707

@jchauncey
Author

I don't think k8s currently applies pod labels to the docker label structure. There are a few items in the docker labels, but I don't think pod namespace or pod name are in that list. It definitely doesn't contain pod labels set via the manifest. That could change in future releases.

@feelobot

👍 definitely waiting for this to get merged in

@rvrignaud

Hi @emmanuel,

I'm not sure I understand what you're looking for, but here is what I have:
I'm currently running a single influxdb node (0.10.x) outside of kubernetes.

  • Heapster running with the influxdb sink (I actually have two heapster instances: one managed for me by GKE with the Google Cloud Monitoring backend, and this one). The configuration is pretty straightforward:
      containers:
      - name: heapster
        image: gcr.io/google_containers/heapster:v0.19.1
        resources:
          limits:
            memory: 550Mi
        command:
          - /heapster
          - --source=kubernetes:''
          - --sink=influxdb:http://influxdb:8086
          - --sink_frequency=30s
  • telegraf running with the prometheus input in a pod, with a script using the downward API to get all k8s nodes:
[[inputs.prometheus]]
urls = ["http://host1:10255/metrics", "http://host2:10255/metrics", "..."]
  • telegraf also runs a home-made Python script that uses the downward API to compute cluster-wide metrics

@titilambert
Contributor

@emmanuel @jchauncey I agree with @rvrignaud! I think the best thing to do is to use Heapster to get container metrics and use the Prometheus input plugin to get k8s (infrastructure) metrics from kube-apiserver/kube-scheduler/kube-controller-manager/kubelet.
This PR is a rewrite of heapster... I also tried to make a plugin specific to kubernetes (using the "/metrics" pages of each process) (#691), but it was better to just improve the Prometheus plugin and add a metric pass/drop feature to Telegraf.
BTW, cAdvisor is ALREADY embedded in the kubelet (you can access its webpage), so it's just easier to use it...
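
Concretely, that is just the prometheus input pointed at each component's /metrics page, something like (a sketch; the hostnames and the kubelet read-only port 10255 are taken from the examples above):

[[inputs.prometheus]]
  # kubelet (read-only port) and apiserver both expose prometheus-format metrics
  urls = ["http://node1:10255/metrics", "http://apiserver:8080/metrics"]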

@jchauncey
Author

So I agree completely, and I already have work planned starting this week to figure that out and get it documented.

@jchauncey
Author

So I have a branch that adds the necessary bits to the prometheus plugin to talk to a kubernetes cluster: https://github.com/jchauncey/telegraf/tree/prometheus

I am going to test this locally for a bit before I submit a PR, but I think it will work well for everyone.

@titilambert
Contributor

@jchauncey I made a PR for the prometheus plugin here: #707
Do you think we can merge them into one PR?

@jchauncey
Author

Sure, that's fine. Really, all we need is the ability to pass the bearer token in the HTTP request.
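
In config terms that is one extra option on the prometheus input, roughly (a sketch; the exact option name may differ in the merged PR):

[[inputs.prometheus]]
  urls = ["https://apiserver/metrics"]
  # file whose contents are sent as "Authorization: Bearer <token>" on each scrape
  bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"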

@titilambert
Contributor

@jchauncey Could you copy/paste an example of a Prometheus input plugin config for kubernetes?
Thanks!

@jchauncey
Author

Yup, working on that now =)

@jchauncey
Author

@titilambert https://gist.github.com/jchauncey/18f3615d035fdbda141f

That includes my changes to your PR to have prometheus talk to kubernetes. It also includes a manifest for how to start the daemon (the image provided is built from my PR).

I also have a special TOML Go template that I use to go from env vars to the values in the config.toml.
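
As a sketch, that template is just the config with Go template placeholders where env var values get rendered in at container start (the placeholder syntax depends on the rendering tool, so treat these names as illustrative):

[[outputs.influxdb]]
  urls = ["http://{{ .INFLUXDB_HOST }}:{{ .INFLUXDB_PORT }}"]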

2016/03/03 20:28:45 Starting Telegraf (version 0.10.4.1-23-g64f9330)
2016/03/03 20:28:45 Loaded outputs: influxdb
2016/03/03 20:28:45 Loaded inputs: system disk netstat swap mem cpu diskio net influxdb prometheus
2016/03/03 20:28:45 Tags enabled: host=deis-monitor-telegraf-ibbf6
2016/03/03 20:28:45 Agent Config: Interval:10s, Debug:false, Quiet:false, Hostname:"deis-monitor-telegraf-ibbf6", Flush Interval:10s
2016/03/03 20:28:50 Gathered metrics, (10s interval), from 10 inputs in 239.362268ms
2016/03/03 20:29:00 Gathered metrics, (10s interval), from 10 inputs in 306.524271ms
2016/03/03 20:29:02 Wrote 3124 metrics to output influxdb in 2.368319457s
2016/03/03 20:29:10 Gathered metrics, (10s interval), from 10 inputs in 273.739466ms
2016/03/03 20:29:10 Wrote 3093 metrics to output influxdb in 280.546376ms

@jchauncey
Author

My one problem right now is that, by running telegraf in a container, the hostname reported is the container's hostname, which is wrong. I would rather it be the node hostname (especially for host-level metrics).

@titilambert
Contributor

BTW, I didn't get why you need your patch. I can already get data from my kube apiserver: http://apiserver:8080/metrics

@jchauncey
Author

That doesn't work if your API server requires SSL.

@feelobot

feelobot commented Mar 3, 2016

Datadog has a similar issue, and I resolved it for us like so: DataDog/docker-dd-agent#67. However, this is only valid for AWS environments. I did, however, create another tool that uses kubectl and this very hostname to determine what node it is running on: https://github.com/Vungle/labelgun/blob/master/labelgun.go#L26. You could use the same logic.

@titilambert
Contributor

@jchauncey OK!
In fact, you want to run the container with the --net=host option?

@jchauncey
Author

No, I want to collect host-level metrics (cadvisor doesn't actually expose that many) by mounting in /proc and /sys. But reporting these metrics with the container's hostname doesn't really help me; I want the true hostname of the node this is running on.

@titilambert
Contributor

Does this container run on kubernetes, or just as a simple docker container?

@jchauncey
Author

It runs on kubernetes as a daemonset.

@titilambert
Contributor

OK!
So I can suggest building your Telegraf container from alpine/busybox.
Then, when your container starts, you can run a curl (with grep) command against http://apiserver:8080/api/v1/namespaces/MYNS/pods/$HOSTNAME to learn where the current container is running.
Then you can edit your Telegraf config file (with a sed command?) and start Telegraf.
To do this you need the namespace, but you can get it with the downward API: http://kubernetes.io/v1.1/docs/user-guide/downward-api.html.
What do you think?

@jchauncey
Author

@titilambert like this -

# Authenticate with the pod's service account token (mounted by k8s by default)
export TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
export POD_API_URL=https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT/api/v1/namespaces/$POD_NAMESPACE/pods/$HOSTNAME
# Pull nodeName out of the pod object and use it as telegraf's reported hostname
export AGENT_HOSTNAME=$(curl -s $POD_API_URL --header "Authorization: Bearer $TOKEN" --insecure | grep nodeName | cut -c 18- | tr -d '"')

That works and sets the correct hostname, so thanks for the suggestion.
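
For reference, $POD_NAMESPACE above comes in via the downward API, with a pod spec fragment like this (a sketch):

env:
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace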

@jchauncey
Author

@titilambert I am having pretty good success with your PR combined with my change for the bearer token. Let me know when your PR gets merged.

@titilambert
Contributor

@jchauncey maybe we should create a folder (in the telegraf repo) for Prometheus to store application examples (like kubernetes/etcd/...)

@rvrignaud

Hi @jchauncey and @titilambert ,

You shouldn't have to do that trick to get the right hostname if you use a configuration like:

apiVersion: v1
kind: ReplicationController
metadata:
  name: telegraf-system-1
spec:
  replicas: xxx
  template:
    metadata:
      labels:
        name: telegraf-system
        version: "1"
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - name: telegraf
        image: gcr.io/xxx/telegraf:872ca924421857706a3a96b4e9e50fe4ccb2290b
        resources:
          limits:
            memory: 200Mi
        ports:
        - containerPort: 1025
          hostPort: 1025
          name: fake-tcp
          protocol: TCP
        volumeMounts:
          - name: dev
            mountPath: /dev
          - name: run
            mountPath: /var/run/docker.sock
          - name: sys
            mountPath: /sys
      volumes:
        - name: dev
          hostPath:
              path: /dev
        - name: run
          hostPath:
              path: /var/run/docker.sock
        - name: sys
          hostPath:
              path: /sys

I think it is hostNetwork that allows the pod to see the real hostname, but that needs to be checked.

My 2 cents

@titilambert
Contributor

@rvrignaud Thanks!

@jchauncey
Author

That binds the pod to the host network interface, which most users don't want to do. That basically forgoes using the built-in network layer of k8s.

@sparrc sparrc added the "help wanted" label (Request for community participation, code, contribution) Mar 24, 2016
@jchauncey
Author

@sparrc Doing a bit of housekeeping on old issues I've opened, I found this one. We can probably close it now that we have an input plugin in master.
