Create Telegraf Collector for vSphere Object Metric Collection #1420

Closed
exbane opened this issue Jun 24, 2016 · 24 comments · Fixed by #4141
Labels
feature request: Requests for new plugin and for new features to existing plugins

@exbane

exbane commented Jun 24, 2016

Feature Request

Create Telegraf Collector for vSphere Object Metric Collection

Proposal:

There are already Telegraf collectors for Windows, Linux, and Unix systems that collect from a multitude of apps and systems, but Telegraf is currently limited in its ability to collect from the vSphere API and ship the data to InfluxDB for proper graphing.

Current behavior:

There are a few projects out there that can do this, but not very well in my opinion.

The StatsFeeder fling is good at collecting all host and VM 20-second performance metrics, but it's limited in how you can parse that data and currently cannot collect data from VMFS datastores for performance graphing. The project also has not been updated since 2013.

vSphere2Metrics is good for collecting the 5-minute intervals for your infrastructure and was by far the best of the bunch, but the time it takes to collect from an extremely large infrastructure with thousands of VMs is undesirable. It also has not been updated for quite some time.

SYNAXON/GraphiteReceiver - This works in conjunction with StatsFeeder to send the performance metrics to a Graphite instance. It does send the metrics to the Graphite receiver on InfluxDB, but for some reason they aren't parsed properly: the entire line (name, metric value, and timestamp) shows up as a single metric name. There isn't a good way of collecting against datastores, and the project hasn't been updated in quite a while.

SexiGraf - This was a promising project and seems to work out well for the most part. There are some holes in the product that make it difficult for a larger shop to adopt.
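
For reference on the GraphiteReceiver symptom described above: a Graphite plaintext line carries three space-separated fields (metric path, value, Unix timestamp), so a correctly parsed line should yield three separate pieces rather than one long metric name. The sketch below only illustrates that split. The sample metric path is made up, and this is not a diagnosis of what InfluxDB's Graphite listener did with the StatsFeeder data.

```python
def parse_graphite_line(line):
    """Split one Graphite plaintext line into (path, value, timestamp)."""
    path, value, timestamp = line.split()
    return path, float(value), int(timestamp)


if __name__ == "__main__":
    # Hypothetical sample line; the metric path is invented for illustration.
    sample = "vcenter.esx01.cpu.usage 12.5 1466769600"
    print(parse_graphite_line(sample))
```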

Desired behavior:

A Telegraf collector that gathers whatever 20-second metrics you want from a vSphere infrastructure, including Hosts, Datastores, VMs, Resource Pools, Clusters, and Datacenters. Being able to parse the information per cluster is extremely desirable for creating cluster-specific graphs in Grafana or Chronograf.

Use case: [Why is this important (helps with prioritizing requests)]

A lot of companies that invest in vSphere don't always have the budget to purchase the expensive monitoring tools from VMware, such as vRealize Operations Manager. Being able to collect accurate metrics and graph them yourself through an open source system would be a huge advantage for VM admins. Just food for thought. Let me know if you have any questions.

-Adam

@zp-markusp

+1 would be awesome to have transparency in here too...

@brandonweeks

I've been working on extracting metrics via the vSphere Performance API with govmomi in a way that would be compatible with Telegraf. Is everyone more interested in the "raw" data that is available from the ESXi hosts directly (like StatsFeeder), or in more comprehensively collecting both the real-time data and the various aggregate data vSphere collects (like vRealize)?

@exbane
Author

exbane commented Jul 15, 2016

I think something like StatsFeeder would be a good start, but overall a comprehensive solution like vRealize would be great. Something along those lines would be a great addition for companies that can't afford vROps.

@R-Sommer

R-Sommer commented Aug 3, 2016

I'd really appreciate having a collector for Telegraf, as I still haven't found a satisfying solution for vSphere. Are there any plans, or even a timeline?

@sparrc
Contributor

sparrc commented Aug 3, 2016

Nope, there are currently no plans or timeline. This plugin will likely need to come from the community for it to get done.

@steverweber

I did post a script for this a few months ago... We need a place to put a collection of exec scripts.

https://github.com/uwaterloo-s8weber/influxdb-metrics-vmware
You can run that and pass no uri arg, and the exec script might fire the data off to InfluxDB.

Note: Telegraf might have issues with '\r\n' line endings on Windows, so it might need a patch for that first.

@steverweber

And yep, that script is slow... perhaps it could be threaded to give it a boost.
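
The threading idea above could look roughly like the sketch below, which polls several hosts concurrently with a thread pool. It is not taken from the linked script: `collect_all` and `fake_fetch` are hypothetical names, and `fake_fetch` merely sleeps to stand in for a slow vSphere API query.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def collect_all(hosts, fetch, max_workers=8):
    """Call fetch(host) for every host in parallel threads.

    Returns (results, errors): two dicts keyed by host, so one failing
    host does not discard the metrics gathered from the others.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {host: pool.submit(fetch, host) for host in hosts}
    # Leaving the "with" block waits for every submitted call to finish.
    for host, future in futures.items():
        try:
            results[host] = future.result()
        except Exception as exc:  # keep going if a single host fails
            errors[host] = exc
    return results, errors


def fake_fetch(host):
    """Hypothetical stand-in for a slow vSphere API query."""
    time.sleep(0.1)
    return {"cpu_usage": 12.5}


if __name__ == "__main__":
    hosts = ["esx01", "esx02", "esx03", "esx04"]
    start = time.perf_counter()
    results, errors = collect_all(hosts, fake_fetch)
    print(f"{len(results)} hosts in {time.perf_counter() - start:.2f}s")
```

Threads suit this workload because the time is spent waiting on network responses rather than on computation. The pool size is a guess and would need tuning against what the vCenter tolerates.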

@awilson77584

Xorux has lpar2rrd (GNU GPL), which also does VMware monitoring. I'm trying to get the same type of information for LPARs and the AIX frame, and to get the data into Telegraf for tagging and then into InfluxDB. I'll post back if I have success.

@Integrative

This would definitely be appreciated. We're now using SNMP to poll data from individual hosts, but that is just too time-consuming and fragile to produce reliable data. I was looking at pyvmomi to build something like this, but haven't gotten to it yet.

@astolle

astolle commented May 16, 2017

Hi,

I had the same problem and solved it with Telegraf and the exec input plugin. The plugin executes a small shell script, which uses govc (the govmomi CLI) to gather metrics from the vCenter. Works great within an Ubuntu container.

  • telegraf-input.conf
[[inputs.exec]]
  commands = ["/usr/local/bin/cpu-metrics.sh /$PATH/host/$CLUSTER/*"]
  timeout = "15s"
  data_format = "influx"
  • cpu-metrics.sh
#!/bin/sh

# Inventory path of the objects to sample; use "govc ls" to find your path.
# Named INVENTORY_PATH so the shell's own PATH is not overwritten.
INVENTORY_PATH="$1"

# metric.sample
#   - govc usage is documented here: https://github.com/vmware/govmomi/blob/master/govc/USAGE.md
#   - instance: -i=* will output avg util of all cores per core AND "-" as average for those
GOVC="/usr/local/bin/govc metric.sample -json=false -n=1 -instance=* -t=false $INVENTORY_PATH cpu.utilization.average"

# output format:
#   - output in InfluxDB line protocol: https://docs.influxdata.com/influxdb/v0.9/write_protocols/line/
# The first awk strips the ".example.net" domain suffix; the second keeps only
# the aggregate rows (instance "-") and prints one point per host.
$GOVC | /usr/bin/awk -F".example.net" '{print $1 " " $2}' | /usr/bin/awk '$2 ~ /-/ {print "esxi,host="$1" cputil="$4}'
  • env vars needed for govc to work:
ENV GOVC_URL https://myvcenter.example.com
ENV GOVC_USERNAME myUser
ENV GOVC_PASSWORD mySecretPass
#ENV GOVC_INSECURE true

Not perfect, but home-brewed. Maybe this helps someone.

Best, Alex
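
To make the awk pipeline above easier to follow, here is a minimal Python sketch of the same transformation. It assumes, as the awk fields imply, that each row of `govc metric.sample` text output has whitespace-separated columns for name, instance, metric, and value. The sample rows are invented for illustration and are not captured govc output.

```python
def govc_sample_to_line_protocol(text, domain_suffix=".example.net"):
    """Convert metric.sample text rows into InfluxDB line protocol.

    Assumed column order: name, instance, metric, value, unit.
    Only the aggregate row (instance "-") is kept for each host.
    """
    lines = []
    for row in text.splitlines():
        fields = row.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        name, instance, _metric, value = fields[:4]
        if instance != "-":
            continue  # per-core rows are dropped
        host = name.split(domain_suffix)[0]  # strip the domain suffix
        lines.append(f"esxi,host={host} cputil={value}")
    return lines


# Hypothetical sample rows in the assumed column layout.
SAMPLE = """\
esx01.example.net  0  cpu.utilization.average  11.50  %
esx01.example.net  1  cpu.utilization.average  13.10  %
esx01.example.net  -  cpu.utilization.average  12.30  %
"""

if __name__ == "__main__":
    for line in govc_sample_to_line_protocol(SAMPLE):
        print(line)
```

One difference from the awk version: this sketch keeps a row only when the instance is exactly "-", whereas the awk pattern matches any instance that contains a dash.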

@britcey

britcey commented May 16, 2017

I ran across https://github.com/Oxalide/vsphere-influxdb-go, which looks to do what we want, albeit outside of Telegraf itself (I haven't tested yet) - might be worthwhile to convert to a Telegraf input plugin. Still a Go newbie, so that's currently beyond me.

@sachinrase

This is a really useful feature, as other monitoring software has it as part of the base install.
IMHO Telegraf can be a universal agent for all clouds with the addition of vCenter support.

Zabbix :
https://www.zabbix.com/documentation/3.4/manual/vm_monitoring

Sensu :
sensu-plugins/community#13
https://github.com/vmware/rbvmomi

Nagios:
https://exchange.nagios.org/directory/Plugins/Operating-Systems/*-Virtual-Environments/VMWare/box293_check_vmware/details

Prometheus :
https://github.com/sapcc/vcenter-exporter/blob/master/vcenter-exporter.py

/cc @timhallinflux

@mkuzmin

mkuzmin commented Jul 24, 2017

I found there is a PR for a native vSphere plugin by @mlabouardy, open since April, at #2682.

But the plugin is not complete yet. At the moment I keep my contributions at https://github.com/mkuzmin/telegraf/commits/vsphere

If anyone wants to try it, here are binaries:
https://github.com/mkuzmin/telegraf/releases/

@mkuzmin

mkuzmin commented Jul 26, 2017

I suppose my fork is pretty complete now. Telegraf can now collect metrics from hosts, filter objects by mask, and handle errors. Fields are renamed to match the titles in the user interface.

I'd like to have early feedback, especially about the new field naming scheme.
Please check out the README and binaries.
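
As an illustration of what "filter objects by mask" could mean, the sketch below keeps only object names that match shell-style wildcard masks. This is an assumption about the mask syntax, not the fork's actual implementation, and `filter_by_mask` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase


def filter_by_mask(names, masks):
    """Keep the names that match at least one wildcard mask."""
    return [n for n in names if any(fnmatchcase(n, m) for m in masks)]


if __name__ == "__main__":
    # Hypothetical inventory object names.
    objects = ["esx01.example.net", "esx02.example.net", "db-vm-01"]
    print(filter_by_mask(objects, ["esx*"]))
```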

@exbane
Author

exbane commented Jul 27, 2017 via email

@MicKBfr

MicKBfr commented Jul 31, 2017

Hi,

That's a good start.

Don't forget to collect IOPS and latency for disks and datastores, and to tag each VM with its ESX host and datastore.

At the moment I use https://github.com/Oxalide/vsphere-influxdb-go to collect IOPS per datastore and find which VM is responsible for high IOPS...

Thanks,

@danielnelson added the feature request label and removed the plugin request label on Aug 12, 2017
@mjseid

mjseid commented Sep 1, 2017

@mkuzmin it would be nice to include a tag for the cluster a resource is in. I believe it would be useful for most folks, since it's more common to monitor, for example, the memory usage of a cluster than of individual hosts to know when to add cluster capacity.
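
A short sketch of why a cluster tag helps: once each host point carries one, cluster-level memory usage is a simple group-by. The point layout, the `cluster` tag name, and the `mem_used_gb` / `mem_total_gb` field names are all assumptions for illustration, not the plugin's schema.

```python
from collections import defaultdict


def memory_usage_by_cluster(points):
    """Return {cluster: percent of memory in use} from per-host points."""
    used = defaultdict(float)
    total = defaultdict(float)
    for point in points:
        cluster = point["tags"]["cluster"]
        used[cluster] += point["fields"]["mem_used_gb"]
        total[cluster] += point["fields"]["mem_total_gb"]
    return {c: 100.0 * used[c] / total[c] for c in total}


# Hypothetical per-host points, each tagged with its cluster.
POINTS = [
    {"tags": {"host": "esx01", "cluster": "prod"},
     "fields": {"mem_used_gb": 96, "mem_total_gb": 256}},
    {"tags": {"host": "esx02", "cluster": "prod"},
     "fields": {"mem_used_gb": 160, "mem_total_gb": 256}},
    {"tags": {"host": "esx03", "cluster": "lab"},
     "fields": {"mem_used_gb": 32, "mem_total_gb": 128}},
]

if __name__ == "__main__":
    print(memory_usage_by_cluster(POINTS))
```

Summing used and total memory before dividing weights each host by its capacity, which a plain average of per-host percentages would not do.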

@dR3b

dR3b commented Nov 20, 2017

Any news at all?

@huyujie

huyujie commented Nov 29, 2017

@mkuzmin when I use your vSphere plugin to collect, the output is 'default datacenter resolves to multiple instances, please specify'. Can you tell me why?

@genebean

genebean commented Feb 3, 2018

@mkuzmin cpu/ready would be really helpful too, for both VMs and hosts.

@melliott-sis

I'd love to know if/when this is going to be merged in. Collecting NSX metrics would also be very helpful.

@russorat
Contributor

Does this plugin satisfy this? #4141

@endersonmaia

There are two PRs for this, #4141 and #2682.

@dshuvar

dshuvar commented Jan 9, 2019

Hm, @prydin @mkuzmin, can I use it with vROps (that is, not directly against vSphere)?
