Win_PerfCounters: Graphite output for dotted counter names #5173

Closed
prof79 opened this issue Dec 20, 2018 · 5 comments
Labels
discussion Topics for discussion

Comments

@prof79

prof79 commented Dec 20, 2018

Hi there,

is there a solution/workaround/recommended procedure for dotted performance counter names like:

"Avg. Disk sec/Read", "Avg. Disk sec/Write", "Avg. Disk sec/Transfer"?

Due to the dot/full stop in the counter name, an additional metric level is introduced when writing such Telegraf input data to a Graphite/Carbon database.

To better visualize the problem, see the screenshot:

[screenshot: win_disk_sample]

In my view, each counter should stay a single node: "Avg_Disk_sec-Read", "Avg_Disk_sec-Transfer" and "Avg_Disk_sec-Write" as a whole.

This additional level makes it difficult to properly parse the metrics and build template variables from them in Grafana.
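
To make the extra level concrete: with my prefix (telegraf.win_perf) and the host.measurement.field.tags template, the sanitized field name keeps its dot, and Graphite treats that dot as a path separator. The paths below are illustrative only, pieced together from my output:

# today: the embedded dot splits the field into two nodes
telegraf.win_perf.SERVERNAME.win_disk.Avg._Disk_sec-Read.<tags> <value> <timestamp>

# expected: one node per counter
telegraf.win_perf.SERVERNAME.win_disk.Avg_Disk_sec-Read.<tags> <value> <timestamp>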

@danielnelson
Contributor

I haven't tried it, but it should be possible to replace any dots with another character using a strings processor:

[[processors.strings]]
  namepass = "win_disk"
  [[processors.strings.replace]]
    field = "*"
    old = "."
    new = "_"

At some point, I would like to have win_perf_counters generate metrics that match the Telegraf snake_case style.
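
(For the disk fields that would presumably mean something like avg_disk_sec_per_read instead of a field based on "Avg. Disk sec/Read", but that is a separate change.)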

Let me know if this works.

@danielnelson added the "discussion" label Dec 20, 2018
@prof79
Author

prof79 commented Dec 21, 2018

Thanks, I've tried several variations (using measurement instead of field, and so on) but somehow I can't get it to work :-(

Server is still producing something like this:

> win_disk,host=SERVERNAME,instance=C:,objectname=LogicalDisk Avg._Disk_sec/Read=0,Avg._Disk_sec/Transfer=0.0017691525863483548,Avg._Disk_sec/Write=0.0017691525863483548,Current_Disk_Queue_Length=0,Free_Megabytes=774,Percent_Disk_Read_Time=0,Percent_Disk_Time=51.03920364379883,Percent_Disk_Write_Time=51.03920364379883,Percent_Free_Space=0.759791910648346,Percent_Idle_Time=91.99574279785156 1545384198000000000

My config:

#
# Telegraf is entirely plugin driven. All metrics are gathered from the
# declared inputs, and sent to the declared outputs.
#
# Plugins must be declared in here to be active.
# To deactivate a plugin, comment out the name and any variables.
#
# Use 'telegraf -config telegraf.conf -test' to see what metrics a config
# file would generate.
#
# Environment variables can be used anywhere in this config file, simply prepend
# them with $. For strings the variable must be within quotes (ie, "$STR_VAR"),
# for numbers and booleans they should be plain (ie, $INT_VAR, $BOOL_VAR)


# Global tags can be specified here in key="value" format.
[global_tags]
  # dc = "us-east-1" # will tag all metrics with dc=us-east-1
  # rack = "1a"
  ## Environment variables can be used as tags, and throughout the config file
  # user = "$USER"


# Configuration for telegraf agent
[agent]
  ## Default data collection interval for all inputs
  #interval = "10s"
  interval = "5s"
  
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  ## Telegraf will send metrics to outputs in batches of at most
  ## metric_batch_size metrics.
  ## This controls the size of writes that Telegraf sends to output plugins.
  metric_batch_size = 1000

  ## For failed writes, telegraf will cache metric_buffer_limit metrics for each
  ## output, and will flush this buffer on a successful write. Oldest metrics
  ## are dropped first when this buffer fills.
  ## This buffer only fills when writes fail to output plugin(s).
  metric_buffer_limit = 10000

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "0s"

  ## Default flushing interval for all outputs. Maximum flush_interval will be
  ## flush_interval + flush_jitter
  flush_interval = "10s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "0s"

  ## By default or when set to "0s", precision will be set to the same
  ## timestamp order as the collection interval, with the maximum being 1s.
  ##   ie, when interval = "10s", precision will be "1s"
  ##       when interval = "250ms", precision will be "1ms"
  ## Precision will NOT be used for service inputs. It is up to each individual
  ## service input to set the timestamp at the appropriate precision.
  ## Valid time units are "ns", "us" (or "µs"), "ms", "s".
  precision = ""

  ## Logging configuration:
  ## Run telegraf with debug log messages.
  debug = false
  ## Run telegraf in quiet mode (error log messages only).
  quiet = false
  ## Specify the log file name. The empty string means to log to stderr.
  logfile = "/Program Files/Telegraf/telegraf.log"

  ## Override default hostname, if empty use os.Hostname()
  hostname = ""
  ## If set to true, do not set the "host" tag in the telegraf agent.
  omit_hostname = false


###############################################################################
#                                  OUTPUTS                                    #
###############################################################################

# Configuration for sending metrics to InfluxDB
#[[outputs.influxdb]]
  ## The full HTTP or UDP URL for your InfluxDB instance.
  ##
  ## Multiple URLs can be specified for a single cluster, only ONE of the
  ## urls will be written to each interval.
  # urls = ["unix:///var/run/influxdb.sock"]
  # urls = ["udp://127.0.0.1:8089"]
  # urls = ["http://127.0.0.1:8086"]

  ## The target database for metrics; will be created as needed.
  # database = "telegraf"

  ## If true, no CREATE DATABASE queries will be sent.  Set to true when using
  ## Telegraf with a user without permissions to create databases or when the
  ## database already exists.
  # skip_database_creation = false

  ## Name of existing retention policy to write to.  Empty string writes to
  ## the default retention policy.  Only takes effect when using HTTP.
  # retention_policy = ""

  ## Write consistency (clusters only), can be: "any", "one", "quorum", "all".
  ## Only takes effect when using HTTP.
  # write_consistency = "any"

  ## Timeout for HTTP messages.
  # timeout = "5s"

  ## HTTP Basic Auth
  # username = "telegraf"
  # password = "metricsmetricsmetricsmetrics"

  ## HTTP User-Agent
  # user_agent = "telegraf"

  ## UDP payload size is the maximum packet size to send.
  # udp_payload = "512B"

  ## Optional TLS Config for use on HTTP connections.
  # tls_ca = "/etc/telegraf/ca.pem"
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"
  ## Use TLS but skip chain & host verification
  # insecure_skip_verify = false

  ## HTTP Proxy override, if unset values the standard proxy environment
  ## variables are consulted to determine which proxy, if any, should be used.
  # http_proxy = "http://corporate.proxy:3128"

  ## Additional HTTP headers
  # http_headers = {"X-Special-Header" = "Special-Value"}

  ## HTTP Content-Encoding for write request body, can be set to "gzip" to
  ## compress body or "identity" to apply no encoding.
  # content_encoding = "identity"

  ## When true, Telegraf will output unsigned integers as unsigned values,
  ## i.e.: "42u".  You will need a version of InfluxDB supporting unsigned
  ## integer values.  Enabling this option will result in field type errors if
  ## existing data has been written.
  # influx_uint_support = false


# Configuration for Graphite server to send metrics to
[[outputs.graphite]]
  ## TCP endpoint for your graphite instance.
  ## If multiple endpoints are configured, the output will be load balanced.
  ## Only one of the endpoints will be written to with each iteration.
  servers = ["myserver.mydomain.tld:2003"]
  ## Prefix metrics name
  prefix = "telegraf.win_perf"
  ## Graphite output template
  ## see https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md
  #template = "host.tags.measurement.field"
  template = "host.measurement.field.tags"

  ## Enable Graphite tags support
  # graphite_tag_support = false

  ## timeout in seconds for the write connection to graphite
  timeout = 2

  ## Optional TLS Config
  # tls_ca = "/etc/telegraf/ca.pem"
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"
  ## Use TLS but skip chain & host verification
  # insecure_skip_verify = false


###############################################################################
#                                  INPUTS                                     #
###############################################################################

# Windows Performance Counters plugin.
# This is the recommended method of monitoring system metrics on Windows,
# as the regular system plugins (inputs.cpu, inputs.mem, etc.) rely on WMI,
# which utilizes more system resources.
#
# See more configuration examples at:
#   https://github.com/influxdata/telegraf/tree/master/plugins/inputs/win_perf_counters

[[inputs.win_perf_counters]]
  [[inputs.win_perf_counters.object]]
    # Processor usage, alternative to native; reports on a per-core basis.
    ObjectName = "Processor"
    Instances = ["*"]
    Counters = [
      "% Idle Time",
      "% Interrupt Time",
      "% Privileged Time",
      "% User Time",
      "% Processor Time",
      "% DPC Time",
    ]
    Measurement = "win_cpu"
    # Set to true to include _Total instance when querying for all (*).
    IncludeTotal=true

  [[inputs.win_perf_counters.object]]
    # Process metrics
	ObjectName = "Process"
	Instances = ["*"]
	Counters = [
	  "% Processor Time",
	  "Private Bytes",
	  "Virtual Bytes Peak"
	]
	Measurement = "win_process"
	# Set to true to include _Total instance when querying for all (*).
    #IncludeTotal=false

  [[inputs.win_perf_counters.object]]
    # Disk times and queues
    ObjectName = "LogicalDisk"
    Instances = ["*"]
    Counters = [
      "% Idle Time",
      "% Disk Time",
      "% Disk Read Time",
      "% Disk Write Time",
      "Current Disk Queue Length",
      "% Free Space",
      "Free Megabytes",
	  "Avg. Disk sec/Read",
	  "Avg. Disk sec/Write",
	  "Avg. Disk sec/Transfer",
	  "File Write Operations/sec",
	  "File Read Operations/sec",
	  "File Read Bytes/sec",
	  "File Write Bytes/sec",
	  "Avg. sec/Read",
	  "Avg. sec/Write",
    ]
    Measurement = "win_disk"
    # Set to true to include _Total instance when querying for all (*).
    #IncludeTotal=false

  [[inputs.win_perf_counters.object]]
    ObjectName = "PhysicalDisk"
    Instances = ["*"]
    Counters = [
      "Disk Read Bytes/sec",
      "Disk Write Bytes/sec",
      "Current Disk Queue Length",
      "Disk Reads/sec",
      "Disk Writes/sec",
      "% Disk Time",
      "% Disk Read Time",
      "% Disk Write Time",
	  "Avg. Disk sec/Read",
      "Avg. Disk sec/Write",
	  "Avg. Disk sec/Transfer",
	  "File Write Operations/sec",
	  "File Read Operations/sec",
	  "File Read Bytes/sec",
	  "File Write Bytes/sec",
	  "Avg. sec/Read",
	  "Avg. sec/Write",	  
    ]
    Measurement = "win_diskio"

  [[inputs.win_perf_counters.object]]
    ObjectName = "Network Interface"
    Instances = ["*"]
    Counters = [
	  "Current Bandwidth",
      "Bytes Received/sec",
      "Bytes Sent/sec",
	  "Bytes Total/sec",
      "Packets Received/sec",
      "Packets Sent/sec",
      "Packets Received Discarded",
      "Packets Outbound Discarded",
      "Packets Received Errors",
      "Packets Outbound Errors",
    ]
    Measurement = "win_net"

  [[inputs.win_perf_counters.object]]
    ObjectName = "System"
    Counters = [
      "Context Switches/sec",
      "System Calls/sec",
      "Processor Queue Length",
      "System Up Time",
    ]
    Instances = ["------"]
    Measurement = "win_system"
    # Set to true to include _Total instance when querying for all (*).
    #IncludeTotal=false

  [[inputs.win_perf_counters.object]]
    # Example query where the Instance portion must be removed to get data back,
    # such as from the Memory object.
    ObjectName = "Memory"
    Counters = [
      "Available Bytes",
	  "Free Bytes",
	  "Cache Bytes",
      "Cache Faults/sec",
      "Demand Zero Faults/sec",
      "Page Faults/sec",
      "Pages/sec",
	  "Page Reads/sec",
      "Transition Faults/sec",
      "Pool Nonpaged Bytes",
      "Pool Paged Bytes",
      "Standby Cache Reserve Bytes",
      "Standby Cache Normal Priority Bytes",
      "Standby Cache Core Bytes",
      "Free & Zero Page List Bytes",
	  "Committed Bytes",
	  "% Committed Bytes In Use",
    ]
    # Use 6 x - to remove the Instance bit from the query.
    Instances = ["------"]
    Measurement = "win_mem"
    # Set to true to include _Total instance when querying for all (*).
    #IncludeTotal=false

  [[inputs.win_perf_counters.object]]
    # Example query where the Instance portion must be removed to get data back,
    # such as from the Paging File object.
    ObjectName = "Paging File"
    Counters = [
      "% Usage",
    ]
    Instances = ["_Total"]
    Measurement = "win_swap"

  [[inputs.win_perf_counters.object]]
    ObjectName = "HTTP Service Request Queues"
    Counters = [
      "CurrentQueueSize",
      "RejectedRequests",
    ]
    Instances = ["------"]
    Measurement = "win_http_service_request_queues"
    # Set to true to include _Total instance when querying for all (*).
    #IncludeTotal=false

  [[inputs.win_perf_counters.object]]
    ObjectName = "APP_POOL_WAS"
    Counters = [
      "Current Application Pool State",
      "Current Application Pool Uptime",
    ]
    Instances = ["------"]
    Measurement = "win_app_pool_was"
    # Set to true to include _Total instance when querying for all (*).
    #IncludeTotal=false

  [[inputs.win_perf_counters.object]]
    ObjectName = "Hyper-V Dynamic Memory Balancer"
    Counters = [
      "Average Pressure",
    ]
    Instances = ["*"]
    Measurement = "win_hyper_v"
    # Set to true to include _Total instance when querying for all (*).
    IncludeTotal=true


# Windows system plugins using WMI (disabled by default, using
# win_perf_counters over WMI is recommended)


# # Read metrics about cpu usage
# [[inputs.cpu]]
#   ## Whether to report per-cpu stats or not
#   percpu = true
#   ## Whether to report total system cpu stats or not
#   totalcpu = true
#   ## If true, collect raw CPU time metrics.
#   collect_cpu_time = false
#   ## If true, compute and report the sum of all non-idle CPU states.
#   report_active = false


# # Read metrics about disk usage by mount point
# [[inputs.disk]]
#   ## By default stats will be gathered for all mount points.
#   ## Set mount_points will restrict the stats to only the specified mount points.
#   # mount_points = ["/"]
#
#   ## Ignore mount points by filesystem type.
#   ignore_fs = ["tmpfs", "devtmpfs", "devfs", "overlay", "aufs", "squashfs"]


# # Read metrics about disk IO by device
# [[inputs.diskio]]
#   ## By default, telegraf will gather stats for all devices including
#   ## disk partitions.
#   ## Setting devices will restrict the stats to the specified devices.
#   # devices = ["sda", "sdb", "vd*"]
#   ## Uncomment the following line if you need disk serial numbers.
#   # skip_serial_number = false
#   #
#   ## On systems which support it, device metadata can be added in the form of
#   ## tags.
#   ## Currently only Linux is supported via udev properties. You can view
#   ## available properties for a device by running:
#   ## 'udevadm info -q property -n /dev/sda'
#   # device_tags = ["ID_FS_TYPE", "ID_FS_USAGE"]
#   #
#   ## Using the same metadata source as device_tags, you can also customize the
#   ## name of the device via templates.
#   ## The 'name_templates' parameter is a list of templates to try and apply to
#   ## the device. The template may contain variables in the form of '$PROPERTY' or
#   ## '${PROPERTY}'. The first template which does not contain any variables not
#   ## present for the device is used as the device name tag.
#   ## The typical use case is for LVM volumes, to get the VG/LV name instead of
#   ## the near-meaningless DM-0 name.
#   # name_templates = ["$ID_FS_LABEL","$DM_VG_NAME/$DM_LV_NAME"]


# # Read metrics about memory usage
# [[inputs.mem]]
#   # no configuration


# # Read metrics about swap memory usage
# [[inputs.swap]]
#   # no configuration


###############################################################################
#                                 PROCESSORS                                  #
###############################################################################

# See: https://github.com/influxdata/telegraf/issues/5173
[[processors.strings]]
  namepass = "win_disk"

  [[processors.strings.replace]]
    field = "*"
    old = "."
    new = "_"

#[[processors.regex]]
#  namepass = "win_disk"
#
#  [[processors.regex.fields]]
#    key = "*"
#    pattern = "(.*)\\.(.*)"
#    replacement = "${1}_${2}"

@danielnelson
Contributor

Sorry, I gave you bad advice: the strings processor only works on field values and cannot rename the field key. Also, namepass should be an array of values; it limits which measurements have to be checked for the field.

I believe the only way to do this is with the rename processor. It doesn't support substring replacement, so you would need one rule per field:

[[processors.rename]]
  namepass = ["win_disk"]
  [[processors.rename.replace]]
    field = "Avg._Disk_sec/Write"
    dest = "Avg_Disk_sec/Write"

  [[processors.rename.replace]]
    field = "Avg._Disk_sec/Transfer"
    dest = "Avg_Disk_sec/Transfer"

@prof79
Author

prof79 commented Dec 27, 2018

Thank you, that's amazing! Exactly what I wanted:

[screenshot: telegraf_disk_rewritten]

I refined your version a bit:

[[processors.rename]]
  namepass = ["win_disk", "win_diskio"]

  [[processors.rename.replace]]
    field = "Avg._Disk_sec/Read"
    dest = "Avg_Disk_sec_perRead"  

  [[processors.rename.replace]]
    field = "Avg._Disk_sec/Write"
    dest = "Avg_Disk_sec_perWrite"

  [[processors.rename.replace]]
    field = "Avg._Disk_sec/Transfer"
    dest = "Avg_Disk_sec_perTransfer"

  [[processors.rename.replace]]
    field = "Avg._sec/Read"
    dest = "Avg_sec_perRead"  

  [[processors.rename.replace]]
    field = "Avg._sec/Write"
    dest = "Avg_sec_perWrite"

Should I close the issue or leave it open as a reference for the "snake_case" fix you mentioned?

@danielnelson
Contributor

Let's close this issue; I opened another one for the style change: #5196.
