2017-04-25T18:09:30Z E! ERROR in input [inputs.exec]: Errors encountered: [metric parsing error, reason: [buffer too short], buffer: [], index: [0]] #2718

Closed
KarlWR opened this issue Apr 25, 2017 · 2 comments

KarlWR commented Apr 25, 2017

Bug report

The script produces correct InfluxDB line protocol output in test mode, but fails with a metric parsing error when Telegraf runs as a service.

Relevant telegraf.conf:

# Read metrics from one or more commands that can output to stdout

[[inputs.exec]]

   command = "/etc/telegraf/scripts/tr.sh"
   timeout = "25s"
   data_format = "influx"

System info:

VM - Virtual Box
2 x vCPU
4 GB RAM
All components run locally: Telegraf, InfluxDB, Grafana

telegraf -version
Telegraf v1.2.1 (git: release-1.2 3b6ffb3)

centos 7
Linux netmon 3.10.0-514.16.1.el7.x86_64 #1 SMP Wed Apr 12 15:04:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Steps to reproduce:

  1. Enable inputs.exec
  2. Use the following exec script:
#!/bin/bash
now=$(date +%s%N)
host=netmon
for target in monitor:monitor.xxx-net www:www.xxx.com dc1:dc1.xxx.local youtrack:youtrack.xxx.local kvm-02-gw:kvm-02-gw.xxx.net; do
	IFS=: read target_loc target_ip <<< "$target"
	traceroute -n -I -q 1 $target_ip | awk '$1 != "traceroute" && $2 != "*" { print "traceroute,target_loc='$target_loc',target_ip='$target_ip',hop_num="$1",hop_host="$2",host='$host' resp_time="$3" '$now'" }' &
done
wait

Expected behavior:

telegraf -test -input-filter=exec -debug
2017/04/25 14:21:25 I! Using config file: /etc/telegraf/telegraf.conf

  • Plugin: inputs.exec, Collection 1

traceroute,target_loc=dc1,target_ip=dc1.xxx.local,hop_num=1,hop_host=10.25.0.1,host=netmon resp_time=6.876 1493144485000000000
traceroute,target_loc=dc1,target_ip=dc1.xxx.local,hop_num=2,hop_host=192.168.1.20,host=netmon resp_time=5.888 1493144485000000000
traceroute,host=netmon,target_loc=monitor,target_ip=monitor.xxx-net,hop_num=1,hop_host=10.25.0.1 resp_time=3.36 1493144485000000000
traceroute,target_loc=monitor,target_ip=monitor.xxx-net,hop_num=2,hop_host=192.168.3.25,host=netmon resp_time=63.584 1493144485000000000
traceroute,target_loc=monitor,target_ip=monitor.xxx-net,hop_num=3,hop_host=192.168.3.3,host=netmon resp_time=64.016 1493144485000000000
traceroute,target_loc=kvm-02-gw,target_ip=kvm-02-gw.xxx.net,hop_num=1,hop_host=10.25.166.191,host=netmon resp_time=3006.348 1493144485000000000
traceroute,hop_host=192.168.122.1,host=netmon,target_loc=youtrack,target_ip=youtrack.xxx.local,hop_num=1 resp_time=3006.358 1493144485000000000
traceroute,target_loc=www,target_ip=www.xxx.com,hop_num=1,hop_host=10.25.0.1,host=netmon resp_time=9.867 1493144485000000000
traceroute,host=netmon,target_loc=www,target_ip=www.xxx.com,hop_num=2,hop_host=207.107.114.1 resp_time=5.095 1493144485000000000
traceroute,target_loc=www,target_ip=www.xxx.com,hop_num=3,hop_host=24.156.146.241,host=netmon resp_time=9.814 1493144485000000000
traceroute,target_ip=www.xxx.com,hop_num=4,hop_host=24.156.144.50,host=netmon,target_loc=www resp_time=9.812 1493144485000000000
traceroute,target_loc=www,target_ip=www.xxx.com,hop_num=5,hop_host=209.148.229.225,host=netmon resp_time=17.054 1493144485000000000
traceroute,target_loc=www,target_ip=www.xxx.com,hop_num=6,hop_host=209.148.229.222,host=netmon resp_time=10.527 1493144485000000000
traceroute,target_loc=www,target_ip=www.xxx.com,hop_num=7,hop_host=128.241.3.189,host=netmon resp_time=20.447 1493144485000000000
traceroute,target_loc=www,target_ip=www.xxx.com,hop_num=8,hop_host=129.250.3.5,host=netmon resp_time=33.423 1493144485000000000
traceroute,hop_host=129.250.4.152,host=netmon,target_loc=www,target_ip=www.xxx.com,hop_num=9 resp_time=50.731 1493144485000000000
traceroute,host=netmon,target_loc=www,target_ip=www.xxx.com,hop_num=10,hop_host=129.250.5.24 resp_time=50.728 1493144485000000000
traceroute,host=netmon,target_loc=www,target_ip=www.xxx.com,hop_num=11,hop_host=129.250.2.164 resp_time=50.272 1493144485000000000
traceroute,target_loc=www,target_ip=www.xxx.com,hop_num=12,hop_host=129.250.207.118,host=netmon resp_time=50.872 1493144485000000000
traceroute,host=netmon,target_loc=www,target_ip=www.xxx.com,hop_num=14,hop_host=74.205.108.117 resp_time=61.503 1493144485000000000
traceroute,hop_host=74.205.108.51,host=netmon,target_loc=www,target_ip=www.xxx.com,hop_num=15 resp_time=63.233 1493144485000000000
traceroute,hop_host=98.129.84.207,host=netmon,target_loc=www,target_ip=www.xxx.com,hop_num=16 resp_time=49.43 1493144485000000000
traceroute,hop_host=23.253.92.79,host=netmon,target_loc=www,target_ip=www.xxx.com,hop_num=17 resp_time=65.824 1493144485000000000

Actual behavior:

2017-04-25T18:22:30Z E! ERROR in input [inputs.exec]: Errors encountered: [metric parsing error, reason: [buffer too short], buffer: [], index: [0]]

Additional info:

I can run the script independently and INSERT its output into InfluxDB without issue; the same goes for the output of telegraf -test -input-filter=exec -debug.
However, when the service is restarted via systemctl, the error shown under "Actual behavior" appears in the log files and no entries are written to InfluxDB.

@danielnelson
Contributor

I think this is a permission issue: when run as a service, Telegraf runs as the user telegraf. You might need to use sudo to run the script.

In the next version this warning no longer occurs, due to #2448. You might also want to make sure your script returns a non-zero exit status when it fails.

Let me know if the permission change solves it.
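
A minimal sketch of the non-zero exit suggestion, assuming the script keeps one background job per target as in the report. In the original script a bare `wait` returns 0 regardless of how the jobs ended, and the exit status of a `traceroute ... | awk ...` pipeline is awk's unless `pipefail` is set, so a failed traceroute goes unnoticed. `run_all` is a hypothetical helper name, not part of Telegraf.

```bash
#!/bin/bash
# Sketch only: run_all is a hypothetical helper, not part of Telegraf.
# Make a pipeline fail if any command in it fails (e.g. traceroute before awk).
set -o pipefail

# Usage: run_all <probe_command> <target>...
# Runs <probe_command> <target> in the background for every target and
# returns non-zero if any of those jobs failed.
run_all() {
    local probe=$1
    shift
    local pids=() status=0 target pid
    for target in "$@"; do
        "$probe" "$target" &    # one background job per target
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        # wait <pid> returns that job's exit status; a bare `wait` returns 0.
        wait "$pid" || status=1
    done
    return "$status"
}
```

With the loop body of the reported script wrapped in a function (called `probe` here, also a hypothetical name), the script could end with `run_all probe monitor:monitor.xxx-net www:www.xxx.com; exit $?`.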

Author

KarlWR commented Apr 27, 2017

@danielnelson good call.
The script itself is not the permission problem; the privileged call to traceroute inside it is. Some documentation or logging would help here.

I will add the non-zero return to make it more robust too.

I solved it by adding telegraf to sudoers. For future troubleshooting, test for permission issues with:
sudo -u telegraf <user_script>.sh
all is revealed
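
The check above can be wrapped in a small helper that also flags empty stdout, which is what the parser received in this report (`buffer: []`). This is only a sketch: `check_output` is a hypothetical name, and the path in the usage line is the one from the config in the report.

```bash
#!/bin/bash
# Sketch only: check_output is a hypothetical helper name.
# Usage: check_output <command> [args...]
# Runs the command, leaves stderr untouched so permission errors stay
# visible, and reports whether anything arrived on stdout.
check_output() {
    local out rc
    out=$("$@")
    rc=$?
    if [ -z "$out" ]; then
        # Nothing on stdout: the exec input would have nothing to parse.
        echo "EMPTY stdout (exit status $rc)"
        return 1
    fi
    echo "OK: stdout not empty (exit status $rc)"
}
```

Usage, assuming sudo is allowed to switch to the telegraf user: `check_output sudo -u telegraf /etc/telegraf/scripts/tr.sh`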

Thanks for help.
Cheers
