Writing data to virtio-serial with the outputs.file plugin fails #4223

Closed
pytimer opened this issue Jun 1, 2018 · 3 comments

pytimer commented Jun 1, 2018

Relevant telegraf.conf:

Linux:

[[outputs.file]]
  files = ["/dev/virtio-ports/org.qemu.guest_agent.0"]

Windows:

[[outputs.file]]
  files = ["\\\\.\\Global\\org.qemu.guest_agent.0"]

System info:

Telegraf v1.6.2 (git: release-1.6 1fb4283)

Virtual machine OS:

  • Windows: Windows Server 2008 R2, Windows Server 2012 R2
  • Linux: CentOS 7.2

Virtual machine host: CentOS Linux release 7.2.1511 (Core)

Steps to reproduce:

  1. Use libvirt to create a VM and enable the QEMU guest agent virtio-serial port.
  2. Install Telegraf in the VM.
  3. Run socat - UNIX-CONNECT:/var/lib/libvirt/qemu/org.qemu.guest_agent.0.<uuid>.sock to connect to the socket and read from it.
  4. Start Telegraf with the config above; socat receives metrics from the socket.
  5. Stop socat with Ctrl+C; Telegraf keeps running.
  6. Run the socat command again; it no longer receives any metrics.

Expected behavior:

When the socat command is run again, it should receive metrics from the virtio-serial socket.

Actual behavior:

socat does not receive any metrics.

Additional info:

Telegraf Linux logs:

2018-06-01T10:33:45Z I! Tags enabled: host=localhost
2018-06-01T10:33:45Z I! Agent Config: Interval:5s, Quiet:false, Hostname:"localhost", Flush Interval:5s 
2018-06-01T10:33:55Z D! Output [file] buffer fullness: 200 / 1000 metrics. 
2018-06-01T10:33:55Z D! Output [file] wrote batch of 232 metrics in 23.4375ms
2018-06-01T10:34:00Z D! Output [file] buffer fullness: 130 / 1000 metrics. 
2018-06-01T10:34:05Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-01T10:34:10Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-01T10:34:15Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-01T10:34:20Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-01T10:34:25Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-01T10:34:30Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-01T10:34:35Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-01T10:34:40Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-01T10:34:45Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-01T10:35:00Z E! Error in plugin [inputs.procstat]: took longer to collect than collection interval (5s)
2018-06-01T10:35:05Z E! Error in plugin [inputs.procstat]: took longer to collect than collection interval (5s)
2018-06-01T10:35:10Z E! Error in plugin [inputs.procstat]: took longer to collect than collection interval (5s)

Telegraf Windows logs:

2018-06-01T10:33:45Z I! Tags enabled: host=WIN-TEST
2018-06-01T10:33:45Z I! Agent Config: Interval:5s, Quiet:false, Hostname:"WIN-TEST", Flush Interval:5s 
2018-06-01T10:33:55Z D! Output [file] buffer fullness: 232 / 1000 metrics. 
2018-06-01T10:33:55Z D! Output [file] wrote batch of 232 metrics in 23.4375ms
2018-06-01T10:34:00Z D! Output [file] buffer fullness: 116 / 1000 metrics. 
2018-06-01T10:34:05Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-01T10:34:10Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-01T10:34:15Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-01T10:34:20Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-01T10:34:25Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-01T10:34:30Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-01T10:34:35Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-01T10:34:40Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-01T10:34:45Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-01T10:35:00Z E! Error in plugin [inputs.procstat]: took longer to collect than collection interval (5s)
2018-06-01T10:35:05Z E! Error in plugin [inputs.procstat]: took longer to collect than collection interval (5s)
2018-06-01T10:35:10Z E! Error in plugin [inputs.procstat]: took longer to collect than collection interval (5s)

If the socat command is already running before Telegraf starts, everything works. If it is not, the problem occurs.

I don't know whether using the Telegraf outputs.file plugin to write data to virtio-serial is correct. If it is not, which output plugin can I use in my case? Can someone help me? Thanks.

@danielnelson

I've never used virtio-serial, but it looks like it might be challenging to get working reliably. It appears that writes block if no one is listening, and socat UNIX-CONNECT says:

if <filename> is a UNIX domain socket, but no process is listening, this is an error.

So it seems like you would need to run socat in a loop on the host, and if it went down, Telegraf's flush would block, which will cause a lot of errors due to #2919.

For reconnections, I think you might need to make sure socat is not closing the socket on shutdown. Try this (just based on reading the manpage):

socat - UNIX-CONNECT:/var/lib/libvirt/qemu/org.qemu.guest_agent.0.<uuid>.sock,shut-none
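
The "socat on a loop" idea above can be sketched as a small host-side reader that reconnects whenever the connection drops. This is a minimal sketch and not something from the thread: it is written in Python, the function name read_with_reconnect and its parameters are hypothetical, and it assumes the host-side channel is a stream-type UNIX domain socket that accepts client connections, as socat UNIX-CONNECT also assumes.

```python
import socket
import time
from typing import Callable, Optional


def read_with_reconnect(
    path: str,
    handle_chunk: Callable[[bytes], None],
    retry_delay: float = 1.0,
    max_attempts: Optional[int] = None,
) -> int:
    """Read from the UNIX socket at `path`, reconnecting after every failure.

    Received bytes are passed to `handle_chunk`. Returns the number of
    connection attempts made. max_attempts=None (hypothetical default)
    means retry forever.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
                sock.connect(path)
                while True:
                    chunk = sock.recv(65536)
                    if not chunk:  # the other side closed the connection
                        break
                    handle_chunk(chunk)
        except OSError:
            # Covers a missing socket file and "connection refused"
            # (the socket exists but no process is listening).
            pass
        if max_attempts is None or attempts < max_attempts:
            time.sleep(retry_delay)
    return attempts
```

Called with the socket path from the reproduction steps and a callback that writes the bytes to stdout, this would play the role of socat on the host. Whether a reconnecting reader is enough to keep the Telegraf flush inside the guest from blocking is not established in this thread.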

pytimer commented Jun 4, 2018

Sorry for the late reply.

I tested with the command socat - UNIX-CONNECT:/var/lib/libvirt/qemu/org.qemu.guest_agent.0.<uuid>.sock,shut-none, but it has the same problem.

I am only using socat for testing. In the real case, a program will replace socat. But if that program closes, Telegraf produces the errors above.

Also, if the program has been stopped for a long time and I later start it and restart Telegraf on Windows, the logs are:

2018-06-04T01:41:15Z E! Error writing to output [file]: failed to write message: <metrics>, write \\.\Global\org.qemu.guest_agent.0: Used to indicate that an operation cannot continue without blocking for I/O.
2018-06-04T01:41:20Z D! Output [file] buffer fullness: 1116 / 1000 metrics. 
2018-06-04T01:41:25Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-04T01:41:30Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-04T01:41:35Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-04T01:41:40Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-04T01:41:45Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-04T01:41:50Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-04T01:41:55Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-04T01:42:00Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-04T01:42:15Z E! Error in plugin [inputs.svcstat]: took longer to collect than collection interval (5s)
2018-06-04T01:42:20Z E! Error in plugin [inputs.svcstat]: took longer to collect than collection interval (5s)

I don't have any more information about virtio-serial.
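
The write error in the log above says the operation cannot continue without blocking, and the earlier comment notes that writes appear to block when no one is listening. As a rough analogy only, the Python sketch below uses an ordinary OS pipe (not virtio-serial, and not Telegraf's Go code) to show what happens when nothing reads the other end: the buffer fills, after which a non-blocking write fails immediately, while a blocking write would simply hang. The function name bytes_accepted_before_full is hypothetical, and the sketch assumes a POSIX system.

```python
import os


def bytes_accepted_before_full(chunk_size: int = 4096) -> int:
    """Write to a pipe that nobody reads until the kernel refuses more data.

    Returns how many bytes were accepted before the write would have blocked.
    """
    read_fd, write_fd = os.pipe()
    # Non-blocking mode: a full buffer raises BlockingIOError instead of hanging.
    os.set_blocking(write_fd, False)
    written = 0
    try:
        while True:
            written += os.write(write_fd, b"x" * chunk_size)
    except BlockingIOError:
        # The buffer is full and no reader is draining it.
        pass
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written
```

With the descriptor left in blocking mode, the same loop would stall on the write that does not fit, which is consistent with the repeated "flush ongoing" warnings, although the thread does not confirm that this is what happened inside Telegraf.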

pytimer commented Jun 4, 2018

Hi @danielnelson,

I found that the error is not caused by Telegraf.

On my host there are two VMs, and the libvirt virtio-serial name in both was org.qemu.guest_agent.0. I changed the virtio-serial name of one VM to org.qemu.guest_agent.1 so that it differs from the other. When I reproduce the steps now, there are no errors.

Thanks for your help. This issue can be closed.
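
The resolution above, two VMs using the same virtio-serial name, could be checked for with a short script. The following is a minimal sketch under stated assumptions: it is written in Python, it assumes the domain XML uses libvirt's channel element with a virtio target, and the function name shared_channel_names, the template EXAMPLE_DOMAIN, and the paths inside it are hypothetical. The thread does not explain why the shared name caused the problem, so the result is only a list of candidates to inspect.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

# Hypothetical, trimmed-down domain XML used only for illustration.
EXAMPLE_DOMAIN = """<domain type='kvm'>
  <name>{name}</name>
  <devices>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/{name}.agent'/>
      <target type='virtio' name='{channel}'/>
    </channel>
  </devices>
</domain>"""


def shared_channel_names(domain_xmls: Dict[str, str]) -> Dict[str, List[str]]:
    """Return virtio channel names that appear in more than one domain.

    `domain_xmls` maps a domain name to its XML text. The result maps each
    shared channel name to the sorted list of domains that use it.
    """
    users = defaultdict(list)
    for domain, xml_text in domain_xmls.items():
        root = ET.fromstring(xml_text)
        for target in root.findall("./devices/channel/target[@type='virtio']"):
            name = target.get("name")
            if name:
                users[name].append(domain)
    return {name: sorted(doms) for name, doms in users.items() if len(doms) > 1}
```

The XML for each domain can be obtained with virsh dumpxml and passed in as text. Renaming one channel, as described above, makes the corresponding entry disappear from the result.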

pytimer closed this as completed Jun 4, 2018