Http_listener input can leak sockets #2923

Closed
rconn01 opened this issue Jun 14, 2017 · 5 comments
Labels
bug unexpected problem or unintended behavior
@rconn01

rconn01 commented Jun 14, 2017

Bug report

This is somewhat related to #2919. When the http_listener input is blocked waiting on the output channels, it appears to leak sockets.

Ultimately we are left with this in the logs:

I! http: Accept error: accept tcp [::]:8186: accept: too many open files; retrying in 1s

This may share a root cause with the issue above, but I am reporting it separately because the http listener should do a better job of closing its sockets, regardless of the blocking behavior documented in the other issue.

Relevant telegraf.conf:

Any config that enables the http_listener. See #2919 for an example.

System info:

telegraf version 1.3.1

Steps to reproduce:

  1. Cause the telegraf inputs to be blocked as described in Outputs block inputs when batch size is reached #2919
  2. Post repeatedly with a client to the write endpoint, and time out on the client side before you get a response
  3. After some time, run lsof and you should see many sockets in the CLOSE_WAIT state

Expected behavior:

The listener should do a better job of closing sockets that the client has walked away from, so that it doesn't end up running out of file handles.

Actual behavior:

Additional info:

Here is the output from lsof with an http_listener running on port 8186:

lsof -iTCP:8186
COMMAND    PID  USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
telegraf 50277 connr    6u  IPv6 0x53cd5c33e06d8623      0t0  TCP *:8186 (LISTEN)
telegraf 50277 connr    8u  IPv6 0x53cd5c33e0bfa523      0t0  TCP localhost:8186->localhost:50437 (CLOSE_WAIT)
telegraf 50277 connr   10u  IPv6 0x53cd5c33bd92ee23      0t0  TCP localhost:8186->localhost:50662 (CLOSE_WAIT)
telegraf 50277 connr   11u  IPv6 0x53cd5c33e005fa23      0t0  TCP localhost:8186->localhost:50413 (CLOSE_WAIT)
telegraf 50277 connr   12u  IPv6 0x53cd5c33e066d123      0t0  TCP localhost:8186->localhost:50031 (CLOSE_WAIT)
telegraf 50277 connr   13u  IPv6 0x53cd5c33e005d723      0t0  TCP localhost:8186->localhost:50490 (CLOSE_WAIT)
telegraf 50277 connr   14u  IPv6 0x53cd5c33e06d5923      0t0  TCP localhost:8186->localhost:50037 (CLOSE_WAIT)
telegraf 50277 connr   15u  IPv6 0x53cd5c33e0a6f923      0t0  TCP localhost:8186->localhost:50072 (CLOSE_WAIT)
telegraf 50277 connr   16u  IPv6 0x53cd5c33e0e7bc23      0t0  TCP localhost:8186->localhost:50154 (CLOSE_WAIT)
telegraf 50277 connr   17u  IPv6 0x53cd5c33e005eb23      0t0  TCP localhost:8186->localhost:50348 (CLOSE_WAIT)
telegraf 50277 connr   18u  IPv6 0x53cd5c33dc533923      0t0  TCP localhost:8186->localhost:50615 (CLOSE_WAIT)
telegraf 50277 connr   19u  IPv6 0x53cd5c33dade8423      0t0  TCP localhost:8186->localhost:50643 (CLOSE_WAIT)
telegraf 50277 connr   21u  IPv6 0x53cd5c33e0bc2123      0t0  TCP localhost:8186->localhost:50647 (CLOSE_WAIT)
telegraf 50277 connr   22u  IPv6 0x53cd5c33e0b04323      0t0  TCP localhost:8186->localhost:50821 (CLOSE_WAIT)
telegraf 50277 connr   23u  IPv6 0x53cd5c33dc621423      0t0  TCP localhost:8186->localhost:50917 (CLOSE_WAIT)
telegraf 50277 connr   24u  IPv6 0x53cd5c33e066b323      0t0  TCP localhost:8186->localhost:49986 (CLOSE_WAIT)
telegraf 50277 connr   25u  IPv6 0x53cd5c33e0a73523      0t0  TCP localhost:8186->localhost:49994 (CLOSE_WAIT)
telegraf 50277 connr   26u  IPv6 0x53cd5c33e0a72123      0t0  TCP localhost:8186->localhost:50055 (CLOSE_WAIT)
telegraf 50277 connr   27u  IPv6 0x53cd5c33e0b9ff23      0t0  TCP localhost:8186->localhost:50102 (CLOSE_WAIT)
telegraf 50277 connr   28u  IPv6 0x53cd5c33e066ea23      0t0  TCP localhost:8186->localhost:50087 (CLOSE_WAIT)
telegraf 50277 connr   29u  IPv6 0x53cd5c33e0b9e623      0t0  TCP localhost:8186->localhost:50354 (CLOSE_WAIT)
telegraf 50277 connr   30u  IPv6 0x53cd5c33e0bf8723      0t0  TCP localhost:8186->localhost:50130 (CLOSE_WAIT)
telegraf 50277 connr   31u  IPv6 0x53cd5c33e0b9f023      0t0  TCP localhost:8186->localhost:50181 (CLOSE_WAIT)
telegraf 50277 connr   32u  IPv6 0x53cd5c33e06d9023      0t0  TCP localhost:8186->localhost:50205 (CLOSE_WAIT)
telegraf 50277 connr   33u  IPv6 0x53cd5c33e0931723      0t0  TCP localhost:8186->localhost:50234 (CLOSE_WAIT)
telegraf 50277 connr   34u  IPv6 0x53cd5c33e0bc1723      0t0  TCP localhost:8186->localhost:50250 (CLOSE_WAIT)
telegraf 50277 connr   35u  IPv6 0x53cd5c33e0932123      0t0  TCP localhost:8186->localhost:50266 (CLOSE_WAIT)
telegraf 50277 connr   36u  IPv6 0x53cd5c33e0bf7323      0t0  TCP localhost:8186->localhost:50302 (CLOSE_WAIT)
telegraf 50277 connr   37u  IPv6 0x53cd5c33e0c45f23      0t0  TCP localhost:8186->localhost:50450 (CLOSE_WAIT)
telegraf 50277 connr   38u  IPv6 0x53cd5c33e005c323      0t0  TCP localhost:8186->localhost:50370 (CLOSE_WAIT)
telegraf 50277 connr   39u  IPv6 0x53cd5c33e0933f23      0t0  TCP localhost:8186->localhost:50388 (CLOSE_WAIT)
telegraf 50277 connr   40u  IPv6 0x53cd5c33e005b923      0t0  TCP localhost:8186->localhost:50405 (CLOSE_WAIT)
telegraf 50277 connr   41u  IPv6 0x53cd5c33dbcada23      0t0  TCP localhost:8186->localhost:50544 (CLOSE_WAIT)
telegraf 50277 connr   42u  IPv6 0x53cd5c33e092f923      0t0  TCP localhost:8186->localhost:50466 (CLOSE_WAIT)
telegraf 50277 connr   43u  IPv6 0x53cd5c33e005cd23      0t0  TCP localhost:8186->localhost:50514 (CLOSE_WAIT)
telegraf 50277 connr   44u  IPv6 0x53cd5c33e0933523      0t0  TCP localhost:8186->localhost:50526 (CLOSE_WAIT)
telegraf 50277 connr   45u  IPv6 0x53cd5c33dbca9e23      0t0  TCP localhost:8186->localhost:50557 (CLOSE_WAIT)
telegraf 50277 connr   46u  IPv6 0x53cd5c33e0bc2b23      0t0  TCP localhost:8186->localhost:50619 (CLOSE_WAIT)
telegraf 50277 connr   47u  IPv6 0x53cd5c33e06d7223      0t0  TCP localhost:8186->localhost:50686 (CLOSE_WAIT)
telegraf 50277 connr   48u  IPv6 0x53cd5c33dbdd8023      0t0  TCP localhost:8186->localhost:50700 (CLOSE_WAIT)
telegraf 50277 connr   49u  IPv6 0x53cd5c33dbf27e23      0t0  TCP localhost:8186->localhost:50715 (CLOSE_WAIT)
telegraf 50277 connr   50u  IPv6 0x53cd5c33d6bd8a23      0t0  TCP localhost:8186->localhost:50757 (CLOSE_WAIT)
@danielnelson
Contributor

Also see the stack trace on #2914.

@danielnelson danielnelson added the bug unexpected problem or unintended behavior label Aug 19, 2017
@rconn01
Author

rconn01 commented Mar 8, 2018

Any thoughts on how this might be fixed? The open sockets lead to the service running out of file handles, which requires a restart of telegraf.

@danielnelson
Contributor

A comprehensive fix will be time-consuming for the reasons mentioned in #2914.

Perhaps we could add methods to the Accumulator, such as AddFieldsContext, that time out the channel add and return an error if the timeout expires.

@jayshah11

Have there been any updates on this issue? I am experiencing the same problem.

@danielnelson
Contributor

This should actually have been fixed in 1.9.0. If you are running that version of Telegraf or later, can you double-check whether the reproduction steps above still reproduce the error?

@danielnelson danielnelson added this to the 1.9.0 milestone Jun 3, 2019