-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(http): Stop plugins from leaking file descriptors on telegraf reload #15213
Conversation
71b4c9e
to
e3fa4a9
Compare
Honestly, I don't like this PR very much, but it does solve the fd leaks. I feel like the common HTTP client could be reworked so the teardown step is umissable - a few ways to do that. It would also be better if I didn't need all the dummy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your fix @nick-kentik! Some comments in the code... I generally like the changes in the code (even though you seem to dislike them ;-)). We could think of adding a idle-timeout setting to the common HTTP client and use it (or disable it) where applicable...
e3fa4a9
to
fbe380f
Compare
Thanks @srebhan , suggestions applied. I don't particularly like this solution, but I like the file descriptor leaks much less 😅. Plenty of time to improve things once the bugs are squashed. |
Download PR build artifacts for linux_amd64.tar.gz, darwin_arm64.tar.gz, and windows_amd64.zip. 📦 Click here to get additional PR build artifactsArtifact URLs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @nick-kentik! Feel free to submit with a follow up PR to add a config option with a reasonable (long) default for the idle-timeout!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. We had internally talked about this yesterday and agreed that this was the way forward as well. So the timing of your PR was perfect.
Summary
On reload, telegraf rebuilds its plugins, which means creating a new
*http.Client
(and*http.Transport
) in each of the files touched in this PR. In general, transports should be long-lived; they carry state and possibly idle HTTP connections.Without explicitly calling
CloseIdleConnections()
, in cases where noidle_conn_timeout
is set, we leave it up to the go runtime to decide when to drop the connections - and it can easily decide "never", leading to open connections accumulating over time, especially with repeated reloads of the telegraf config in a single process, and eventually telegraf failing withEMFILE
problems.Fix this by telling the Go runtime (by calling
CloseIdleConnections()
) when we're done with a given transport.Checklist
Related issues
resolves #15177