Improve file descriptor closing loops #14213
We can't do that until a new stable version is released.
That's correct; that's why I didn't include it in this PR, but added it as a TODO so that we don't forget about it.
Various operating systems have different methods for closing file descriptors in bulk.
So, I think the process should be like this:
To use By adding this line to
We are going to have in the generated config header:

```c
/* Define to 1 if you have the `closefrom' function. */
#define HAVE_CLOSEFROM 1

/* Define to 1 if you have the `close_range' function. */
#define HAVE_CLOSE_RANGE 1
```

After this, we can do:

```c
#if defined(HAVE_CLOSE_RANGE)
    // use close_range() to close the open file descriptors
#elif defined(HAVE_CLOSEFROM)
    // use closefrom() to close the open file descriptors
#else
    // try to use /proc/PID/fd to close the open file descriptors
    // if the above fails, try to use getrlimit()
    // if the above fails, fall back to sysconf(_SC_OPEN_MAX)
#endif
```
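A minimal, runnable sketch of this fallback chain follows. It assumes the `HAVE_CLOSE_RANGE` / `HAVE_CLOSEFROM` macros come from the build system as described above; `close_fds_from()`, `fd_loop_bound()` and `FALLBACK_MAX_FD` are hypothetical names, and the `/proc/PID/fd` step is left out to keep it short.

```c
#define _GNU_SOURCE
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical last-resort bound, used only when neither getrlimit()
 * nor sysconf() gives a usable value. */
#define FALLBACK_MAX_FD 1024

/* Exclusive upper bound for the brute-force loop. */
long fd_loop_bound(void)
{
    struct rlimit rl;

    /* Try getrlimit(): the soft RLIMIT_NOFILE caps new FD numbers. */
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY)
        return (long)rl.rlim_cur;

    /* Then sysconf(_SC_OPEN_MAX), which returns -1 if indeterminate. */
    long max = sysconf(_SC_OPEN_MAX);
    if (max > 0)
        return max;

    return FALLBACK_MAX_FD;
}

/* Close every file descriptor >= first (hypothetical helper). */
void close_fds_from(int first)
{
#if defined(HAVE_CLOSE_RANGE)
    /* Closes [first, UINT_MAX]; on failure (e.g. ENOSYS on old kernels)
     * fall through to the loop below. */
    if (close_range((unsigned int)first, ~0U, 0) == 0)
        return;
#elif defined(HAVE_CLOSEFROM)
    closefrom(first);
    return;
#endif
    /* The /proc/PID/fd step is omitted in this sketch. */
    long bound = fd_loop_bound();
    for (long fd = first; fd < bound; fd++)
        (void)close((int)fd); /* EBADF for unused slots is expected */
}
```

Because the loop bound comes from the soft `RLIMIT_NOFILE` limit, a system where that limit is huge still ends up on the slow brute-force path that this PR is trying to avoid.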
I see in the code that there are 2 possible actions: close, and mark with … I think the latter in … Generally speaking, doing this loop at … If we can't mark them like that when they are opened, I suggest dropping the marking entirely and once the fork is done by … This means that we are going to need only one action for …
@Dim-P please also use …
Thank you for the comments, @ktsaou.
Tested on …
FWIW, I think we are overthinking this. The original reason to close open FDs was lxc-attach, see #1775.
Skimming through the code, this happens (or should happen) in just 2 places:
We should coalesce 1 and 2. That is, we should spawn the spawn server only after closing all open FDs.
The original issue was lxc-attach leaving a pseudoterminal device open. I'd bet that closing just a small, fixed number of open FDs (let's say 64-128) would be more than safe. We can then handle the opening of our own FDs with the proper flag to close on exec.
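Both halves of this suggestion are small. The sketch below assumes a fixed cap of 128 (the upper end of the suggested range) and uses hypothetical names (`close_inherited_fds()`, `open_cloexec()`, `STARTUP_FD_CAP`); `O_CLOEXEC` is the standard POSIX.1-2008 `open()` flag that makes `exec()` close the descriptor automatically.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical cap, taken from the 64-128 range suggested above. */
#define STARTUP_FD_CAP 128

/* Close inherited descriptors 3..STARTUP_FD_CAP-1 once at startup,
 * leaving STDIN, STDOUT and STDERR alone. */
void close_inherited_fds(void)
{
    for (int fd = STDERR_FILENO + 1; fd < STARTUP_FD_CAP; fd++)
        (void)close(fd); /* EBADF on unused slots is harmless */
}

/* Open one of our own files with close-on-exec set atomically,
 * so no later marking loop is needed for it. */
int open_cloexec(const char *path, int flags)
{
    return open(path, flags | O_CLOEXEC);
}
```

Setting the flag at `open()` time avoids the race where another thread forks and execs between `open()` and a later `fcntl(F_SETFD)`.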
As we discussed internally, I am approving this PR expecting that the suggestions from @vkalintiris will be addressed in another PR. LGTM!
Thanks guys. @vkalintiris, I will merge this now to fix the issue before the upcoming stable release, and we can open a new PR to improve the code using your suggestions.
Fixes the issue introduced as a result of #14213, where the agent fails to build on FreeBSD < 13.1 and in environments with Linux kernel version < 5.11, due to the missing 'CLOSE_RANGE_CLOEXEC'.
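One way to guard against this kind of build failure, sketched under assumptions: compile the `close_range()` call only when `CLOSE_RANGE_CLOEXEC` is defined, and fall back at run time when the kernel rejects it (Linux returns `EINVAL` for this flag before 5.11, and `ENOSYS` for the whole syscall before 5.9). The sketch assumes that wherever `<unistd.h>` defines the flag it also declares `close_range()`; `mark_cloexec_from()` and the 1024 fallback bound are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical upper bound for the fallback loop. */
#define FALLBACK_FD_LIMIT 1024

/* Mark every descriptor >= first as close-on-exec (hypothetical helper). */
void mark_cloexec_from(int first)
{
#if defined(CLOSE_RANGE_CLOEXEC)
    /* Assumption: a libc that defines the flag also declares close_range().
     * If the running kernel rejects the flag or lacks the syscall,
     * fall through to the portable loop below. */
    if (close_range((unsigned int)first, ~0U, CLOSE_RANGE_CLOEXEC) == 0)
        return;
#endif
    for (int fd = first; fd < FALLBACK_FD_LIMIT; fd++) {
        int flags = fcntl(fd, F_GETFD);
        if (flags != -1) /* skip descriptors that are not open */
            (void)fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
    }
}
```

In a real build this check could be combined with the configure-time `HAVE_CLOSE_RANGE` test; the compile-time guard fixes the build, while the run-time fallback covers binaries built on a newer system and run on an older kernel.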
Summary
Fixes #14177, fixes #14062.
There are 3 places in the agent where we either try to close all open file descriptors or mark them to be closed by the `exec()` stage of `posix_spawn()`. This is done in a brute-force way, by looping through all FD IDs up to `_SC_OPEN_MAX`. While this is a valid approach, it creates issues on platforms that (incorrectly) set the maximum FD limit to infinity (in practice, it can be as high as 1073741816).

Instead of brute-forcing its way through all possible FD IDs, this PR reads `/proc/self/fd` (if available) to discover which FDs are actually open, and closes (or marks to be closed) only those, skipping `STDIN`, `STDOUT` and/or `STDERR` when they are to be excluded. This should also slightly speed up agent initialisation (by how much depends on the `_SC_OPEN_MAX` value).

Test Plan
Test on a system with a very high `ulimit -n`, such as 1073741816. The agent should start up in a reasonable time instead of taking more than 10 minutes and consuming 100% CPU in the meantime.

TODO: Revert #14178
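The `/proc/self/fd` approach described in the summary can be sketched roughly as follows. This is not the code merged in the PR: the `fd_action` enum, the `EXCLUDE_*` bitmask and `act_on_open_fds()` are hypothetical, and the fallback to a `sysconf(_SC_OPEN_MAX)` loop when `/proc/self/fd` cannot be opened is an assumption based on the "(if available)" wording.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

enum fd_action { FD_ACTION_CLOSE, FD_ACTION_MARK_CLOEXEC };

/* Hypothetical bitmask: bit N set means "leave descriptor N (0-2) alone". */
#define EXCLUDE_STDIN  (1u << 0)
#define EXCLUDE_STDOUT (1u << 1)
#define EXCLUDE_STDERR (1u << 2)

static void apply_action(int fd, enum fd_action action, unsigned exclude)
{
    if (fd <= STDERR_FILENO && (exclude & (1u << fd)))
        return; /* excluded standard stream */
    if (action == FD_ACTION_CLOSE) {
        (void)close(fd);
    } else {
        int flags = fcntl(fd, F_GETFD);
        if (flags != -1)
            (void)fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
    }
}

/* Close or mark only the descriptors that are actually open. */
void act_on_open_fds(enum fd_action action, unsigned exclude)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir) {
        int self = dirfd(dir); /* the FD used to read the directory */
        struct dirent *de;
        while ((de = readdir(dir)) != NULL) {
            char *end;
            long fd = strtol(de->d_name, &end, 10);
            /* skip ".", "..", anything non-numeric, and our own dir FD */
            if (end == de->d_name || *end != '\0' || fd == self)
                continue;
            apply_action((int)fd, action, exclude);
        }
        closedir(dir);
        return;
    }

    /* No /proc: fall back to the brute-force loop. */
    long max = sysconf(_SC_OPEN_MAX);
    for (long fd = 0; fd < max; fd++)
        apply_action((int)fd, action, exclude);
}
```

The cost of the `/proc` path scales with the number of descriptors that are actually open rather than with `_SC_OPEN_MAX`, which is why it avoids the multi-minute startup described in the test plan; the fallback loop keeps the old cost.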