Use vfork() improve performance when starting processes. #33289

joshudson · 2018-11-07T02:33:51Z

Use vfork() to start child processes where this yields a performance improvement due to getting rid of page faults.

The larger the host process, the bigger the improvement. For a one gigabyte process, vfork() is literally 150 times faster than fork(); however most of the performance penalty is incorrectly being charged to the garbage collector.

joshudson · 2018-11-07T02:36:04Z

Benchmark code written specifically to demonstrate vfork()'s performance superiority:

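/* FORK is not defined in this file; presumably it is supplied at build time,
 * e.g. -DFORK=fork or -DFORK=vfork. */
#include <stdio.h>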
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <sys/wait.h>

int main(void)
{
        char *args[] = { "/bin/false", NULL };
        volatile char *buffer = malloc(1024 * 1024 * 1024);
        for (int i = 0; i < 1024 * 1024 * 1024; i += 4096)
                buffer[i] = 1;
        time_t start = time(NULL);
        pid_t pid;
        int status;
        for (int j = 0; j < 1000; j++) {
#ifndef DUMMY
                if ((pid = FORK()) == 0) {
                        execve(args[0], args, NULL);
                        write(2, "Oops\n", 5);
                        _exit(3);
                }
                if (pid < 0)
                        _exit(3);
                waitpid(pid, &status, 0);
                if ((status & 0xFF) > 1) _exit(1);
#endif
#ifndef NOPAGEFAULT
                for (int i = 0; i < 1024 * 1024 * 1024; i += 4096)
                        buffer[i] = 2;
#endif
        }
        printf("%d\n", (int)(time(NULL) - start));
        exit(0);
}

joshudson · 2018-11-07T03:45:41Z

Gaaa; 3 builds failed because they're cross compiles which is literally documented as not supported.

tmds · 2018-11-07T09:04:22Z

src/Native/Unix/System.Native/pal_process.c

+    // ptrace() is used on the child, thus making setuid() safe to use after vfork(). The fabled vfork()
+    // security hole is the other way around; if a multithreaded host were to execute setuid()
+    // on another thread while a vfork() child is still pending, bad things are possible; however we
+    // do not do that.


Is this important to mention? Can you phrase it as something we mustn't do?

tmds · 2018-11-07T09:12:03Z

I did a review and my comments were mostly about the comments.

For a one gigabyte process, vfork() is literally 150 times faster than fork();

I think it would be interesting to see a .NET Code benchmark with these changes, using Process.Start API.

Does OSX HAVE_VFORK_SHM?

stephentoub · 2018-11-07T12:21:03Z

I'm concerned by some of the man page comments.

The requirements put on vfork() by the standards are weaker than those put on fork(2), so an implementation where the two are synonymous is compliant. In particular, the programmer cannot rely on the parent remaining blocked until the child either terminates or calls execve(2), and cannot rely on any specific behavior with respect to shared memory.

and

When vfork() is called in a multithreaded process, only the calling thread is suspended until the child terminates or executes a new program. This means that the child is sharing an address space with other running code. This can be dangerous if another thread in the parent process changes credentials (using setuid(2) or similar), since there are now two processes with different privilege levels running in the same address space. As an example of the dangers, suppose that a multithreaded program running as root creates a child using vfork(). After the vfork(), a thread in the parent process drops the process to an unprivileged user in order to run some untrusted code (e.g., perhaps via plug-in opened with dlopen(3)). In this case, attacks are possible where the parent process uses mmap(2) to map in code that will be executed by the privileged child process.

Seems dangerous to rely on this purely by checking whether the function exists.

tmds · 2018-11-07T12:52:56Z

I'm concerned by some of the man page comments.

+1 vfork semantics are not portable. I think it will work fine on Linux and BSDs when implemented 'correctly'.
It may be hard to figure out what is correct. For example, the go implementation makes the parent function return immediately after the vfork call: https://github.com/golang/go/blob/50bd1c4d4eb4fac8ddeb5f063c099daccfb71b26/src/syscall/exec_linux.go#L162-L167. We're not doing that, maybe we should?

The C example showing the 150x improvement is a worst-case scenario where the parent starts writing to 1GB of data (at 1 byte per page) just after the fork.
In the .NET implementation the parent thread is waiting for the child to exec (so it can't write). While other threads may be writing, probably they are doing it sequentially, limiting the number of pages to be copied.
So the gain in .NET will be much lower, and maybe not worth the complexity and non-portability issues.

bartonjs · 2018-11-07T14:23:01Z

The behavior of calling vfork is not defined if setuid is used, which we use if the process is being launched with explicit credentials. It's probably also not defined if change working directory is set.

Given that vfork self-describes these cautions, I'd want to see it not used during the cred-set or CWD-set paths.

My gut feel is that it's just inherently more risk than reward.

joshudson · 2018-11-07T14:29:32Z

chdir is defined for the same reason dup2 is.

I deliberately did not check for the existence of vfork() but put a cmake check for the behavior itself. I am now debating ripping out the specific dependency because cross compile.

setuid() is safe because ptrace() makes a security demand against the parent process.

joshudson · 2018-11-07T14:31:00Z

The equivalent worst case is guaranteed to happen in .NET sooner or later: pack the heap right after process start.

stephentoub · 2018-11-07T14:31:38Z

My gut feel is that it's just inherently more risk than reward.

+1

@joshudson, what impact does this have on a "normal" .NET process using Process.Start? For example, does this make a measurable improvement to the dotnet command when it spawns processes?

joshudson · 2018-11-07T15:17:01Z

@bartonjs: Any platform in which calling setuid in the vfork child is a security vulnerability is already unsafe because calling setuid in the fork child is an information disclosure vulnerability.

I already called out the vfork+setuid issue referenced in the man page: never call setuid in the parent while a child is in vfork, which we don't do.

stephentoub · 2018-12-13T02:01:34Z

@jkotas, @janvorli, do you have an opinion on this one?

jkotas · 2018-12-13T04:58:46Z

There are many articles that warn about vfork security and portability issues. From https://wiki.sei.cmu.edu/confluence/pages/viewpage.action?pageId=87152373:

Do not use vfork()

I agree that checking for vfork presence is not enough given this. I think vfork would be ok to use only on platforms or situations where vfork does not suffer from these ugly issues. I do not know whether such cases exist.

joshudson · 2018-12-13T14:23:11Z

That article's just wrong. It lists the original call chain vfork was added for as undefined.

Incidentally I found out why Go arranged to have the vfork parent and child in different functions. Go has a M:N threading model and implicit async.

stephentoub · 2018-12-13T14:31:11Z

@joshudson, can you point to any official documentation / man pages / etc. for key platforms that specifically states vfork as being safe / recommended? At the moment, the potential risk does not seem to be worth the potential benefit, which I expect to be limited in the common case, as @tmds outlined.

joshudson · 2018-12-13T15:23:12Z

https://gist.github.com/nicowilliams/a8a07b0fc75df05f684c23c18d7db234

Quite a surprising document really. He starts talking about adding another system call about half way through but he starts off with the address space problem.

https://sourceware.org/bugzilla/show_bug.cgi?id=10354

Glibc switching to vfork.

https://gitlab.gnome.org/GNOME/glib/merge_requests/95

Gnome switching to vfork via posix_spawn

http://nommu.org

I don't know if you will ever support such processors but sometimes you don't get fork at all. It's vfork or don't compile.

janvorli · 2018-12-13T16:51:51Z

@joshudson thank you for the links to the articles. I have no prior knowledge of vfork, so I have to catch up and also read the articles. But one thing that seems to be needed if we wanted to add the vfork is to make sure no signal handlers can be called while running in the child before the exec, as described by Rich Felker in https://sourceware.org/bugzilla/show_bug.cgi?id=14750.

@stephentoub have we ever considered using posix_spawn instead of fork / exec? It seems that if that were possible, we would get the same benefit as with using vfork, taking advantage of the fact that GLIBC / musl developers have already done the necessary steps to ensure it is safe.
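To make the posix_spawn suggestion concrete, here is a minimal sketch, assuming a POSIX system with `<spawn.h>`. The helper name `spawn_and_wait` is hypothetical and is not part of corefx; the attribute flags ask the C library to reset signal dispositions and start the child with an empty signal mask, which is one plausible way to use the API rather than a statement of what .NET would do.

```c
#include <signal.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run argv[0] through posix_spawn() and return the
 * child's exit code, or -1 if spawning or waiting failed. */
int spawn_and_wait(char *const argv[])
{
    posix_spawnattr_t attr;
    sigset_t all, none;
    pid_t pid;
    int status, rc;

    if (posix_spawnattr_init(&attr) != 0)
        return -1;

    /* Ask the library to reset signals to SIG_DFL in the child and to
     * start the child with an empty signal mask. */
    sigfillset(&all);
    sigemptyset(&none);
    posix_spawnattr_setsigdefault(&attr, &all);
    posix_spawnattr_setsigmask(&attr, &none);
    posix_spawnattr_setflags(&attr,
                             POSIX_SPAWN_SETSIGDEF | POSIX_SPAWN_SETSIGMASK);

    rc = posix_spawn(&pid, argv[0], NULL, &attr, argv, environ);
    posix_spawnattr_destroy(&attr);
    if (rc != 0)
        return -1;

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling `spawn_and_wait` with `{ "/bin/sh", "-c", "exit 7", NULL }` should return the shell's exit code. The sketch does not set the child's working directory.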

joshudson · 2018-12-13T17:14:15Z

You can't use posix_spawn because chdir. Show me where you set signal handlers and I'll deal with that. (It's almost certainly broken now because the fork() child rarely can inherit signal handlers either.)

janvorli · 2018-12-13T17:57:39Z

Show me where you set signal handlers

We deal with signal handlers in coreclr PAL in https://github.com/dotnet/coreclr/blob/master/src/pal/src/exception/signal.cpp and also here in corefx in https://github.com/dotnet/corefx/blob/master/src/Native/Unix/System.Native/pal_signal.c
But it is possible that 3rd party libraries that .NET applications use also register their own signal handlers, so I am not sure why you wanted to know where we deal with signals. My understanding (based on the web page I've linked above) was that all signals must be blocked before calling vfork; then, in the child, all signals that are not set to SIG_IGN are set to SIG_DFL and the signal mask is restored before calling execve; the mask is also restored in the parent. I understand all as really all, no matter which ones we set and don't set.

joshudson · 2018-12-13T18:08:06Z

I read your signal handlers. They're not safe in fork children either, as expected. On considering third party signal handlers for the first time, looks like blanket mask is correct whether fork or vfork is used.

stephentoub · 2018-12-13T22:51:12Z

They're not safe in fork children either, as expected.

Can you elaborate? What's broken?

joshudson · 2018-12-13T22:59:40Z

If the fork child were to receive a handled signal, it would write to the controlling pipe just like the parent does, but the other end is not expecting a spurious pipe read.

Pretty much any signal handler that doesn't terminate the process and does more than write to a global variable isn't safe in a fork child, so I'm not surprised.

I was expecting to find only SIGSEGV and SIGBUS -> (if managed block throw exception else die) which would have been safe even for vfork.

joshudson · 2018-12-14T04:50:09Z

There's the signal mask, the signal handling cleaning, and the pthread_cancel mask for good measure.

Portability is a harsh mistress.

(Once considering third party libraries trying to use signals, this stuff is clearly broken even before trying to use vfork()).

janvorli · 2018-12-14T08:32:34Z

src/Native/Unix/System.Native/pal_process.c

-    if ((processId = fork()) == -1)
+    // The fork child must not be signalled until it calls exec(): our signal handlers do not
+    // handle being raised in the child process correctly
+    sigfillset(&signal_set);


A nit - use SIGALL_SET instead of creating your own set

joshua@novaϟ find /usr -name '*.h' -print0 | xargs -0 grep -i SIGALL_SET /dev/null
joshua@novaϟ

SIGALL_SET is not in my system header files.

janvorli · 2018-12-14T08:48:53Z

src/Native/Unix/System.Native/pal_process.c

+            {
+                if (sig != SIGKILL && sig != SIGSTOP)
+                    break; // No more signals
+            } else {


A code style nit - the curly braces should be on separate lines.

janvorli · 2018-12-14T09:08:40Z

src/Native/Unix/System.Native/pal_process.c

+                    : (sa_old.sa_handler == SIG_IGN || sa_old.sa_handler == SIG_DFL))
+                {
+                    // It has a pre-defined handler -- put it back
+                    sigaction(sig, &sa_old, &sa_trash);


Most of the signals will fall into this category. So it would be better to query the previous handler first and then only set it for signals that don't have it set to SIG_IGN or SIG_DFL, instead of setting it and then resetting it back.

I tried and failed to locate a portable API call for reading the signal handler w/o setting it.

When you pass NULL as the act parameter and non-NULL as the oldact, it just gets the current value.
sigaction Linux man page says:

If act is non-NULL, the new action for signal signum is installed
from act. If oldact is non-NULL, the previous action is saved in
oldact.

FreeBSD man page says:

If act is non-NULL, it specifies an action ...

OSX man page says:

If act is non-zero, it specifies an action ...

janvorli · 2019-01-25T17:25:42Z

@stephentoub thank you for reminding me of that. I have not looked into that yet.

izbyshev · 2019-01-27T14:48:26Z

src/Native/Unix/System.Native/pal_process.c

+            }
+            if (sigaction(sig, NULL, &sa_old))
+            {
+                break; // No more signals


Breaking here is incorrect because sigaction in glibc returns -1 for signals used by it internally. Since in practice those signals are 32/33 and the total number of signals is >=64, the loop will stop early.

Well that's undocumented nonsense. I'll just go use NSIG then.

It's documented: http://man7.org/linux/man-pages/man2/sigaction.2.html (section C library/kernel differences).

I really don't want to cause any controversy on this; but somehow my man page on sigaction has about half the content. I guess they fixed their docs.

izbyshev · 2019-01-27T14:57:59Z

The WSL bug is so bad that we should sit on this until the fix makes it to the main release. We are not fine as the bug randomly corrupts memory because it doesn't block but also doesn't replace with fork.

The WSL bug doesn't appear to be so bad. Simple testing on 1709 indicates that vfork behaves just like fork:

$ cat test.c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    volatile int x = 0;
    if (vfork() == 0) {
        sleep(1);
        write(1, "child\n", 6);
        x = 42;
        _exit(0);
    }
    printf("parent\n");
    sleep(2);
    printf("%d\n", x);
    return 0;
}
$ gcc test.c
$ ./a.out
parent
child
0

0 indicates that address space is not actually shared. On 1803, where vfork was fixed, the output is correct:

child
parent
42

joshudson · 2019-01-28T03:19:17Z

@izbyshev : OK then. The documentation said it was worse.

janvorli · 2019-02-01T00:40:20Z

@stephentoub so I did experiments with vfork / fork. Here is what my test app does:

* mmaps a certain amount of memory and writes something to every page
* calls fork or vfork
* in the child path, calls execve, which just runs ls for simplicity
* in the parent path, waits for the child to exit and then exits too

I ran the tests on my native Linux box with 24GB of RAM and swap enabled / disabled. I kept increasing the amount of memory the test was eating until forking failed.
Here are the results

function	swap	max_mem
fork	on	17GB
vfork	on	24GB
fork	off	4.6GB
vfork	off	9.47GB

As you can see, vfork worked fine with much more memory consumed by the parent, both with swap on and off.
That means that if a process that consumes a lot of memory spawns a child process, it has a much higher probability of failing with fork than with vfork.

stephentoub · 2019-02-06T18:55:31Z

so I did experiments with vfork / fork

Thanks, @janvorli. And you're comfortable with the change, such that if vfork is available, we use it? i.e. all of the previously raised concerns are no longer applicable on any platform?

joshudson · 2019-02-08T04:48:37Z

On the other hand I was expecting somebody to tell me what particular build magic you wanted to use to exclude Mac (the only suspect vfork() implementation even being considered for .NET support). I went over the modern BSD documents and there's no issue there.

janvorli · 2019-02-08T09:28:47Z

@joshudson #ifdef __APPLE__ is all the magic you need.
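For illustration, the platform exclusion could be expressed along these lines. The macro name `SPAWN_FORK` and the helper `spawn_child` are hypothetical and are not taken from pal_process.c; the sketch only shows the preprocessor switch, with none of the signal handling discussed earlier.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* SPAWN_FORK is a made-up name for this sketch. */
#if defined(__APPLE__)
#define SPAWN_FORK fork     /* vfork() is not trusted on this platform */
#else
#define SPAWN_FORK vfork
#endif

/* Hypothetical helper: start argv[0] and return the child's pid. */
pid_t spawn_child(char *const argv[], char *const envp[])
{
    pid_t pid = SPAWN_FORK();
    if (pid == 0) {
        execve(argv[0], argv, envp);
        _exit(127);         /* only reached if exec failed */
    }
    return pid;             /* -1 if the fork call itself failed */
}
```

Because the choice is made at compile time, callers see the same `pid_t` contract on every platform.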

joshudson · 2019-02-10T21:22:34Z

Something's up with master builds; looks more like the build chain is broken than anything I did. My machine yields:

/home/joshua/netcore/dotnet/Tools/tests.targets(579,5): error : One or more tests failed while running tests from 'System.Drawing.Common.Tests' please check /home/joshua/netcore/dotnet/bin/tests/System.Drawing.Common.Tests/netcoreapp-Linux-Debug-x64/testResults.xml for details! [/home/joshua/netcore/dotnet/src/System.Drawing.Common/tests/System.Drawing.Common.Tests.csproj]
/home/joshua/netcore/dotnet/Tools/tests.targets(579,5): error : One or more tests failed while running tests from 'System.Net.NameResolution.Pal.Tests' please check /home/joshua/netcore/dotnet/bin/tests/System.Net.NameResolution.Pal.Tests/netcoreapp-Linux-Debug-x64/testResults.xml for details! [/home/joshua/netcore/dotnet/src/System.Net.NameResolution/tests/PalTests/System.Net.NameResolution.Pal.Tests.csproj]
/home/joshua/netcore/dotnet/dir.traversal.targets(77,5): error : (No message specified) [/home/joshua/netcore/dotnet/src/tests.builds]
4 Warning(s)
3 Error(s)

which are the same tests that fail on master for me.

joshudson · 2019-02-12T02:22:53Z

@stephentoub : One last commit to get rid of the remaining typos. I'm a poor speller.

stephentoub · 2019-02-12T08:58:07Z

@janvorli, does this look good to you?

stephentoub · 2019-02-13T22:44:10Z

@dotnet-bot test Packaging All Configurations x64 Debug Build please
@dotnet-bot test UWP CoreCLR x64 Debug Build please

stephentoub · 2019-02-15T16:50:40Z

Thanks, @joshudson. At this point we'll merge it. If we need to revert for some reason, thankfully it's a one character change to delete the 'v' :-)

janvorli · 2019-02-18T14:22:41Z

@janvorli, does this look good to you?

Yes, it does. I am sorry for a late response, I was OOF last week.

Additionally, I was wondering if it would make sense to add an env variable that would enable switching back to fork for users that would not be comfortable with the fact we use vfork for some reason.

stephentoub · 2019-02-18T14:29:21Z

Additionally, I was wondering if it would make sense to add an env variable that would enable switching back to fork for users that would not be comfortable with the fact we use vfork for some reason.

Why would someone be uncomfortable? Your comment makes me worried again we shouldn't be using it.

janvorli · 2019-02-18T14:34:06Z

I am not worried about it, but I have thought some people might be due to the negative articles mentioned in the comments above.

joshudson · 2019-02-18T15:15:24Z

That's a fun tuning knob. Somebody writes this code (2007 me), gets the memory corruption, changes the knob, slows down the GC, and it goes away:

NativeMethods.EnumSomething(myCallback);

In 2007 there was this bug where the callback function went out of scope immediately, despite the documentation saying it would do so only when the Enum function returned. I don't know if it still exists. Cases like this are easy enough to set up.

In general, expect a few timing issues being reported as vfork problems. Otherwise I'd be meh about

if (0==(pid=((strcmp(getenv("COREFX_USE_FORK")?:"","1")?vfork:fork)())))

vfork is like setjmp; you can't write (condition? vfork ():fork()) so you have to write (condition?vfork:fork)() . See K&R on setjmp for details.

stephentoub · 2019-02-18T15:17:57Z

I am not worried about it, but I have thought some people might be due to the negative articles mentioned in the comments above.

I think we should either be confident enough in the change to be able to explain to everyone why it's safe, or we should revert it. Having a knob for this seems wrong to me.

In general, expect a few timing issues being reported as vfork problems.

Can you elaborate on this? What kinds of timing-related issues are you expecting to see reported?

joshudson · 2019-02-18T15:23:32Z

It changes GC speed so use-after-free p/invoke bugs can look like this knob does something.

janvorli · 2019-02-18T15:24:18Z

I think we should either be confident enough in the change to be able to explain to everyone why it's safe, or we should revert it.

Yeah, I think you are right.

…fx#33289)

* Use vfork() to start child processes where this yields a performance improvement due to getting rid of page faults.
* Remove specific dependency on shared memory vfork so that cross compiles work again.
* Added signal mask code so that a child process can't confuse the parent; also took care of pthread cancellation mask in case third-party native code tries to use it
* Fix issues from vfork() pull request review
* Check handler before replacing it
* Improve readability of signal handler removing
* Convert tabs to spaces
* use NSIG instead of dynamic probing because glibc punches a hole in the middle of the signal list
* Exclude Mac OSX from vfork() because we don't quite trust it.
* Fix one last batch of typos

Commit migrated from dotnet/corefx@0a561e3

Use vfork() to start child processes where this yields a performance improvement due to getting rid of page faults. (2998311)

danmoseley requested a review from tmds November 7, 2018 03:53

tmds reviewed Nov 7, 2018

Remove specific dependency on shared memory vfork so that cross compiles work again. (9dc1bbb)

Added signal mask code so that a child process can't confuse the parent; also took care of pthread cancellation mask in case third-party native code tries to use it (5e9b8e3)

janvorli reviewed Dec 14, 2018

izbyshev reviewed Jan 27, 2019

use NSIG instead of dynamic probing because glibc punches a hole in the middle of the signal list (7186c01)

Exclude Mac OSX from vfork() because we don't quite trust it. (0cb3196)

stephentoub reviewed Feb 11, 2019

stephentoub approved these changes Feb 11, 2019

Fix one last batch of typos (40860e0)

stephentoub closed this Feb 14, 2019

stephentoub reopened this Feb 14, 2019

stephentoub merged commit 0a561e3 into dotnet:master Feb 15, 2019
