Skip to content

Commit

Permalink
Fixed hang/crash on MacOS if mclapply is called when data.table is me…
Browse files Browse the repository at this point in the history
…rely loaded, #2418
  • Loading branch information
mattdowle committed Oct 19, 2017
1 parent 39c137d commit 9d1f3e2
Show file tree
Hide file tree
Showing 2 changed files with 24 additions and 14 deletions.
5 changes: 5 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,11 @@
7. Clarified that "data.table always sorts in `C-locale`" means that upper-case letters are sorted before lower-case letters by ordering in data.table (e.g. `setorder`, `setkey`, `DT[order(...)]`). Thanks to @hughparsonage for the pull request editing the documentation. Note this makes no difference in most cases of data; e.g. ids where only uppercase or lowercase letters are used (`"AB123"<"AC234"` is always true, regardless), or country names and words which are consistently capitalized. For example, `"America" < "Brazil"` is not affected (it's always true), and neither is `"america" < "brazil"` (always true too); since the first letter is consistently capitalized. But, whether `"america" < "Brazil"` (the words are not consistently capitalized) is true or false in base R depends on the locale of your R session. In America it is true by default and false if you i) type `Sys.setlocale(locale="C")`, ii) the R session has been started in a C locale for you which can happen on servers/services (the locale comes from the environment the R session is started in). However, `"america" < "Brazil"` is always, consistently false in data.table which can be a surprise because it differs to base R by default in most regions. It is false because `"B"<"a"` is true because all upper-case letters come first, followed by all lower case letters (the ascii number of each letter determines the order, which is what is meant by `C-locale`).


### Changes in v1.10.4-3 (on CRAN 20 Oct 2017)

1. Fixed crash/hang on MacOS when `parallel::mclapply` is used and data.table is merely loaded, [#2418](https://github.com/Rdatatable/data.table/issues/2418). Oddly, all tests including test 1705 (which tests `mclapply` with data.table) passed fine on CRAN. It appears to be some versions of MacOS or some versions of libraries on MacOS, perhaps. Many thanks to Martin Morgan for reporting and confirming this fix works. Thanks also to @asenabouth, Joe Thorley and Danton Noriega for testing, debugging and confirming that automatic parallelism inside data.table (such as `fwrite`) works well even on these MacOS installations. See also news items below for 1.10.4-1 and 1.10.4-2.


### Changes in v1.10.4-2 (on CRAN 12 Oct 2017)

1. OpenMP on MacOS is now supported by CRAN and included in CRAN's package binaries for Mac. But installing v1.10.4-1 from source on MacOS failed when OpenMP was not enabled at compile time, [#2409](https://github.com/Rdatatable/data.table/issues/2409). Thanks to Liz Macfie and @fupangpangpang for reporting. The startup message when OpenMP is not enabled has been updated.
Expand Down
33 changes: 19 additions & 14 deletions src/openmp-utils.c
Original file line number Diff line number Diff line change
Expand Up @@ -56,23 +56,28 @@ SEXP setDTthreads(SEXP threads) {
return ScalarInteger(old);
}

// auto avoid deadlock when data.table called from parallel::mclapply
void when_fork() {
/*
Auto avoid deadlock when data.table is used from within parallel::mclapply
GNU OpenMP seems ok with just setting DTthreads to 1 which limits the next parallel region
if data.table is used within the fork'd proceess. This is tested by test 1705.
// attempted workaround for Intel's OpenMP implementation which leaves threads running after
// parallel region; these crash when forked.
#ifdef _OPENMP
omp_set_num_threads(1);
#endif
We used to have an after_fork() callback too, to return to multi-threaded mode after parallel's
fork completes. But now in an attempt to alleviate problems propogating (likely Intel's OpenMP only)
we now leave data.table in single-threaded mode after parallel's fork. User can call setDTthreads(0)
to return to multi-threaded as we do in tests on Linux.
// GNU OpenMP seems ok with just setting DTthreads to 1 which limits the next parallel region
// if data.table is used within the fork'd proceess.
DTthreads = 1;
DO NOT call omp_set_num_threads(1) inside when_fork()!! That causes a different crash/hang on MacOS
upon mclapply's fork even if data.table is merely loaded and neither used yet in the session nor by
what mclapply is calling. Even when passing on CRAN's MacOS all-OK. As discovered by several MacOS
and RStudio users here when 1.10.4-2 went to CRAN :
https://github.com/Rdatatable/data.table/issues/2418
v1.10.4-3 removed the call to omp_set_num_threads().
We do not know why calling (mere) omp_set_num_threads(1) in the when_fork() would cause a problem
on MacOS.
*/

// We used to have an after_fork() callback too, to return to multi-threaded mode after parallel's
// fork completes. But now in an attempt to alleviate problems propogating (likely Intel's OpenMP only)
// we now leave data.table in single-threaded mode after parallel's fork. User can call setDTthreads(0)
// to return to multi-threaded as we do in tests on Linux.
void when_fork() {
DTthreads = 1;
}

void avoid_openmp_hang_within_fork() {
Expand Down

0 comments on commit 9d1f3e2

Please sign in to comment.