compilation with clang11 causes fsort to crash #4786

Closed
jangorecki opened this issue Oct 27, 2020 · 5 comments · Fixed by #4808

jangorecki commented Oct 27, 2020

The image used below builds R-devel with gcc but uses clang 11 to compile packages.
Minimal example:

docker run -it registry.gitlab.com/jangorecki/dockerfiles/r-devel-clang Rscript -e 'install.packages("data.table"); readLines(system.file("cc", package="data.table")); library(data.table); example(fsort)'

gdb info (run the steps one at a time; do not paste them all at once):

docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined registry.gitlab.com/jangorecki/dockerfiles/r-devel-clang /bin/bash
apt-get update -qq && apt-get -y install gdb
vim ~/.R/Makevars
# change default optimization to -O0
Rscript -e 'install.packages("data.table"); readLines(system.file("cc", package="data.table"))'
R -d gdb
run
library(data.table)
x = runif(1e6)
ans2 = fsort(x)
 Thread 17 "R" received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x7fffaaffaf00 (LWP 871)]
 0x00007ffff45b796a in dradix_r (in=0x7ffff37349d8, working=0x7fffa4000c10, 
     n=248, fromBit=33, toBit=40, counts=0x7fffa4000c40) at fsort.c:62
 62      fsort.c: No such file or directory.
jangorecki added this to the 1.13.3 milestone Oct 27, 2020

mattdowle commented Nov 12, 2020

I finally managed to reproduce this locally, using clang-11 and ASAN. Thanks to Jan's investigation, we know OpenMP is necessary too.
The segfault comes up straight away on test 1888.6, as shown below. At first it seemed strange that it happened on that test and not on 1888.[1-5], because I thought number 6 was just a simple error on verbose being the wrong type, and I wondered whether there was a small delay in printing the output because the ASAN error in T3 needs to roll up to the main thread. (I was mixing up 6's and 8's: 1888.8 is the simple error on verbose, not 1888.6, so there wasn't any delay or anything strange.) That it happens at the small 1e4 size in this test means it is unlikely to be the uncaught malloc (marked TODO in the source) that I thought it might be, although that should be fixed anyway.

To get clang-11, since it isn't available yet, I added the line deb http://apt.llvm.org/focal/ llvm-toolchain-focal-11 main to /etc/apt/sources.list as per https://apt.llvm.org/, ran the GPG key command given there, and then ran sudo apt-get install clang-11.
Aside: installing clang-11 appears to have caused R's ./configure to think that wllvm is now my default compiler. Compiling R with it apparently proceeds ok, but R CMD INSTALL then fails with:

CRITICAL: No compiler set. Please set environment variable LLVM_COMPILER
WARNING:wllvm: exception case:  No compiler set. Please set environment variable %s

Happily, running ./configure with CC= set explicitly (to clang-11 in this case), as per CRAN_Release.cmd, fixes that problem.

$ Rdevel
R Under development (unstable) (2020-11-10 r79412) -- "Unsuffered Consequences"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(data.table)
data.table 1.13.2 using 6 threads (see ?getDTthreads).  Latest news: r-datatable.com
> test.data.table()
getDTthreads(verbose=TRUE):
  omp_get_num_procs()            12
  R_DATATABLE_NUM_PROCS_PERCENT  unset (default 50)
  R_DATATABLE_NUM_THREADS        unset
  R_DATATABLE_THROTTLE           unset (default 1024)
  omp_get_thread_limit()         2147483647
  omp_get_max_threads()          12
  OMP_THREAD_LIMIT               unset
  OMP_NUM_THREADS                unset
  RestoreAfterFork               true
  data.table is using 6 threads with throttle==1024. See ?setDTthreads.
test.data.table() running: /home/mdowle/build/R-devel/library/data.table/tests/tests.Rraw.bz2 

**** Suggested package bit64 is not installed. Tests using it will be skipped.


**** Suggested package xts is not installed. Tests using it will be skipped.


**** Suggested package nanotime is not installed. Tests using it will be skipped.


**** Suggested package R.utils is not installed. Tests using it will be skipped.


**** Suggested package yaml is not installed. Tests using it will be skipped.


**** Full long double accuracy is not available. Tests using this will be skipped.

Running test id 1199.2      Test 1199.2 didn't produce the correct error :
Expected: only defined on a data frame with all numeric variables 
Observed: only defined on a data frame with all numeric-alike variables 
Running test id 1888.6      =================================================================
==3262448==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x610000072300 at pc 0x7fc2de4e42ff bp 0x7fc2d9988510 sp 0x7fc2d9988508
WRITE of size 8 at 0x610000072300 thread T3
    #0 0x7fc2de4e42fe in dradix_r /tmp/RtmpcOKzHe/R.INSTALL31c6cd130bd3fb/data.table/src/fsort.c:62:32
    #1 0x7fc2de4e27f4 in .omp_outlined._debug__.20 /tmp/RtmpcOKzHe/R.INSTALL31c6cd130bd3fb/data.table/src/fsort.c:291:11
    #2 0x7fc2de4e4ec3 in .omp_outlined..21 /tmp/RtmpcOKzHe/R.INSTALL31c6cd130bd3fb/data.table/src/fsort.c:256:5
    #3 0x7fc2e5a11d62 in __kmp_invoke_microtask (/usr/lib/x86_64-linux-gnu/libomp.so.5+0xaed62)
    #4 0x7fc2e59a4182  (/usr/lib/x86_64-linux-gnu/libomp.so.5+0x41182)
    #5 0x7fc2e59a2da9  (/usr/lib/x86_64-linux-gnu/libomp.so.5+0x3fda9)
    #6 0x7fc2e59f7dc9  (/usr/lib/x86_64-linux-gnu/libomp.so.5+0x94dc9)
    #7 0x7fc2e5a61608 in start_thread /build/glibc-ZN95T4/glibc-2.31/nptl/pthread_create.c:477:8
    #8 0x7fc2e5878292 in clone (/usr/lib/x86_64-linux-gnu/libc.so.6+0x122292)

Address 0x610000072300 is a wild pointer.
SUMMARY: AddressSanitizer: heap-buffer-overflow /tmp/RtmpcOKzHe/R.INSTALL31c6cd130bd3fb/data.table/src/fsort.c:62:32 in dradix_r
Shadow bytes around the buggy address:
  0x0c2080006410: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c2080006420: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c2080006430: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c2080006440: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c2080006450: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c2080006460:[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c2080006470: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c2080006480: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c2080006490: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c20800064a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c20800064b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
Thread T3 created by T0 here:
    #0 0x495d2a in pthread_create (/home/mdowle/build/R-devel/bin/exec/R+0x495d2a)
    #1 0x7fc2e59f74a3  (/usr/lib/x86_64-linux-gnu/libomp.so.5+0x944a3)

==3262448==ABORTING

@mattdowle

So far I can't reproduce with 1-4 threads, but can with 5 and 6 threads. To test that, I'm letting the batching calculations proceed with 6 threads and just restricting the main parallel region on line 256 to 1-6 threads.
Output with 5 threads and verbose on too:

> library(data.table)
data.table 1.13.3 IN DEVELOPMENT built 2020-11-12 14:39:44 UTC; mdowle using 6 threads (see ?getDTthreads).  Latest news: r-datatable.com
> x = runif(1e6)
> ans2 = fsort(x, verbose=TRUE)
nth=6, nBatch=12
[New Thread 0x7fffeec2c780 (LWP 3296531)]
[New Thread 0x7fffee413800 (LWP 3296532)]
[New Thread 0x7fffedbfa880 (LWP 3296533)]
[New Thread 0x7fffed3e1900 (LWP 3296534)]
[New Thread 0x7fffecbc8980 (LWP 3296535)]
Range = [1.51806e-07,1]
maxBit=56; MSBNbits=16; shift=41; MSBsize=65536
counts is 6MB (128 pages per nBatch=12, batchSize=83334, lastBatchSize=83326)
Top 5 MSB counts: 310 292 291 291 289 
Reduced MSBsize from 65536 to 15989 by excluding 0 and 1 counts
=================================================================
==3296519==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60b00005b208 at pc 0x7fffeff6d2e0 bp 0x7fffedbf9580 sp 0x7fffedbf9578
WRITE of size 8 at 0x60b00005b208 thread T3
[Detaching after fork from child process 3296536]
    #0 0x7fffeff6d2df in dradix_r /tmp/Rtmpy3Fe15/R.INSTALL324bd159d32e3e/data.table/src/fsort.c:58:32
    #1 0x7fffeff6b946 in .omp_outlined._debug__.20 /tmp/Rtmpy3Fe15/R.INSTALL324bd159d32e3e/data.table/src/fsort.c:286:11
    #2 0x7fffeff6de13 in .omp_outlined..21 /tmp/Rtmpy3Fe15/R.INSTALL324bd159d32e3e/data.table/src/fsort.c:251:5
    #3 0x7ffff749fd62 in __kmp_invoke_microtask (/usr/lib/x86_64-linux-gnu/libomp.so.5+0xaed62)
    #4 0x7ffff7432182  (/usr/lib/x86_64-linux-gnu/libomp.so.5+0x41182)
    #5 0x7ffff7430da9  (/usr/lib/x86_64-linux-gnu/libomp.so.5+0x3fda9)
    #6 0x7ffff7485dc9  (/usr/lib/x86_64-linux-gnu/libomp.so.5+0x94dc9)
    #7 0x7ffff74ef608 in start_thread /build/glibc-ZN95T4/glibc-2.31/nptl/pthread_create.c:477:8
    #8 0x7ffff7306292 in clone (/usr/lib/x86_64-linux-gnu/libc.so.6+0x122292)

Address 0x60b00005b208 is a wild pointer.
SUMMARY: AddressSanitizer: heap-buffer-overflow /tmp/Rtmpy3Fe15/R.INSTALL324bd159d32e3e/data.table/src/fsort.c:58:32 in dradix_r
Shadow bytes around the buggy address:
  0x0c16800035f0: fa fa fa fa fa fa 00 00 00 00 00 00 00 00 00 00
  0x0c1680003600: 00 00 00 00 fa fa fa fa fa fa fa fa 00 00 00 00
  0x0c1680003610: 00 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa
  0x0c1680003620: fa fa 00 00 00 00 00 00 00 00 00 00 00 00 00 fa
  0x0c1680003630: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c1680003640: fa[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1680003650: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1680003660: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1680003670: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1680003680: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1680003690: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
Thread T3 created by T0 here:
    #0 0x495d2a in pthread_create (/home/mdowle/build/R-devel/bin/exec/R+0x495d2a)
    #1 0x7ffff74854a3  (/usr/lib/x86_64-linux-gnu/libomp.so.5+0x944a3)

==3296519==ABORTING
[Thread 0x7fffecbc8980 (LWP 3296535) exited]
[Thread 0x7fffed3e1900 (LWP 3296534) exited]
[Thread 0x7fffedbfa880 (LWP 3296533) exited]
[Thread 0x7fffee413800 (LWP 3296532) exited]
[Thread 0x7fffeec2c780 (LWP 3296531) exited]
[Inferior 1 (process 3296519) exited with code 01]
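
The thread restriction described at the top of this comment can be sketched roughly as follows. This is not the code in fsort.c; it only illustrates capping one parallel region with the `num_threads` clause while the rest of the program keeps its usual thread count. The function name `team_size_with_cap` is made up for illustration, and the `_OPENMP` guards let the sketch build without OpenMP too.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: runs one parallel region capped at `cap` threads
// (cap must be >= 1) and returns the team size the runtime actually used.
// Without OpenMP the pragmas are ignored and the result is 1.
int team_size_with_cap(int cap) {
  int used = 1;
  #pragma omp parallel num_threads(cap)
  {
    // Only one thread of the team records the team size.
    #pragma omp single
    {
#ifdef _OPENMP
      used = omp_get_num_threads();
#endif
    }
  }
  return used;
}
```

Calling `team_size_with_cap(5)` returns at most 5; the runtime may still give fewer threads than requested.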

mattdowle commented Nov 12, 2020

It seems that the assumption documented in this comment does not hold with clang-11 and whatever version of OpenMP it is using.

// All we assume here is that a thread can never be assigned to an earlier iteration; i.e. threads 0:(nth-1)
// get iterations 0:(nth-1) possibly out of order, then first-come-first-served in order after that.
// If a thread deals with an msb lower than the first one it dealt with, then its *working will be too small.

Tracing it through locally shows that thread 4 receives iteration 2,552 after iteration 12,682. That causes working to be too small, as the comment predicted.
It seems the dynamic schedule in OpenMP gives no ordering guarantee. Implementations just happened, in practice, to allocate iterations to threads in the natural increasing order, until clang-11.
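
For anyone wanting to check their own toolchain, here is a minimal sketch of how that per-thread ordering assumption could be tested. It is not taken from fsort.c: the function name `count_order_violations` and the iteration count are made up for illustration. It records the last iteration each thread ran and counts every time a thread is handed a lower one.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: counts how often a thread receives an iteration
// lower than the previous iteration that same thread ran.
// Returns 0 if every thread saw increasing iterations, -1 if malloc fails.
int count_order_violations(int n_iter) {
  int max_threads = 1;
#ifdef _OPENMP
  max_threads = omp_get_max_threads();
#endif
  int *last = malloc(sizeof(int) * (size_t)max_threads);
  if (!last) return -1;
  for (int t = 0; t < max_threads; t++) last[t] = -1;

  int violations = 0;
  #pragma omp parallel num_threads(max_threads) reduction(+:violations)
  {
    int me = 0;
#ifdef _OPENMP
    me = omp_get_thread_num();
#endif
    // Plain dynamic schedule, with no monotonic modifier.
    #pragma omp for schedule(dynamic,1)
    for (int i = 0; i < n_iter; i++) {
      if (i < last[me]) violations++;  // this thread went backwards
      last[me] = i;                    // each thread writes only its own slot
    }
  }
  free(last);
  return violations;
}
```

A non-zero result from, say, `count_order_violations(20000)` would mean the runtime handed some thread an earlier iteration after a later one. Whether that happens depends on the compiler, the OpenMP runtime and the thread count, so a zero result on one machine proves nothing about another.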

mattdowle commented Nov 12, 2020

There's a modifier available on the dynamic schedule: monotonic. So the following fixes it.

#pragma omp for schedule(monotonic:dynamic,1)

There are several references online stating that the monotonic (and nonmonotonic and simd) modifiers were added in OpenMP 4.5 (Nov 2015), but I can't see that stated explicitly in the OpenMP news. Whether OpenMP before 4.5 ignores the modifier or rejects it as invalid, I don't know.
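
One way to avoid depending on how older compilers treat the modifier is to choose the schedule at compile time from the `_OPENMP` version macro, which expands to the yyyymm date of the supported specification (201511 for OpenMP 4.5). The following is a sketch under that assumption, not necessarily what the eventual fix does. The macro name `MONOTONIC_DYNAMIC` and the function `sum_with_guarded_schedule` are made up for illustration.

```c
// Hypothetical macro: use the monotonic modifier only when the compiler
// reports OpenMP 4.5 or later; otherwise fall back to plain dynamic.
#if defined(_OPENMP) && _OPENMP >= 201511
  #define MONOTONIC_DYNAMIC monotonic:dynamic
#else
  #define MONOTONIC_DYNAMIC dynamic
#endif

// Toy loop showing the macro inside a schedule clause; it sums 0..n-1.
// Tokens after "#pragma omp" are macro-expanded, so the clause becomes
// schedule(monotonic:dynamic,1) or schedule(dynamic,1).
long sum_with_guarded_schedule(int n) {
  long total = 0;
  #pragma omp parallel for schedule(MONOTONIC_DYNAMIC,1) reduction(+:total)
  for (int i = 0; i < n; i++) {
    total += i;
  }
  return total;
}
```

Note that on the pre-4.5 fallback path the loop still uses plain dynamic, so the ordering assumption there rests on the observed behaviour of those older implementations rather than on any guarantee.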
