dgemm segfault : library not thread safe . #602
Comments
Could you try linking your application with the OpenBLAS OpenMP version (make USE_OPENMP=1)?
The testcase seems to work in that case (using the develop branch), but our application failed in a different spot. However, we would actually like to link a serial BLAS into our OpenMP application. Trying to build the OpenMP-enabled OpenBLAS with -fsanitize=thread, I ran into an error, so there is another data race. I'll look into that as well.
Actually, I have to correct myself: the USE_OPENMP=1 version also crashes, but not if I use only 2 threads. It does crash if I use 4 threads.
@vondele, what's your CPU? How many cores? Please try …
Intel(R) Xeon(R) CPU E7-8837 @ 2.67GHz. Makefile.conf contains: … I compiled using make -j32 USE_THREAD=1 USE_OPENMP=1 (but this was the release r0.2.14).
@vondele I tested it on a Linux machine with an Intel(R) Xeon(R) CPU E5-2670; no error happened.
test_dgemm_omp.c:
#include <stdlib.h>
#include <stdio.h>
#ifdef _OPENMP
int omp_get_num_procs();
int omp_get_max_threads();
void omp_set_num_threads(int);
#endif
#define bint int
void dgemm_(char *transa, char *transb, bint *m, bint *n, bint *k, double *alpha,
const double *a, bint *lda, const double *b, bint *ldb,
double *beta, double *c, bint *ldc);
void test(int id)
{
bint i = 0, m = 1000, n = 800, k = 600, N = m*k + k*n + m*n;
double alpha = 1.0, beta = 0.0;
double *A, *B, *C;
printf("%d\n", id);
A = (double*)malloc(N*sizeof(double));
B = A + m*k;
C = B + k*n;
for (i = 0; i < N; ++i)
A[i] = (double) rand() / RAND_MAX;
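/* exercise all four transpose combinations of dgemm */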
dgemm_("N", "N", &m, &n, &k, &alpha, A, &m, B, &k, &beta, C, &m);
dgemm_("T", "N", &m, &n, &k, &alpha, A, &k, B, &k, &beta, C, &m);
dgemm_("N", "T", &m, &n, &k, &alpha, A, &m, B, &n, &beta, C, &m);
dgemm_("T", "T", &m, &n, &k, &alpha, A, &k, B, &n, &beta, C, &m);
free(A);
}
int main()
{
int i = 0;
#ifdef _OPENMP
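/* cap the OpenMP thread count at half the processor count (or 1 on small machines) */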
int num_procs = omp_get_num_procs();
if (num_procs < omp_get_max_threads() * 2)
omp_set_num_threads(num_procs < 4 ? 1 : num_procs/2);
#endif
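/* 100 concurrent dgemm calls from OpenMP threads; this is what triggers the reported crash */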
#pragma omp parallel for
for (i = 0; i < 100; ++i)
test(i);
return 0;
}
@hjbreg can you put a loop around executing your testcase? I think the segfault is at startup:
for i in `seq 1 1000` ; do ./a.out ; done
Also, I did the testing with very small matrices (~< 37, see the original testcase).
@vondele it is OK to put it in a loop, also with small matrices (m=30, n=20, k=10). Maybe you can try again using my test code above.
Sorry for the late reply; your testcase also leads to segfaults (using m=n=k=27), but only 4 out of 1000 runs fail for me:
xianyi-OpenBLAS-d0c51c4> for i in `seq 1 1000` ; do ./a.out ; done
Maybe the version of gcc plays a role? (Here, gcc version 5.1.0 (GCC).)
I have no idea. I tested it on CentOS with gcc version 4.4.6. |
Hm, it crashes here with gcc 4.6 and 5.1 on a Red Hat system. I see no segfaults with OMP_NUM_THREADS = 1, 2 or 3, but I do see them for 4 and higher (up to 32 cores).
Could the glibc version play a role? I noticed that the original test case failed for me as well, but the segfaults went away when I ran the program under valgrind (which overrides malloc with its own routines in order to track memory accesses).
I think glibc or the pthread library could play a role (unlikely in my opinion, as the tsan traces point into the openblas library itself), but I'm not surprised it doesn't crash under valgrind: it is likely a race condition, which might not trigger under valgrind (i.e. vastly different timings, and valgrind might even serialize execution).
Good point, so I should be running helgrind if anything (which would probably not add much over your tsan traces). Some obscure races appear to have been fixed in glibc-2.19 (and backported as 2.18.1 etc. down to 2.15.1), see https://sourceware.org/bugzilla/show_bug.cgi?id=15073, but that is probably not relevant here.
The following testcase will segfault from time to time (~1 out of 1000 runs, presumably depending on the load of the machine) if compiled with OpenMP and linked against the recent release of openblas.
PROGRAM TEST_THREAD_SAFE
!$OMP PARALLEL DO
DO i=1,30
CALL tester(i)
ENDDO
END PROGRAM
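The tester subroutine itself is not included in the report; the backtrace below places the crash inside it at test_dgemm.f90:11, where dgemm is called. A minimal sketch of what it might look like follows, assuming the small square matrices (m=n=k=27) mentioned later in the thread; the sizes and the RANDOM_NUMBER initialization are guesses, not the reporter's actual code:
! Hypothetical reconstruction, not the reporter's routine:
! fill small matrices and call DGEMM once per loop iteration.
SUBROUTINE tester(id)
  INTEGER :: id
  INTEGER, PARAMETER :: n = 27
  DOUBLE PRECISION :: A(n,n), B(n,n), C(n,n)
  CALL RANDOM_NUMBER(A)
  CALL RANDOM_NUMBER(B)
  CALL DGEMM('N', 'N', n, n, n, 1.0D0, A, n, B, n, 0.0D0, C, n)
END SUBROUTINE tester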
Backtrace for this error:
#0 0x7FD572E91B87
#1 0x7FD572E90D80
#2 0x3B6323269F
#3 0x7FD57334EF64
#4 0x7FD573271320
#5 0x7FD5732126EA
#6 0x400C68 in tester_ at test_dgemm.f90:11
#7 0x400D98 in MAIN__._omp_fn.0 at test_dgemm.f90:20 (discriminator 1)
#8 0x7FD572C4703D
#9 0x3B63A079D0
#10 0x3B632E88FC
#11 0xFFFFFFFFFFFFFFFF
Segmentation fault
openblas has been compiled with:
make -j$nprocs USE_THREAD=0 LIBNAMESUFFIX=serial PREFIX=${INSTALLDIR}
Upon building with -fsanitize=thread (which requires a modified gcc), the following errors are produced:
WARNING: ThreadSanitizer: data race (pid=40666)
Read of size 4 at 0x7fd54d8d7130 by thread T3:
#0 blas_memory_alloc /data/vjoost/toolchain-tsan/build/xianyi-OpenBLAS-d0c51c4/driver/others/memory.c:1012 (libopenblas_serial.so.0+0x0000001de505)
#1 dgemm_ /data/vjoost/toolchain-tsan/build/xianyi-OpenBLAS-d0c51c4/interface/gemm.c:394 (libopenblas_serial.so.0+0x0000000b92dd)
#2 tester_ /data/vjoost/gnu/bugs/test_dgemm.f90:11 (a.out+0x000000400f66)
#3 MAIN__._omp_fn.0 /data/vjoost/gnu/bugs/test_dgemm.f90:20 (a.out+0x0000004010b0)
#4 gomp_thread_start /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libgomp/team.c:118 (libgomp.so.1+0x0000000177a8)
Previous write of size 4 at 0x7fd54d8d7130 by main thread:
#0 blas_memory_alloc /data/vjoost/toolchain-tsan/build/xianyi-OpenBLAS-d0c51c4/driver/others/memory.c:1033 (libopenblas_serial.so.0+0x0000001de5a1)
#1 dgemm_ /data/vjoost/toolchain-tsan/build/xianyi-OpenBLAS-d0c51c4/interface/gemm.c:394 (libopenblas_serial.so.0+0x0000000b92dd)
#2 tester_ /data/vjoost/gnu/bugs/test_dgemm.f90:11 (a.out+0x000000400f66)
#3 MAIN__._omp_fn.0 /data/vjoost/gnu/bugs/test_dgemm.f90:20 (a.out+0x0000004010b0)
#4 GOMP_parallel /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libgomp/parallel.c:168 (libgomp.so.1+0x000000010c6c)
#5 main /data/vjoost/gnu/bugs/test_dgemm.f90:22 (a.out+0x000000400b7d)
Location is global 'memory' of size 4096 at 0x7fd54d8d7120 (libopenblas_serial.so.0+0x000000f23130)
Thread T3 (tid=40670, running) created by main thread at:
#0 pthread_create /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libsanitizer/tsan/tsan_interceptors.cc:895 (libtsan.so.0+0x000000026c94)
#1 gomp_team_start /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libgomp/team.c:796 (libgomp.so.1+0x000000017fee)
#2 GOMP_parallel /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libgomp/parallel.c:167 (libgomp.so.1+0x000000010c67)
#3 main /data/vjoost/gnu/bugs/test_dgemm.f90:22 (a.out+0x000000400b7d)
SUMMARY: ThreadSanitizer: data race /data/vjoost/toolchain-tsan/build/xianyi-OpenBLAS-d0c51c4/driver/others/memory.c:1012 blas_memory_alloc
WARNING: ThreadSanitizer: data race (pid=40666)
Read of size 8 at 0x7fd54d8d7120 by thread T1:
#0 blas_lock ../../common_x86_64.h:67 (libopenblas_serial.so.0+0x0000001de53c)
#1 blas_memory_alloc /data/vjoost/toolchain-tsan/build/xianyi-OpenBLAS-d0c51c4/driver/others/memory.c:1014 (libopenblas_serial.so.0+0x0000001de53c)
#2 dgemm_ /data/vjoost/toolchain-tsan/build/xianyi-OpenBLAS-d0c51c4/interface/gemm.c:394 (libopenblas_serial.so.0+0x0000000b92dd)
#3 tester_ /data/vjoost/gnu/bugs/test_dgemm.f90:11 (a.out+0x000000400f66)
#4 MAIN__._omp_fn.0 /data/vjoost/gnu/bugs/test_dgemm.f90:20 (a.out+0x0000004010b0)
#5 gomp_thread_start /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libgomp/team.c:118 (libgomp.so.1+0x0000000177a8)
Previous write of size 8 at 0x7fd54d8d7120 by main thread:
[failed to restore the stack]
Location is global 'memory' of size 4096 at 0x7fd54d8d7120 (libopenblas_serial.so.0+0x000000f23120)
Thread T1 (tid=40668, running) created by main thread at:
#0 pthread_create /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libsanitizer/tsan/tsan_interceptors.cc:895 (libtsan.so.0+0x000000026c94)
#1 gomp_team_start /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libgomp/team.c:796 (libgomp.so.1+0x000000017fee)
#2 GOMP_parallel /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libgomp/parallel.c:167 (libgomp.so.1+0x000000010c67)
#3 main /data/vjoost/gnu/bugs/test_dgemm.f90:22 (a.out+0x000000400b7d)
SUMMARY: ThreadSanitizer: data race ../../common_x86_64.h:67 blas_lock
WARNING: ThreadSanitizer: data race (pid=40666)
Write of size 8 at 0x7fd545bfd020 by thread T1:
#0 dgemm_itcopy ../kernel/x86_64/../generic/gemm_tcopy_2.c:66 (libopenblas_serial.so.0+0x000000258c89)
#1 dgemm_nn /data/vjoost/toolchain-tsan/build/xianyi-OpenBLAS-d0c51c4/driver/level3/level3.c:322 (libopenblas_serial.so.0+0x0000001368f5)
#2 dgemm_ /data/vjoost/toolchain-tsan/build/xianyi-OpenBLAS-d0c51c4/interface/gemm.c:426 (libopenblas_serial.so.0+0x0000000b9312)
#3 tester_ /data/vjoost/gnu/bugs/test_dgemm.f90:11 (a.out+0x000000400f66)
#4 MAIN__._omp_fn.0 /data/vjoost/gnu/bugs/test_dgemm.f90:20 (a.out+0x0000004010b0)
#5 gomp_thread_start /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libgomp/team.c:118 (libgomp.so.1+0x0000000177a8)
Previous write of size 8 at 0x7fd545bfd020 by main thread:
[failed to restore the stack]
Thread T1 (tid=40668, running) created by main thread at:
#0 pthread_create /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libsanitizer/tsan/tsan_interceptors.cc:895 (libtsan.so.0+0x000000026c94)
#1 gomp_team_start /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libgomp/team.c:796 (libgomp.so.1+0x000000017fee)
#2 GOMP_parallel /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libgomp/parallel.c:167 (libgomp.so.1+0x000000010c67)
#3 main /data/vjoost/gnu/bugs/test_dgemm.f90:22 (a.out+0x000000400b7d)
SUMMARY: ThreadSanitizer: data race ../kernel/x86_64/../generic/gemm_tcopy_2.c:66 dgemm_itcopy
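Reading the first two traces together: blas_memory_alloc appears to test a slot flag in the global memory table before taking the per-slot lock (the read at memory.c:1012), while another code path updates the flag without synchronization (the write at memory.c:1033). Below is a minimal, self-contained sketch of that check-then-lock pattern, which produces the same kind of tsan report; it is an illustration under those assumptions, not OpenBLAS's actual code, and the names slot, sketch_alloc and sketch_free are invented:
/* Minimal sketch (hypothetical, NOT OpenBLAS source) of the
 * check-then-lock pattern that ThreadSanitizer flags in
 * blas_memory_alloc: a slot's 'used' flag is read before the lock
 * is taken, and written outside the lock on the free path. */
#include <pthread.h>
#include <stdio.h>

#define NUM_BUFFERS 32

struct slot {
  pthread_mutex_t lock;
  int used;        /* read unlocked on alloc, written unlocked on free */
  void *addr;
};

static struct slot memory[NUM_BUFFERS];  /* like the global 'memory' table in the trace */

static void *sketch_alloc(void) {
  for (int i = 0; i < NUM_BUFFERS; ++i) {
    if (!memory[i].used) {               /* unsynchronized read: the reported race */
      pthread_mutex_lock(&memory[i].lock);
      if (!memory[i].used) {             /* re-check under the lock */
        memory[i].used = 1;
        pthread_mutex_unlock(&memory[i].lock);
        return memory[i].addr;
      }
      pthread_mutex_unlock(&memory[i].lock);
    }
  }
  return NULL;
}

static void sketch_free(void *p) {
  for (int i = 0; i < NUM_BUFFERS; ++i)
    if (memory[i].addr == p) {
      memory[i].used = 0;                /* plain write, no lock: pairs with the racy read */
      return;
    }
}

static void *worker(void *arg) {
  (void)arg;
  for (int i = 0; i < 1000; ++i) {
    void *p = sketch_alloc();
    if (p) sketch_free(p);
  }
  return NULL;
}

int main(void) {
  pthread_t t[4];
  for (int i = 0; i < NUM_BUFFERS; ++i) {
    pthread_mutex_init(&memory[i].lock, NULL);
    memory[i].addr = &memory[i];         /* dummy non-NULL payload */
  }
  for (int i = 0; i < 4; ++i) pthread_create(&t[i], NULL, worker, NULL);
  for (int i = 0; i < 4; ++i) pthread_join(t[i], NULL);
  puts("done");
  return 0;
}
The plain reads and writes of used constitute a data race under the C11 memory model even when a re-check under the lock makes the logic appear safe on x86. The third trace, where two threads write the same dgemm_itcopy packing buffer, is consistent with the allocator occasionally handing the same work buffer to two threads.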