
dgemm segfault: library not thread safe #602

Closed
vondele opened this issue Jul 7, 2015 · 14 comments
vondele commented Jul 7, 2015

The following testcase will segfault from time to time (roughly 1 run out of 1000, presumably depending on the load of the machine) if compiled with OpenMP and linked against the recent release of OpenBLAS.

cat test_dgemm.f90
SUBROUTINE tester(i)
REAL*8, DIMENSION(:), ALLOCATABLE :: A,B,C
REAL*8 :: rnd(3)
INTEGER :: i
INTEGER :: M,N,K
! test random sizes
CALL RANDOM_NUMBER(rnd)
M=rnd(1)*37+1 ; N=rnd(2)*37+1 ; K=rnd(3)*37+1
ALLOCATE(C(M*N),A(M*K),B(K*N))
A=0 ; B=0 ; C=0
CALL DGEMM("N","N",M,N,K,1.0D0,A,M,B,K,0.0D0,C,M)
CALL DGEMM("T","N",M,N,K,1.0D0,A,K,B,K,0.0D0,C,M)
CALL DGEMM("N","T",M,N,K,1.0D0,A,M,B,N,0.0D0,C,M)
CALL DGEMM("T","T",M,N,K,1.0D0,A,K,B,N,0.0D0,C,M)
END SUBROUTINE tester

PROGRAM TEST_THREAD_SAFE
!$OMP PARALLEL DO
DO i=1,30
CALL tester(i)
ENDDO
END PROGRAM

gfortran -fopenmp -g -O2 -march=native test_dgemm.f90 -lopenblas_serial
export OMP_NUM_THREADS=4 ; for i in `seq 1 1000` ; do ./a.out ; done
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x7FD572E91B87
#1 0x7FD572E90D80
#2 0x3B6323269F
#3 0x7FD57334EF64
#4 0x7FD573271320
#5 0x7FD5732126EA
#6 0x400C68 in tester_ at test_dgemm.f90:11
#7 0x400D98 in MAIN__._omp_fn.0 at test_dgemm.f90:20 (discriminator 1)
#8 0x7FD572C4703D
#9 0x3B63A079D0
#10 0x3B632E88FC
#11 0xFFFFFFFFFFFFFFFF

Segmentation fault

OpenBLAS was compiled with:
make -j $nprocs USE_THREAD=0 LIBNAMESUFFIX=serial PREFIX=${INSTALLDIR}

Upon building with -fsanitize=thread (which requires a modified gcc), the following errors are produced:

WARNING: ThreadSanitizer: data race (pid=40666)
Read of size 4 at 0x7fd54d8d7130 by thread T3:
#0 blas_memory_alloc /data/vjoost/toolchain-tsan/build/xianyi-OpenBLAS-d0c51c4/driver/others/memory.c:1012 (libopenblas_serial.so.0+0x0000001de505)
#1 dgemm_ /data/vjoost/toolchain-tsan/build/xianyi-OpenBLAS-d0c51c4/interface/gemm.c:394 (libopenblas_serial.so.0+0x0000000b92dd)
#2 tester_ /data/vjoost/gnu/bugs/test_dgemm.f90:11 (a.out+0x000000400f66)
#3 MAIN__._omp_fn.0 /data/vjoost/gnu/bugs/test_dgemm.f90:20 (a.out+0x0000004010b0)
#4 gomp_thread_start /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libgomp/team.c:118 (libgomp.so.1+0x0000000177a8)

Previous write of size 4 at 0x7fd54d8d7130 by main thread:
#0 blas_memory_alloc /data/vjoost/toolchain-tsan/build/xianyi-OpenBLAS-d0c51c4/driver/others/memory.c:1033 (libopenblas_serial.so.0+0x0000001de5a1)
#1 dgemm_ /data/vjoost/toolchain-tsan/build/xianyi-OpenBLAS-d0c51c4/interface/gemm.c:394 (libopenblas_serial.so.0+0x0000000b92dd)
#2 tester_ /data/vjoost/gnu/bugs/test_dgemm.f90:11 (a.out+0x000000400f66)
#3 MAIN__._omp_fn.0 /data/vjoost/gnu/bugs/test_dgemm.f90:20 (a.out+0x0000004010b0)
#4 GOMP_parallel /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libgomp/parallel.c:168 (libgomp.so.1+0x000000010c6c)
#5 main /data/vjoost/gnu/bugs/test_dgemm.f90:22 (a.out+0x000000400b7d)

Location is global 'memory' of size 4096 at 0x7fd54d8d7120 (libopenblas_serial.so.0+0x000000f23130)

Thread T3 (tid=40670, running) created by main thread at:
#0 pthread_create /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libsanitizer/tsan/tsan_interceptors.cc:895 (libtsan.so.0+0x000000026c94)
#1 gomp_team_start /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libgomp/team.c:796 (libgomp.so.1+0x000000017fee)
#2 GOMP_parallel /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libgomp/parallel.c:167 (libgomp.so.1+0x000000010c67)
#3 main /data/vjoost/gnu/bugs/test_dgemm.f90:22 (a.out+0x000000400b7d)

SUMMARY: ThreadSanitizer: data race /data/vjoost/toolchain-tsan/build/xianyi-OpenBLAS-d0c51c4/driver/others/memory.c:1012 blas_memory_alloc

WARNING: ThreadSanitizer: data race (pid=40666)
Read of size 8 at 0x7fd54d8d7120 by thread T1:
#0 blas_lock ../../common_x86_64.h:67 (libopenblas_serial.so.0+0x0000001de53c)
#1 blas_memory_alloc /data/vjoost/toolchain-tsan/build/xianyi-OpenBLAS-d0c51c4/driver/others/memory.c:1014 (libopenblas_serial.so.0+0x0000001de53c)
#2 dgemm_ /data/vjoost/toolchain-tsan/build/xianyi-OpenBLAS-d0c51c4/interface/gemm.c:394 (libopenblas_serial.so.0+0x0000000b92dd)
#3 tester_ /data/vjoost/gnu/bugs/test_dgemm.f90:11 (a.out+0x000000400f66)
#4 MAIN__._omp_fn.0 /data/vjoost/gnu/bugs/test_dgemm.f90:20 (a.out+0x0000004010b0)
#5 gomp_thread_start /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libgomp/team.c:118 (libgomp.so.1+0x0000000177a8)

Previous write of size 8 at 0x7fd54d8d7120 by main thread:
[failed to restore the stack]

Location is global 'memory' of size 4096 at 0x7fd54d8d7120 (libopenblas_serial.so.0+0x000000f23120)

Thread T1 (tid=40668, running) created by main thread at:
#0 pthread_create /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libsanitizer/tsan/tsan_interceptors.cc:895 (libtsan.so.0+0x000000026c94)
#1 gomp_team_start /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libgomp/team.c:796 (libgomp.so.1+0x000000017fee)
#2 GOMP_parallel /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libgomp/parallel.c:167 (libgomp.so.1+0x000000010c67)
#3 main /data/vjoost/gnu/bugs/test_dgemm.f90:22 (a.out+0x000000400b7d)

SUMMARY: ThreadSanitizer: data race ../../common_x86_64.h:67 blas_lock

WARNING: ThreadSanitizer: data race (pid=40666)
Write of size 8 at 0x7fd545bfd020 by thread T1:
#0 dgemm_itcopy ../kernel/x86_64/../generic/gemm_tcopy_2.c:66 (libopenblas_serial.so.0+0x000000258c89)
#1 dgemm_nn /data/vjoost/toolchain-tsan/build/xianyi-OpenBLAS-d0c51c4/driver/level3/level3.c:322 (libopenblas_serial.so.0+0x0000001368f5)
#2 dgemm_ /data/vjoost/toolchain-tsan/build/xianyi-OpenBLAS-d0c51c4/interface/gemm.c:426 (libopenblas_serial.so.0+0x0000000b9312)
#3 tester_ /data/vjoost/gnu/bugs/test_dgemm.f90:11 (a.out+0x000000400f66)
#4 MAIN__._omp_fn.0 /data/vjoost/gnu/bugs/test_dgemm.f90:20 (a.out+0x0000004010b0)
#5 gomp_thread_start /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libgomp/team.c:118 (libgomp.so.1+0x0000000177a8)

Previous write of size 8 at 0x7fd545bfd020 by main thread:
[failed to restore the stack]

Thread T1 (tid=40668, running) created by main thread at:
#0 pthread_create /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libsanitizer/tsan/tsan_interceptors.cc:895 (libtsan.so.0+0x000000026c94)
#1 gomp_team_start /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libgomp/team.c:796 (libgomp.so.1+0x000000017fee)
#2 GOMP_parallel /data/vjoost/toolchain-tsan/build/gcc-5.1.0/libgomp/parallel.c:167 (libgomp.so.1+0x000000010c67)
#3 main /data/vjoost/gnu/bugs/test_dgemm.f90:22 (a.out+0x000000400b7d)

SUMMARY: ThreadSanitizer: data race ../kernel/x86_64/../generic/gemm_tcopy_2.c:66 dgemm_itcopy

xianyi commented Jul 7, 2015

Could you try linking your application with the OpenBLAS OpenMP version (make USE_OPENMP=1)?

vondele commented Jul 7, 2015

The testcase seems to work in that case (using the dev branch), but our application failed in a different spot. However, we would actually like to link a serial BLAS into our OpenMP application.

Trying to build the OpenMP-enabled OpenBLAS with -fsanitize=thread, I ran into an error, so there is another data race. I'll look into that as well.

vondele commented Jul 8, 2015

Actually, I have to correct myself: the USE_OPENMP=1 version also crashes, but not if I use only 2 threads. It does crash if I use 4 threads.

xianyi commented Jul 8, 2015

@vondele, what's your CPU? How many cores?

Please try make USE_OPENMP=1 NUM_THREADS=32

vondele commented Jul 8, 2015

Intel(R) Xeon(R) CPU E7-8837 @ 2.67GHz
(32 cores, i.e. quad socket)

Makefile.conf contains:
CORE=NEHALEM
LIBCORE=nehalem
NUM_CORES=32

I compiled using make -j32 USE_THREAD=1 USE_OPENMP=1 (but this was the release r0.2.14)

hjbreg commented Aug 3, 2015

@vondele I tested it on a Linux machine with Intel(R) Xeon(R) CPU E5-2670, no error happened.

make USE_THREAD=0
gcc -fopenmp test_dgemm_omp.c libopenblas.a
./a.out

test_dgemm_omp.c

#include <stdlib.h>
#include <stdio.h>


#ifdef _OPENMP
    int omp_get_num_procs();
    int omp_get_max_threads();
    void omp_set_num_threads(int);
#endif


#define bint int

void dgemm_(char *transa, char *transb, bint *m, bint *n, bint *k, double *alpha,
    const double *a, bint *lda, const double *b, bint *ldb,
    double *beta, double *c, bint *ldc);

void test(int id)
{
    bint i = 0, m = 1000, n = 800, k = 600, N = m*k + k*n + m*n;
    double alpha = 1.0, beta = 0.0;
    double *A, *B, *C;

    printf("%d\n", id);

    A = (double*)malloc(N*sizeof(double));
    B = A + m*k;
    C = B + k*n;

    for (i = 0; i < N; ++i)
        A[i] = (double) rand() / RAND_MAX;

    dgemm_("N", "N", &m, &n, &k, &alpha, A, &m, B, &k, &beta, C, &m);
    dgemm_("T", "N", &m, &n, &k, &alpha, A, &k, B, &k, &beta, C, &m);
    dgemm_("N", "T", &m, &n, &k, &alpha, A, &m, B, &n, &beta, C, &m);
    dgemm_("T", "T", &m, &n, &k, &alpha, A, &k, B, &n, &beta, C, &m);

    free(A);
}

int main()
{
    int i = 0;

#ifdef _OPENMP
    int num_procs = omp_get_num_procs();
    if (num_procs < omp_get_max_threads() * 2)
        omp_set_num_threads(num_procs < 4 ? 1 : num_procs/2);
#endif

    #pragma omp parallel for
    for (i = 0; i < 100; ++i)
        test(i);

    return 0;
}

vondele commented Aug 9, 2015

@hjbreg can you put a loop around executing your testcase? I think the segfault happens at startup.

for i in `seq 1 1000` ; do ./a.out ; done

Also, I did the testing with very small matrices (dimensions below ~37; see the original testcase).

hjbreg commented Aug 10, 2015

@vondele it works fine here when put in a loop, also with small matrices (m=30, n=20, k=10).

Maybe you can try again using my test code above.

vondele commented Aug 17, 2015

Sorry for the late reply. Your testcase also leads to segfaults here (using m=n=k=27), but only 4 out of 1000 runs fail for me.

xianyi-OpenBLAS-d0c51c4> for i in `seq 1 1000` ; do ./a.out >& out.$i ; done
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
xianyi-OpenBLAS-d0c51c4>

Maybe the version of gcc plays a role? (Here: gcc version 5.1.0 (GCC).)

hjbreg commented Aug 18, 2015

I have no idea. I tested it on CentOS with gcc version 4.4.6.

vondele commented Aug 18, 2015

Hm, it crashes here with gcc 4.6 and 5.1 on a Red Hat system.

I see no segfault for OMP_NUM_THREADS = 1, 2 or 3, but I do see them for 4 and higher (up to 32 cores).

@martin-frbg

Could the glibc version play a role? I noticed that the original test case failed for me as well, but the segfaults went away when I ran the program under valgrind (which overrides malloc with its own routines to be able to track memory accesses).

vondele commented Aug 18, 2015

I think the glibc or pthread library could play a role (unlikely in my opinion, as the tsan traces point into the OpenBLAS library), but I'm not surprised it doesn't crash under valgrind: it is likely a race condition, which might not trigger under valgrind (i.e. vastly different timings, and valgrind might even serialize execution).

@martin-frbg

Good point, so I should be running helgrind if anything (which would probably not add much over your tsan traces). Some obscure races appear to have been fixed in glibc-2.19 (and backported as 2.18.1 etc. down to 2.15.1), see https://sourceware.org/bugzilla/show_bug.cgi?id=15073, but that is probably not relevant here.

@vondele vondele closed this as completed Apr 27, 2016