Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in ssyrk causing deadlock #1117

Closed
matthewfl opened this issue Mar 8, 2017 · 12 comments · Fixed by #1225
Closed

Race condition in ssyrk causing deadlock #1117

matthewfl opened this issue Mar 8, 2017 · 12 comments · Fixed by #1225

Comments

@matthewfl
Copy link

matthewfl commented Mar 8, 2017

Using ssyrk with multiple threads eventually causes openblas to deadlock and not return.

Sample stack traces:
https://gist.github.com/matthewfl/455d86676034fccce0dc2a41766602a2

This seems to be related to one of these lines, but I haven't narrowed it down further yet:
https://github.com/xianyi/OpenBLAS/blob/develop/driver/level3/level3_syrk_threaded.c#L361
https://github.com/xianyi/OpenBLAS/blob/develop/driver/level3/level3_syrk_threaded.c#L469
https://github.com/xianyi/OpenBLAS/blob/develop/driver/level3/level3_syrk_threaded.c#L267

This takes my program about ~5 hours to hit this race condition. I am guessing that it requires at least 10,000 calls of this method with ~300^2 matrix to cause it to deadlock.

This is also on the release v0.2.19

@martin-frbg
Copy link
Collaborator

Probably similar to #1071 - you could try if "make USE_SIMPLE_THREADED_LEVEL3=1" works around it here as well. (If it does, the problem goes back to GotoBLAS2-1.08 or even earlier)

@matthewfl
Copy link
Author

I tried the USE_SIMPLE_THREADED_LEVEL3=1 and it at least did not deadlock through one run of my program, so atm it seems that this is a work around around.

@brada4
Copy link
Contributor

brada4 commented Mar 9, 2017

can you attach gdb to deadlocked process and upload backtrace?
i.e:

$ script
$ gdb
> attach <pid>
> thread apply all backtrace
> detach
> ^d
$ ^d
file is typescript... 

@martin-frbg
Copy link
Collaborator

I noticed that wernsaar has a possible bugfix for the stride calculation in syrk_thread.c in his development fork, if you have time for experimenting you could try to cherrypick that.

@matthewfl
Copy link
Author

@brada4 I will get all the threads, however all of the threads had a similar stack track which I copied in the original post. https://gist.github.com/matthewfl/455d86676034fccce0dc2a41766602a2

@martin-frbg I will try @wernsaar 's branch out and see if that works

@matthewfl
Copy link
Author

@brada4 http://sprunge.us/ZMWK

I also have been running wernsaar's patch overnight and it still hasn't crashed

@martin-frbg
Copy link
Collaborator

Thanks for checking, sounds like good news (assuming that you did that build without the USE_SIMPLE_THREADED_LEVEL3=1 workaround still in place :-) )

@matthewfl
Copy link
Author

@martin-frbg so with the v0.2.19 release I tried that flag and it at least worked at least once, with wernsaar's code I did not use that flag and it hasn't crashed letting it run in a loop overnight.

@martin-frbg
Copy link
Collaborator

Thanks for the clarification. So really looking good (though maybe I should have a bad conscience for picking fruit in the neighbor's garden)

@brada4
Copy link
Contributor

brada4 commented Mar 10, 2017

Thank you for backtrace:
You use multiple python pthreads, probably not knowing that, USE_OPENMP=1 is another option to try (there is no sign 2nd python thread called BLAS but anyway option worth exploring)

@matthewfl
Copy link
Author

matthewfl commented Mar 11, 2017

@brada4 there is only one python thread here. The second python thread is python's GC background thread or ipython's auto complete, but these are never going to call blas.

fyi: the arguments that I compiled openblas with were: make FC=gfortran USE_OPENMP=0 USE_THREAD=1 NO_LAPACK=0 BUILD_LAPACK_DEPRECATED=1 DYNAMIC_ARCH=1 NUM_THREADS=64 libs netlib shared, and when starting the program, I set OMP_NUM_THREADS=16

@brada4
Copy link
Contributor

brada4 commented Mar 11, 2017

Filtering redundant options (see Makefile.rule for defaults)
make DYNAMIC_ARCH=1 NUM_THREADS=64
OpenMP version is certainly thread-safe, i.e if it works (with current code) it clearly points to a problem with pthreads, if it does not - with assembly kernels involved or something else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants