symmetric matrix inversion giving incorrect results #2194
Comments
What CPU are you using (SkylakeX by any chance) ? Unfortunately the relatively recent AVX512-enabled DGEMM kernel has turned out to be problematic, and partial changes in 0.3.6 seem to have actually made the problem worse. |
Does it make any difference if running with OPENBLAS_NUM_THREADS=1 |
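One way to try the single-threaded suggestion above is to set the variable only for the child process that runs the testcase. This is a minimal sketch, not part of the original report: `env_with_threads` and `run_testcase` are hypothetical helper names, and `./testcase` in the usage note stands in for whatever binary was built from the reporter's example.

```python
import os
import subprocess


def env_with_threads(num_threads, base_env=None):
    """Return a copy of the environment with OPENBLAS_NUM_THREADS set.

    base_env defaults to the current process environment; it is never mutated.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OPENBLAS_NUM_THREADS"] = str(num_threads)
    return env


def run_testcase(cmd, num_threads):
    """Run cmd (a list of arguments) with the given thread count and return its stdout."""
    result = subprocess.run(
        cmd,
        env=env_with_threads(num_threads),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

For example, calling `run_testcase(["./testcase"], 1)` and `run_testcase(["./testcase"], 4)` and comparing the two outputs would show whether the thread count changes the result.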
@martin-frbg This is my But I also tried compiling openblas-3.5 from source and got the correct result. Would the DGEMM kernel even come up for potrf and potri?
|
I don't know what you mean by 'undefined behavior'. The linking I used is taken directly from ATLAS's docs. Also, potrf and potri are lapack functions, so they wouldn't be provided by f77blas alone.
Given that ATLAS and openblas-3.5 both give |
OPENBLAS_NUM_THREADS does make a difference
|
Your CPU does not support AVX512. Does OpenBLAS return the same result each time? What if you use -lblas as the only BLAS? |
Yes, the results are consistent and given the differences with |
You may |
The call graphs of both DPOTRI and DPOTRF do include DGEMM (at least in the original reference implementation - netlib.org has nice diagrams for all LAPACK functions) but obviously this cannot be the evil AVX512 DGEMM problem if you see it on Coffee Lake hardware. Now that I have a little more time to look into this, I cannot reproduce the problem on my i7-8700K (basically the desktop version of your cpu, and valgrind/helgrind also do not report any memory or multithreading code issues). Which compiler version did you use to build OpenBLAS, and what build options - if any - did you use ? |
This is the compiler
For building, I just ran make without any options. |
Thanks. I tried with GCC 7.2.1, and now 9.1.0 as well, both 0.3.6 and current develop branch consistently return 2.21785e-6 for your testcase. |
Could you try a rebuild with |
Hmm. If it was #2154, it should still happen with GCC 9.1 (and probably already with 7.2.1) I think ? Also the potrf and potri in OpenBLAS are rewritten in C so unlikely to hit mixed-language ABI issues ... |
I still have the issue with |
Ok, so at least we can rule that nasty issue out. |
Even if this was a compiler problem, I do not see why it would affect only the OpenBLAS build. Any chance of a difference (however small, like "harmless" code cleanup) between the testcase you posted here and what you actually use ?
I do not think the FORTIFY_SOURCE and stack-protector-strong should make a difference (and these were not used in my tests), but perhaps it is worth a try ? UPDATE: they had no effect in my tests. |
No I ran it directly Also, I built 0.3.5 from source (just running make) and got the correct answer, so it's definitely a change in 0.3.6 that's causing the issue. |
Also, not sure that it makes a difference, but the exact environment setup is I'm running the example through the docker image |
Seems that would need to be a change that triggers a bug in the gcc or libpthread provided by Ubuntu, as I cannot reproduce the problem with opensuse 15 on similar hardware. (Or might as well blame it on your OSX/docker setup - seems there are two flavors, "docker toolbox" runs Linux inside a VirtualBox VM to run docker, while Docker Desktop apparently is a native OSX application ?) |
I wouldn't draw that conclusion yet. Given that 0.3.5 works with the same setup, I think the most likely explanation is that there's a bug introduced in 0.3.6.
Because it gives the same wrong result each time with |
If it was a bug in 0.3.6 I would expect to see it on my system as well - basically the desktop version of your hardware, so same BLAS microkernels, same number of threads... |
Does anything change if you try the develop version? I cannot reproduce it either (on all sorts of earlier CPUs). I will not have time until mid-September; I think the same goes for Martin. |
I have now installed docker on my Haswell system and repeated the tests with the Ubuntu 19.04 image from hub.docker.io (actually both the stable 18.something and the "rolling" 19.04 as I made a mistake in the Dockerfile initially) - and both current develop and 0.3.6 always return 2.21785e-6 as they should. |
I was using Docker Desktop community version 2.0.0.3 |
Does NO_AVX2=1 build flag address numeric issues, in line with findings of #2244 ? |
Added a warning to the FAQ section in the wiki as there has been no activity on the xhyve issue tracker for the past 3 years |
For the latest version of OpenBLAS, calling potrf then potri is giving incorrect results for the particular example below.
When I run it with 0.3.6, I get
If I run with the version installed by my package manager, I get the correct result
Which is also given by ATLAS
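For readers who want a library-independent cross-check of a potrf/potri result, the following is a minimal pure-Python sketch of the same two steps: a Cholesky factorization followed by an inverse built from the factor. It is not the reporter's testcase and does not call OpenBLAS or ATLAS; the function names and the 3x3 matrix are made up for illustration.

```python
import math


def cholesky_lower(a):
    """Return lower-triangular L with a = L * L^T (what potrf computes for uplo='L')."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l


def invert_from_cholesky(l):
    """Return (L * L^T)^-1 given L (what potri computes from the potrf factor)."""
    n = len(l)
    # Invert the lower-triangular factor column by column.
    linv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        linv[j][j] = 1.0 / l[j][j]
        for i in range(j + 1, n):
            s = sum(l[i][k] * linv[k][j] for k in range(j, i))
            linv[i][j] = -s / l[i][i]
    # A^-1 = L^-T * L^-1; only rows k >= max(i, j) contribute.
    return [
        [sum(linv[k][i] * linv[k][j] for k in range(max(i, j), n)) for j in range(n)]
        for i in range(n)
    ]


def max_residual(a, ainv):
    """Return the largest absolute entry of A * Ainv - I."""
    n = len(a)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            entry = sum(a[i][k] * ainv[k][j] for k in range(n))
            worst = max(worst, abs(entry - (1.0 if i == j else 0.0)))
    return worst


# Hypothetical symmetric positive definite input, not the matrix from this issue.
A = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
A_inv = invert_from_cholesky(cholesky_lower(A))
print(max_residual(A, A_inv))
```

A residual far above rounding error for an inverse returned by a BLAS/LAPACK build would point at the library rather than at the input matrix.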