-
-
Notifications
You must be signed in to change notification settings - Fork 15.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
openblas: trouble with DYANAMIC_ARCH feature and bug in AVX512 code #59708
Comments
Would this break AMD systems? |
I am not sure if Nehalem and Haswell would still work on all AMD systems. A choice of a fixed target would require a careful investigation, which CPUs are compatible. It is unfortunate that the Athlon target does not to work anymore (that was a conservative but safe choice). |
I'm currently using Ryzen a lot, so if you write what to try, I can do that. EDIT: maybe the piledriver I have lying around wouldn't be too hard for me either. |
@vcunat Thanks for the help. I use a an override such as this locally to fix the problem (a full version of it can be found here https://github.com/markuskowa/NixOS-QChem/blob/master/default.nix):
It would be great if you could test the following on Ryzen:
Both derivations have check phases that do some basic numerical test. If there are incompatible instructions in the binaries it should show in the check phase. That would be a good first test. I will try to come up with some nix expression that could be used as a more careful test. Given that we had trouble with openblas in the past it would make sense to have some form of extended test suite that could be used for future updates also (and tested by several people on different CPU models). |
Both passed successfully, including some test suites. (I just used the expression above as overlay for current master.) |
@vcunat Thansk a lot. That sounds like a good start. I also went through the sources of openblas and found that there is a |
openblas is built with the |
AVX512 support was disabled due to an issue in the AVX512 DGEMM kernel: NixOS#59708 However, this kernel has been replaced upstream a while ago: https://github.com/xianyi/OpenBLAS/releases/tag/v0.3.8 And they have enabled AVX512 kernels since then.
Issue description
The AVX512 kernel on SKYLAKE-X CPUs seems to have a bug that produces wrong numbers (see OpenMathLib/OpenBLAS#2061). I ran into the bug when I tried to use CP2K (external derivation https://github.com/markuskowa/NixOS-QChem/tree/master/cp2k).
Seems also to affect
julia
(OpenMathLib/OpenBLAS#1955).Unfortunately there is no simple solution available:Disable the AVX512 DGEMM kernel (again) OpenMathLib/OpenBLAS#2061 turns off the problematic kernel but then even the unit tests in CP2K fail (which passed before) leaving openblas in a even more broken state.Alternatively the problem could be solved by turning off.DYANAMIC_ARCH
but then compilation fails withTARGET="ATHLON"
.TARGET="NEHALEM"
orTARGET="HASWELL"
seems to work but this might breakopenblas
on older system (> ~8-10 years old)Waiting for openblas 0.3.6 might be the best option (?)The problem can be solved by setting
NO_AVX512
in the make flags.Using the
DYANAMIC_ARCH
feature has another unwanted side effect on some SKYLAKE CPUs. Heavy utilization of AVX512 instructions can lead to a drastic reduction in CPU clock frequency. On models like the Xeon Scalable Silver which have only one AVX512 unit per core (e.g. 2.2 GHz -> 1.4 GHz), this can lead to significantly slower code. For example a 20 core linpack benchmark (HPL) on a dual CPU Xeon Silver 4114 system yields 400 GFlops with AVX512 turned on and 570 GFlops with AVX512 disabled (use AVX2 instead). Note that MKL has a similar problem (https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/776894).Steps to reproduce
dgemm
function on a Skylake series CPUs (?)Technical details
CC @costrouc @dtzWill @ttuegel @vcunat
The text was updated successfully, but these errors were encountered: