Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

openblas: trouble with DYANAMIC_ARCH feature and bug in AVX512 code #59708

Closed
markuskowa opened this issue Apr 16, 2019 · 7 comments
Closed

openblas: trouble with DYANAMIC_ARCH feature and bug in AVX512 code #59708

markuskowa opened this issue Apr 16, 2019 · 7 comments
Labels
0.kind: bug Something is broken

Comments

@markuskowa
Copy link
Member

markuskowa commented Apr 16, 2019

Issue description

The AVX512 kernel on SKYLAKE-X CPUs seems to have a bug that produces wrong numbers (see OpenMathLib/OpenBLAS#2061). I ran into the bug when I tried to use CP2K (external derivation https://github.com/markuskowa/NixOS-QChem/tree/master/cp2k).

Seems also to affect julia (OpenMathLib/OpenBLAS#1955).

Unfortunately there is no simple solution available:

  • Disable the AVX512 DGEMM kernel (again) OpenMathLib/OpenBLAS#2061 turns off the problematic kernel but then even the unit tests in CP2K fail (which passed before) leaving openblas in a even more broken state.
  • Alternatively the problem could be solved by turning off DYANAMIC_ARCH but then compilation fails with TARGET="ATHLON". TARGET="NEHALEM" or TARGET="HASWELL" seems to work but this might break openblas on older system (> ~8-10 years old).
  • Waiting for openblas 0.3.6 might be the best option (?)

The problem can be solved by setting NO_AVX512 in the make flags.

Using the DYANAMIC_ARCH feature has another unwanted side effect on some SKYLAKE CPUs. Heavy utilization of AVX512 instructions can lead to a drastic reduction in CPU clock frequency. On models like the Xeon Scalable Silver which have only one AVX512 unit per core (e.g. 2.2 GHz -> 1.4 GHz), this can lead to significantly slower code. For example a 20 core linpack benchmark (HPL) on a dual CPU Xeon Silver 4114 system yields 400 GFlops with AVX512 turned on and 570 GFlops with AVX512 disabled (use AVX2 instead). Note that MKL has a similar problem (https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/776894).

Steps to reproduce

  • Use dgemm function on a Skylake series CPUs (?)

Technical details

  • openblas 0.3.5
  • affects NixOS 19.03 and nixpkgs-unstable

CC @costrouc @dtzWill @ttuegel @vcunat

@ttuegel
Copy link
Member

ttuegel commented Apr 21, 2019

* `TARGET="NEHALEM"` or  `TARGET="HASWELL"` seems to work but this might break `openblas` on older system (> ~8-10 years old).

Would this break AMD systems?

@markuskowa
Copy link
Member Author

markuskowa commented Apr 21, 2019

I am not sure if Nehalem and Haswell would still work on all AMD systems. A choice of a fixed target would require a careful investigation, which CPUs are compatible. It is unfortunate that the Athlon target does not to work anymore (that was a conservative but safe choice).

@vcunat
Copy link
Member

vcunat commented Apr 22, 2019

I'm currently using Ryzen a lot, so if you write what to try, I can do that. EDIT: maybe the piledriver I have lying around wouldn't be too hard for me either.

@markuskowa
Copy link
Member Author

@vcunat Thanks for the help. I use a an override such as this locally to fix the problem (a full version of it can be found here https://github.com/markuskowa/NixOS-QChem/blob/master/default.nix):

    openblasCompat = let 
      obMkfCleaner = makeFlags:
        map (x: if (x == "DYNAMIC_ARCH=1") then "DYNAMIC_ARCH=0" else
          if (x == "TARGET=ATHLON") then "TARGET=NEHALEM" else x
       ) makeFlags;

      in super.openblasCompat.overrideDerivation ( oa : {
         makeFlags = obMkfCleaner oa.makeFlags;
    })

It would be great if you could test the following on Ryzen:

  1. See if the modified openblasCompat builds and passes its test suite.
  2. Build a dependency such as scalapack and see if the testsuite passes.

Both derivations have check phases that do some basic numerical test. If there are incompatible instructions in the binaries it should show in the check phase. That would be a good first test.

I will try to come up with some nix expression that could be used as a more careful test. Given that we had trouble with openblas in the past it would make sense to have some form of extended test suite that could be used for future updates also (and tested by several people on different CPU models).

@vcunat
Copy link
Member

vcunat commented Apr 22, 2019

Both passed successfully, including some test suites. (I just used the expression above as overlay for current master.)

@markuskowa
Copy link
Member Author

@vcunat Thansk a lot. That sounds like a good start.

I also went through the sources of openblas and found that there is a NO_AVX512 compile time flag. I will see if that solves the problem for now.

@markuskowa
Copy link
Member Author

openblas is built with the NO_AVX512 option, which by passes the problem.

danieldk added a commit to danieldk/nixpkgs that referenced this issue Sep 6, 2020
AVX512 support was disabled due to an issue in the AVX512 DGEMM
kernel:

NixOS#59708

However, this kernel has been replaced upstream a while ago:

https://github.com/xianyi/OpenBLAS/releases/tag/v0.3.8

And they have enabled AVX512 kernels since then.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.kind: bug Something is broken
Projects
None yet
Development

No branches or pull requests

3 participants