symmetric matrix inversion giving incorrect results #2194

Closed
rnburn opened this issue Jul 23, 2019 · 27 comments
Labels
Bug in other software Compiler, Virtual Machine, etc. bug affecting OpenBLAS

Comments

@rnburn

rnburn commented Jul 23, 2019

With the latest version of OpenBLAS, calling potrf and then potri gives incorrect results for the example below.

When I run it with 0.3.6, I get

> g++ cholesky_test.cc /3rd_party/OpenBLAS-0.3.6/libopenblas.a -lpthread
> ./a.out
-0.0915173

If I run with the version installed by my package manager, I get the correct result

> g++ cholesky_test.cc -lopenblas
>  ./a.out
2.21785e-06
> dpkg -s libopenblas-dev
Package: libopenblas-dev
Status: install ok installed
Priority: optional
Section: libdevel
Installed-Size: 54162
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Architecture: amd64
Multi-Arch: same
Source: openblas
Version: 0.3.5+ds-2
Provides: libblas.so, liblapack.so
Depends: libopenblas-base (= 0.3.5+ds-2)
Breaks: libatlas-base-dev (<< 3.10.3-4~), libblas-dev (<< 3.7.1-2~), liblapack-dev (<< 3.7.1-2~)
Description: Optimized BLAS (linear algebra) library (development files)

ATLAS gives the same result:

> g++ cholesky_test.cc -llapack -lf77blas -lcblas -latlas
>./a.out
2.21785e-06
// problem example
#include <cassert>
#include <iostream>

const int n = 26;

extern double E[n][n];

extern "C" void dpotrf_(const char* uplo, const int* n, double* A, const int* lda, int* info);
extern "C" void dpotri_(const char* uplo, const int* n, double* A, const int* lda, int* info);

int main() {
  char uplo = 'L';
  int info;
  dpotrf_(&uplo, &n, &E[0][0], &n, &info);
  assert(info == 0);
  dpotri_(&uplo, &n, &E[0][0], &n, &info);
  assert(info == 0);
  std::cout << *(&E[0][0] + 16) << "\n"; // entry (16, 0) in column-major ordering
}

// data
double E[n][n] = {
    {144.28239738429019,      -319.19900341084332,     -1296.1145812550405,
     -7.6770040123669969e-05, -110.18491233020869,     -6.5865467083733162e-06,
     2.1799515628353019,      -1.9307211961156157,     1.7043827929802871e-05,
     3.2194279252331291e-05,  -39.651186487712707,     -0.00071684978621199024,
     -8.6241866669235208,     -7.8813352941864478e-06, -9.1397862810662911,
     16.145217777093777,      0.00021968988764172484,  -7.993955233582102e-06,
     -0.00021501001462387393, 9.2737755782662799,      -10.770918183005119,
     -21.555103616533106,     -25.50946140559132,      -20.585943317869393,
     -25.336142393442419,     -3033.5911821111858},
    {-319.19900341084332,     753.47972628605828,      3020.3228874916285,
     0.00018917409321302976,  240.87310385517344,      2.2182449161908461e-05,
     -7.6665716054256929,     11.958552047022275,      -5.543064921378486e-05,
     -8.2709030061726732e-05, 83.507300327532917,      0.0014870699777732096,
     20.590812608552817,      1.8181353278144878e-05,  22.757611779484584,
     -37.103587110257493,     -0.00054757069589520197, 1.7785001787575848e-05,
     0.00049732328898703876,  -23.0346712296897,       27.324038137627131,
     47.753807739018512,      79.061103733547029,      56.336768670027197,
     73.125635519990553,      7417.5896339988767},
    {-1296.1145812550405,     3020.3228874916285,     12507.476805394395,
     0.00080766966235834906,  1000.3984735631913,     9.6900398493091048e-05,
     -31.2383412536586,       42.323074025121159,     -0.00023740730761843157,
     -0.00045655705079795993, 385.19468767034414,     0.005679267990102212,
     69.294784475201567,      7.4578342295604285e-05, 89.789916446817415,
     -142.39025505036423,     -0.0024393396211984351, 8.1870085433869899e-05,
     0.0022175508680669135,   -96.233276550937347,    114.8691602823897,
     211.12882441268206,      355.58160033482602,     254.33203967237159,
     319.62147542109619,      30937.363836040866},
    {-7.6770040123669969e-05, 0.00018917409321302976,  0.00080766966235834906,
     1.0000000000605893,      6.0870767787293126e-05,  9.1082218247670291e-12,
     -3.0860367894532966e-06, 3.1724281255884221e-06,  -2.3169640163074985e-11,
     -3.7195946390194495e-11, 2.6577415798249472e-05,  2.4349224699962758e-10,
     3.6872375970095972e-06,  5.1156404783270122e-12,  5.6914491667378166e-06,
     -8.0878283078119414e-06, -1.8635685971638038e-10, 5.6124028735158997e-12,
     1.5579734124741342e-10,  -6.7243752779478904e-06, 8.6081836048189634e-06,
     1.4827290007742856e-05,  3.3531700224818024e-05,  2.2305165158482174e-05,
     2.9410296353478671e-05,  0.0022050054547991917},
    {-110.18491233020869,
     240.87310385517344,
     1000.3984735631913,
     6.0870767787293126e-05,
     109.09776759877555,
     0,
     0,
     0,
     -1.3172013181357207e-05,
     0,
     38.498080674535522,
     0.00050108966088901043,
     5.6496759442421283,
     6.4526967145252e-06,
     6.468730391703188,
     -10.955482995811055,
     -0.00017085814393585734,
     6.2338972746055648e-06,
     0.00016910659118233662,
     -7.488878494739116,
     8.9001850182949518,
     19.885323774973077,
     12.78088066100387,
     16.396704901927158,
     30.494755632674455,
     2363.4766308514409},
    {-6.5865467083733162e-06,
     2.2182449161908461e-05,
     9.6900398493091048e-05,
     9.1082218247670291e-12,
     0,
     1.0000000000068994,
     0,
     0,
     -5.4806369842643657e-12,
     -2.5383586606814818e-11,
     9.8955098916678016e-07,
     0,
     6.3965393389262975e-07,
     6.1368015048184276e-13,
     8.5445142288574447e-07,
     -1.25748379952674e-06,
     -3.2498738471802017e-11,
     8.8930702622906122e-13,
     2.9241434146339037e-11,
     -7.9136182813932271e-07,
     1.1542461426980547e-06,
     1.6445067080162328e-06,
     7.8140480339369592e-06,
     2.3990762256917956e-06,
     1.4500934301964703e-06,
     0.00031722579841752781},
    {2.1799515628353019,
     -7.6665716054256929,
     -31.2383412536586,
     -3.0860367894532966e-06,
     0,
     0,
     1.7561860073751649,
     -0.94879254322291628,
     1.8706261516718316e-06,
     0,
     -0.46256935706385272,
     0,
     -0.079735640961285631,
     -1.9124474012889266e-07,
     -0.26627770214539737,
     0.22392973861333448,
     1.0127772242065562e-05,
     -1.8475997199565853e-07,
     -4.5563397071926297e-06,
     0.3699250832496368,
     -0.47960598243617131,
     -0.2562436293302357,
     -3.517420923154416,
     -2.2429145327818802,
     -2.0335549952407126,
     -111.56936816351981},
    {-1.9307211961156157,
     11.958552047022275,
     42.323074025121159,
     3.1724281255884221e-06,
     0,
     0,
     -0.94879254322291628,
     24.511535264588186,
     0,
     0,
     0,
     0,
     0,
     0,
     1.3196978894666778,
     -1.3872716390437649,
     0,
     0,
     0,
     -1.5278202283180078,
     0,
     0,
     6.7048754729089763,
     0,
     0,
     139.98696555240753},
    {1.7043827929802871e-05,
     -5.543064921378486e-05,
     -0.00023740730761843157,
     -2.3169640163074985e-11,
     -1.3172013181357207e-05,
     -5.4806369842643657e-12,
     1.8706261516718316e-06,
     0,
     1.0000000000156641,
     0,
     0,
     0,
     -7.9041814114779892e-07,
     -1.7603913491680648e-12,
     -2.1116853022960015e-06,
     1.7441379518032535e-06,
     7.1711746582941168e-11,
     -1.5698765699401324e-12,
     -4.8393131868260352e-11,
     2.4447076459245345e-06,
     -2.8865553620846534e-06,
     -4.3545240404281247e-06,
     -1.9158317411230531e-05,
     -1.1116988051902741e-05,
     -1.7918754077090128e-05,
     -0.00081598920813093982},
    {3.2194279252331291e-05,
     -8.2709030061726732e-05,
     -0.00045655705079795993,
     -3.7195946390194495e-11,
     0,
     -2.5383586606814818e-11,
     0,
     0,
     0,
     1.0000000021791042,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     -1.6514757993872359e-05,
     -3.5293984485888423e-05,
     -7.4534715753270741e-05,
     0,
     0,
     -0.001167123318930719},
    {-39.651186487712707,
     83.507300327532917,
     385.19468767034414,
     2.6577415798249472e-05,
     38.498080674535522,
     9.8955098916678016e-07,
     -0.46256935706385272,
     0,
     0,
     0,
     53.158964253149705,
     0,
     1.2844184902347464,
     3.0806585037794388e-06,
     0,
     -3.6071635384237961,
     -9.5166649474833902e-05,
     4.4642983004620115e-06,
     9.1744527559364095e-05,
     -2.4828849600130369,
     5.7942858216376116,
     6.1915358969099072,
     4.3584799211025818,
     10.537888098268034,
     5.4595736722565213,
     955.47918838097769},
    {-0.00071684978621199024,
     0.0014870699777732096,
     0.005679267990102212,
     2.4349224699962758e-10,
     0.00050108966088901043,
     0,
     0,
     0,
     0,
     0,
     0,
     1.0000000056476226,
     5.1546999517194208e-05,
     2.5454204174198011e-11,
     5.2655061426916271e-05,
     -8.5155705772569246e-05,
     -6.418959793054576e-10,
     2.1078083569298925e-11,
     7.7970548556300377e-10,
     -3.7513232787114859e-05,
     2.7357589603777587e-05,
     9.7443989531998139e-05,
     0,
     1.4215543024739636e-05,
     0,
     0.010955949690545362},
    {-8.6241866669235208,
     20.590812608552817,
     69.294784475201567,
     3.6872375970095972e-06,
     5.6496759442421283,
     6.3965393389262975e-07,
     -0.079735640961285631,
     0,
     -7.9041814114779892e-07,
     0,
     1.2844184902347464,
     5.1546999517194208e-05,
     5.6494488455974405,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     164.7012374410474},
    {-7.8813352941864478e-06,
     1.8181353278144878e-05,
     7.4578342295604285e-05,
     5.1156404783270122e-12,
     6.4526967145252e-06,
     6.1368015048184276e-13,
     -1.9124474012889266e-07,
     0,
     -1.7603913491680648e-12,
     0,
     3.0806585037794388e-06,
     2.5454204174198011e-11,
     0,
     1.0000000000066867,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0.00019751672510297895},
    {-9.1397862810662911,
     22.757611779484584,
     89.789916446817415,
     5.6914491667378166e-06,
     6.468730391703188,
     8.5445142288574447e-07,
     -0.26627770214539737,
     1.3196978894666778,
     -2.1116853022960015e-06,
     0,
     0,
     5.2655061426916271e-05,
     0,
     0,
     9.296331140638955,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     220.00835018106548},
    {16.145217777093777,
     -37.103587110257493,
     -142.39025505036423,
     -8.0878283078119414e-06,
     -10.955482995811055,
     -1.25748379952674e-06,
     0.22392973861333448,
     -1.3872716390437649,
     1.7441379518032535e-06,
     0,
     -3.6071635384237961,
     -8.5155705772569246e-05,
     0,
     0,
     0,
     14.096703920607254,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     -330.39092506927017},
    {0.00021968988764172484,
     -0.00054757069589520197,
     -0.0024393396211984351,
     -1.8635685971638038e-10,
     -0.00017085814393585734,
     -3.2498738471802017e-11,
     1.0127772242065562e-05,
     0,
     7.1711746582941168e-11,
     0,
     -9.5166649474833902e-05,
     -6.418959793054576e-10,
     0,
     0,
     0,
     0,
     1.0000000080368816,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     -0.0067242333568572682},
    {-7.993955233582102e-06,
     1.7785001787575848e-05,
     8.1870085433869899e-05,
     5.6124028735158997e-12,
     6.2338972746055648e-06,
     8.8930702622906122e-13,
     -1.8475997199565853e-07,
     0,
     -1.5698765699401324e-12,
     0,
     4.4642983004620115e-06,
     2.1078083569298925e-11,
     0,
     0,
     0,
     0,
     0,
     1.0000000000066867,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0.00020444923408148574},
    {-0.00021501001462387393,
     0.00049732328898703876,
     0.0022175508680669135,
     1.5579734124741342e-10,
     0.00016910659118233662,
     2.9241434146339037e-11,
     -4.5563397071926297e-06,
     0,
     -4.8393131868260352e-11,
     0,
     9.1744527559364095e-05,
     7.7970548556300377e-10,
     0,
     0,
     0,
     0,
     0,
     0,
     1.0000000046088182,
     0,
     0,
     0,
     0,
     0,
     0,
     0.0057141463438239474},
    {9.2737755782662799,
     -23.0346712296897,
     -96.233276550937347,
     -6.7243752779478904e-06,
     -7.488878494739116,
     -7.9136182813932271e-07,
     0.3699250832496368,
     -1.5278202283180078,
     2.4447076459245345e-06,
     0,
     -2.4828849600130369,
     -3.7513232787114859e-05,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     12.119404916114496,
     0,
     0,
     0,
     0,
     0,
     -254.70466421776533},
    {-10.770918183005119,
     27.324038137627131,
     114.8691602823897,
     8.6081836048189634e-06,
     8.9001850182949518,
     1.1542461426980547e-06,
     -0.47960598243617131,
     0,
     -2.8865553620846534e-06,
     -1.6514757993872359e-05,
     5.7942858216376116,
     2.7357589603777587e-05,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     14.5173066017481,
     0,
     0,
     0,
     0,
     318.42960802877093},
    {-21.555103616533106,
     47.753807739018512,
     211.12882441268206,
     1.4827290007742856e-05,
     19.885323774973077,
     1.6445067080162328e-06,
     -0.2562436293302357,
     0,
     -4.3545240404281247e-06,
     -3.5293984485888423e-05,
     6.1915358969099072,
     9.7443989531998139e-05,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     50.732766069148802,
     0,
     0,
     0,
     548.19796101927602},
    {-25.50946140559132,
     79.061103733547029,
     355.58160033482602,
     3.3531700224818024e-05,
     12.78088066100387,
     7.8140480339369592e-06,
     -3.517420923154416,
     6.7048754729089763,
     -1.9158317411230531e-05,
     -7.4534715753270741e-05,
     4.3584799211025818,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     222.79841230067959,
     0,
     0,
     1157.6981119100133},
    {-20.585943317869393,
     56.336768670027197,
     254.33203967237159,
     2.2305165158482174e-05,
     16.396704901927158,
     2.3990762256917956e-06,
     -2.2429145327818802,
     0,
     -1.1116988051902741e-05,
     0,
     10.537888098268034,
     1.4215543024739636e-05,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     106.84233446531357,
     0,
     799.73446678156142},
    {-25.336142393442419,
     73.125635519990553,
     319.62147542109619,
     2.9410296353478671e-05,
     30.494755632674455,
     1.4500934301964703e-06,
     -2.0335549952407126,
     0,
     -1.7918754077090128e-05,
     0,
     5.4595736722565213,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     0,
     166.34344163160841,
     1033.4546961389804},
    {-3033.5911821111858,   7417.5896339988767,     30937.363836040866,
     0.0022050054547991917, 2363.4766308514409,     0.00031722579841752781,
     -111.56936816351981,   139.98696555240753,     -0.00081598920813093982,
     -0.001167123318930719, 955.47918838097769,     0.010955949690545362,
     164.7012374410474,     0.00019751672510297895, 220.00835018106548,
     -330.39092506927017,   -0.0067242333568572682, 0.00020444923408148574,
     0.0057141463438239474, -254.70466421776533,    318.42960802877093,
     548.19796101927602,    1157.6981119100133,     799.73446678156142,
     1033.4546961389804,    82723.704735195497}};
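A note on the indexing used in the test program, as a minimal sketch. LAPACK reads `E` in column-major order, so linear offset 16 with a leading dimension of 26 is row 16, column 0. That entry lies in the lower triangle, which is the part `dpotri` fills when `uplo = 'L'`. The helper names below (`col_major_position`, `mirror_lower_to_upper`) are illustrative only and are not part of LAPACK or OpenBLAS.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Map a linear offset into a column-major array with leading dimension `lda`
// to its (row, col) position.
std::pair<int, int> col_major_position(int offset, int lda) {
  return {offset % lda, offset / lda};
}

// Copy the lower triangle of a column-major n x n array into the upper
// triangle, so that the whole array holds the symmetric matrix.
void mirror_lower_to_upper(std::vector<double>& a, int n) {
  for (int col = 0; col < n; ++col) {
    for (int row = col + 1; row < n; ++row) {
      // Element (row, col) lives at a[col * n + row] in column-major storage.
      a[static_cast<std::size_t>(row) * n + col] =
          a[static_cast<std::size_t>(col) * n + row];
    }
  }
}
```

Calling `col_major_position(16, 26)` yields row 16, column 0. After `dpotri`, something like `mirror_lower_to_upper` would be needed before treating the array as the full inverse, because only one triangle is written.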
@martin-frbg
Collaborator

What CPU are you using (SkylakeX by any chance) ? Unfortunately the relatively recent AVX512-enabled DGEMM kernel has turned out to be problematic, and partial changes in 0.3.6 seem to have actually made the problem worse.

@brada4
Contributor

brada4 commented Jul 23, 2019

Does it make any difference if you run with OPENBLAS_NUM_THREADS=1?
Also, could you try with atlas alone and with f77blas alone? They provide an identical API, and what you are doing is taking 'undefined behaviour' as the reference.

@rnburn rnburn closed this as completed Jul 23, 2019
@rnburn rnburn reopened this Jul 23, 2019
@rnburn
Author

rnburn commented Jul 23, 2019

@martin-frbg This is my /proc/cpuinfo.

But I also tried compiling OpenBLAS 0.3.5 from source and got the correct result. Would the DGEMM kernel even come up for potrf and potri?

cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 158
model name	: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
stepping	: 10
cpu MHz		: 2600.000
cache size	: 9216 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch kaiser fsgsbase bmi1 hle avx2 bmi2 erms rtm xsaveopt arat
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips	: 5184.00
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 158
model name	: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
stepping	: 10
cpu MHz		: 2600.000
cache size	: 9216 KB
physical id	: 1
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch kaiser fsgsbase bmi1 hle avx2 bmi2 erms rtm xsaveopt arat
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips	: 4949.32
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 6
model		: 158
model name	: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
stepping	: 10
cpu MHz		: 2600.000
cache size	: 9216 KB
physical id	: 2
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 2
initial apicid	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch kaiser fsgsbase bmi1 hle avx2 bmi2 erms rtm xsaveopt arat
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips	: 4879.02
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 158
model name	: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
stepping	: 10
cpu MHz		: 2600.000
cache size	: 9216 KB
physical id	: 3
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 3
initial apicid	: 3
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch kaiser fsgsbase bmi1 hle avx2 bmi2 erms rtm xsaveopt arat
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips	: 5295.45
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 4
vendor_id	: GenuineIntel
cpu family	: 6
model		: 158
model name	: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
stepping	: 10
cpu MHz		: 2600.000
cache size	: 9216 KB
physical id	: 4
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 4
initial apicid	: 4
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch kaiser fsgsbase bmi1 hle avx2 bmi2 erms rtm xsaveopt arat
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips	: 4398.19
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 5
vendor_id	: GenuineIntel
cpu family	: 6
model		: 158
model name	: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
stepping	: 10
cpu MHz		: 2600.000
cache size	: 9216 KB
physical id	: 5
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 5
initial apicid	: 5
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch kaiser fsgsbase bmi1 hle avx2 bmi2 erms rtm xsaveopt arat
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips	: 5187.92
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 6
vendor_id	: GenuineIntel
cpu family	: 6
model		: 158
model name	: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
stepping	: 10
cpu MHz		: 2600.000
cache size	: 9216 KB
physical id	: 6
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 6
initial apicid	: 6
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch kaiser fsgsbase bmi1 hle avx2 bmi2 erms rtm xsaveopt arat
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips	: 5265.10
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 158
model name	: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
stepping	: 10
cpu MHz		: 2600.000
cache size	: 9216 KB
physical id	: 7
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 7
initial apicid	: 7
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch kaiser fsgsbase bmi1 hle avx2 bmi2 erms rtm xsaveopt arat
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips	: 4991.17
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 8
vendor_id	: GenuineIntel
cpu family	: 6
model		: 158
model name	: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
stepping	: 10
cpu MHz		: 2600.000
cache size	: 9216 KB
physical id	: 8
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 8
initial apicid	: 8
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch kaiser fsgsbase bmi1 hle avx2 bmi2 erms rtm xsaveopt arat
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips	: 5416.35
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 9
vendor_id	: GenuineIntel
cpu family	: 6
model		: 158
model name	: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
stepping	: 10
cpu MHz		: 2600.000
cache size	: 9216 KB
physical id	: 9
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 9
initial apicid	: 9
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch kaiser fsgsbase bmi1 hle avx2 bmi2 erms rtm xsaveopt arat
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips	: 5148.90
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 10
vendor_id	: GenuineIntel
cpu family	: 6
model		: 158
model name	: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
stepping	: 10
cpu MHz		: 2600.000
cache size	: 9216 KB
physical id	: 10
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 10
initial apicid	: 10
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch kaiser fsgsbase bmi1 hle avx2 bmi2 erms rtm xsaveopt arat
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips	: 5073.30
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 11
vendor_id	: GenuineIntel
cpu family	: 6
model		: 158
model name	: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
stepping	: 10
cpu MHz		: 2600.000
cache size	: 9216 KB
physical id	: 11
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 11
initial apicid	: 11
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch kaiser fsgsbase bmi1 hle avx2 bmi2 erms rtm xsaveopt arat
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips	: 5062.09
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

@rnburn
Author

rnburn commented Jul 23, 2019

Also, could you try with atlas alone and with f77blas alone? They provide an identical API, and what you are doing is taking 'undefined behaviour' as the reference.

I don't know what you mean by 'undefined behaviour'. The link line I used is taken directly from ATLAS's docs. Also, potrf and potri are LAPACK functions, so they wouldn't be provided by f77blas alone.

The full LAPACK library created by merging ATLAS and netlib LAPACK requires both C and Fortran77 interfaces, and thus that serial link line would be:
-L$(MY_BLDdir)/lib/ -llapack -lf77blas -lcblas -latlas

Given that ATLAS and OpenBLAS 0.3.5 both give 2.21785e-06, I think we can be pretty certain that's the right result.
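One way to check an inverse without leaning on a second library is to multiply it back against the original matrix and look at how far the product is from the identity. The sketch below is a minimal, library-free version of that idea. `inverse_residual` is a hypothetical helper, not a LAPACK routine. It assumes a copy of the original matrix was kept (`dpotrf` overwrites its input) and that both triangles of the inverse have been filled in. What counts as an acceptable residual depends on how well conditioned the matrix is.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest absolute entry of A * X - I for dense n x n matrices.
// Both matrices are stored row-major; for symmetric matrices the
// row-major and column-major layouts coincide.
double inverse_residual(const std::vector<double>& a,
                        const std::vector<double>& x, int n) {
  double worst = 0.0;
  for (int i = 0; i < n; ++i) {
    for (int j = 0; j < n; ++j) {
      double sum = 0.0;
      for (int k = 0; k < n; ++k) {
        sum += a[static_cast<std::size_t>(i) * n + k] *
               x[static_cast<std::size_t>(k) * n + j];
      }
      const double target = (i == j) ? 1.0 : 0.0;
      worst = std::max(worst, std::abs(sum - target));
    }
  }
  return worst;
}
```

A residual close to machine precision (scaled by the conditioning of the matrix) would support one candidate result over the other, independently of which library produced it.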

@rnburn
Author

rnburn commented Jul 23, 2019

OPENBLAS_NUM_THREADS does make a difference

> g++ t.cpp /3rd_party/OpenBLAS-0.3.6/libopenblas.a -lpthread
> OPENBLAS_NUM_THREADS=1
> ./a.out
2.21785e-06
> OPENBLAS_NUM_THREADS=""
> ./a.out
-0.0915173

@brada4
Contributor

brada4 commented Jul 23, 2019

Your CPU does not support AVX512.

Does OpenBLAS return the same result each time?
Is it different with one thread?

What happens if you use -lblas as the only BLAS?

@rnburn
Author

rnburn commented Jul 23, 2019

Yes, the results are consistent, and given the difference with OPENBLAS_NUM_THREADS=1, I think we can narrow this down to a bug in the multithreading code introduced in 0.3.6.
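The "same result each time" question can be checked mechanically by running the computation several times and comparing the outputs bit for bit. The following is a minimal sketch of such a harness. `runs_are_bit_identical` and the `compute` callback are hypothetical names; in this setting the callback would wrap the potrf/potri calls on a fresh copy of the input and return the resulting array.

```cpp
#include <cassert>
#include <cstring>
#include <functional>
#include <vector>

// Run `compute` `runs` times and report whether every run produced
// exactly the same bytes as the first one.
bool runs_are_bit_identical(
    const std::function<std::vector<double>()>& compute, int runs) {
  const std::vector<double> first = compute();
  for (int r = 1; r < runs; ++r) {
    const std::vector<double> next = compute();
    if (next.size() != first.size()) return false;
    if (!first.empty() &&
        std::memcmp(first.data(), next.data(),
                    first.size() * sizeof(double)) != 0) {
      return false;
    }
  }
  return true;
}
```

Identical output across runs makes a timing-dependent failure less likely, although on its own it does not prove that the threaded code path is free of ordering problems.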

@brada4
Contributor

brada4 commented Jul 23, 2019

You could #undef SMP early in the respective files in interface/lapack.

@martin-frbg
Collaborator

The call graphs of both DPOTRI and DPOTRF do include DGEMM (at least in the original reference implementation; netlib.org has nice diagrams for all LAPACK functions), but obviously this cannot be the evil AVX512 DGEMM problem if you see it on Coffee Lake hardware. Now that I have a little more time to look into this, I cannot reproduce the problem on my i7-8700K, which is basically the desktop version of your CPU; valgrind/helgrind also do not report any memory or multithreading issues. Which compiler version did you use to build OpenBLAS, and what build options, if any, did you use?
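To illustrate why a matrix-product kernel can show up inside a Cholesky factorization at all, here is a minimal textbook sketch of a blocked, right-looking Cholesky. It is not OpenBLAS's or the reference LAPACK implementation; the function name `blocked_cholesky_lower` and the row-major layout are assumptions made for the example. The point is step 3: once a panel is factored, the rest of the matrix is updated with a matrix-matrix product, which is the kind of work a library would hand to its level-3 BLAS kernels.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Blocked, right-looking Cholesky of a row-major n x n SPD matrix.
// On success the lower triangle of `a` holds L with A = L * L^T.
// Returns false if a non-positive pivot is encountered.
bool blocked_cholesky_lower(std::vector<double>& a, int n, int block) {
  auto at = [&](int i, int j) -> double& {
    return a[static_cast<std::size_t>(i) * n + j];
  };
  for (int k = 0; k < n; k += block) {
    const int end = std::min(k + block, n);
    // 1. Unblocked factorization of the diagonal block.
    for (int j = k; j < end; ++j) {
      double d = at(j, j);
      for (int p = k; p < j; ++p) d -= at(j, p) * at(j, p);
      if (d <= 0.0) return false;
      at(j, j) = std::sqrt(d);
      for (int i = j + 1; i < end; ++i) {
        double s = at(i, j);
        for (int p = k; p < j; ++p) s -= at(i, p) * at(j, p);
        at(i, j) = s / at(j, j);
      }
    }
    // 2. Triangular solve for the panel below the diagonal block.
    for (int i = end; i < n; ++i) {
      for (int j = k; j < end; ++j) {
        double s = at(i, j);
        for (int p = k; p < j; ++p) s -= at(i, p) * at(j, p);
        at(i, j) = s / at(j, j);
      }
    }
    // 3. Trailing update A22 -= L21 * L21^T: a matrix-matrix product.
    for (int i = end; i < n; ++i) {
      for (int j = end; j <= i; ++j) {
        double s = 0.0;
        for (int p = k; p < end; ++p) s += at(i, p) * at(j, p);
        at(i, j) -= s;
      }
    }
  }
  return true;
}
```

For a 26 x 26 input, whether a given library actually takes a blocked path (and with what block size) is an implementation detail that this sketch does not settle.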

@rnburn
Author

rnburn commented Jul 24, 2019

This is the compiler

 g++ --version
g++ (Ubuntu 8.3.0-6ubuntu1) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

For building, I just ran make without any options.

@martin-frbg
Collaborator

Thanks. I tried with GCC 7.2.1, and now 9.1.0 as well; both 0.3.6 and the current develop branch consistently return 2.21785e-6 for your testcase.

@Diazonium
Contributor

Could you try a rebuild with FCOMMON_OPT = -frecursive -fno-optimize-sibling-calls ?

@martin-frbg
Collaborator

Hmm. If it was #2154, it should still happen with GCC 9.1 (and probably already with 7.2.1) I think ? Also the potrf and potri in OpenBLAS are rewritten in C so unlikely to hit mixed-language ABI issues ...

@rnburn
Author

rnburn commented Jul 24, 2019

I still have the issue with FCOMMON_OPT="-frecursive -fno-optimize-sibling-calls"

@Diazonium
Contributor

Ok, so at least we can rule that nasty issue out.

@martin-frbg
Collaborator

martin-frbg commented Jul 25, 2019

Even if this was a compiler problem, I do not see why it would affect only the OpenBLAS build. Any chance of a difference (however small, like "harmless" code cleanup) between the testcase you posted here and what you actually use ?
From the Ubuntu build logs, apparently their 0.3.5 was built with

/usr/bin/make NO_LAPACKE=1 NO_AFFINITY=1 USE_OPENMP=0 NO_WARMUP=1 CFLAGS="-Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -fdebug-prefix-map=/<<BUILDDIR>>/openblas-0.3.5+ds=. -fstack-protector-strong -Wformat -Werror=format-security" FFLAGS="-g -O2 -fdebug-prefix-map=/<<BUILDDIR>>/openblas-0.3.5+ds=. -fstack-protector-strong" COMMON_OPT= FCOMMON_OPT=-frecursive NUM_THREADS=64 DYNAMIC_ARCH=1 DYNAMIC_OLDER=1

I do not think the FORTIFY_SOURCE and stack-protector-strong should make a difference (and these were not used in my tests), but perhaps it is worth a try ? UPDATE: they had no effect in my tests.

@rnburn
Author

rnburn commented Jul 25, 2019

> Any chance of a difference (however small, like "harmless" code cleanup) between the testcase you posted here and what you actually use ?

No, I ran it directly.

Also, I built 0.3.5 from source (just running make) and got the correct answer, so it's definitely a change in 0.3.6 that's causing the issue.
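
One way to judge which build is right, independently of any particular BLAS, is to check the residual of the computed inverse directly. The following is only a minimal sketch and not part of the original testcase: the helper names `mirror_upper_to_lower` and `inverse_residual` are made up for illustration, the matrices are assumed to be stored row-major in a flat `std::vector<double>`, and because potri fills only one triangle, the result is assumed to be mirrored before the check (which triangle holds the data depends on the `uplo` argument and on the storage order).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major dense n x n matrix in a flat vector (assumption for this sketch).
using Matrix = std::vector<double>;

// Copy the strict upper triangle into the lower one so that M becomes a
// full symmetric matrix. Hypothetical helper, needed because potri only
// writes one triangle of the inverse.
void mirror_upper_to_lower(Matrix& M, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            M[j * n + i] = M[i * n + j];
        }
    }
}

// Largest absolute entry of (A * B - I). If B is a good inverse of A this
// stays small; a wrong inverse shows up as a large value.
double inverse_residual(const Matrix& A, const Matrix& B, std::size_t n) {
    double worst = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += A[i * n + k] * B[k * n + j];
            }
            const double expected = (i == j) ? 1.0 : 0.0;
            worst = std::max(worst, std::fabs(sum - expected));
        }
    }
    return worst;
}
```

Keeping a copy of the original matrix before calling potrf and passing it here together with the mirrored potri output gives a single number to compare across builds. How small it should be depends on the conditioning of the matrix, but a wrong inverse would typically produce a residual many orders of magnitude larger than a correct one.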

@rnburn
Author

rnburn commented Jul 25, 2019

Also, not sure whether it makes a difference, but my exact setup is the docker image ubuntu:19.04 running on OS X, which I think goes through a VM.

@martin-frbg
Collaborator

Seems that would need to be a change that triggers a bug in the gcc or libpthread provided by Ubuntu, as I cannot reproduce the problem with opensuse 15 on similar hardware. (Or might as well blame it on your OSX/docker setup - seems there are two flavors, "docker toolbox" runs Linux inside a VirtualBox VM to run docker, while Docker Desktop apparently is a native OSX application ?)
You could try current develop, or run helgrind in your environment to check for race conditions (install valgrind and run your testcase as valgrind --tool=helgrind ./a.out)

@rnburn
Author

rnburn commented Jul 25, 2019

> Seems that would need to be a change that triggers a bug in the gcc or libpthread provided by Ubuntu, as I cannot reproduce the problem with opensuse 15 on similar hardware.

I wouldn't draw that conclusion yet. Given that 0.3.5 works with the same setup, I think the most likely explanation is that there's a bug introduced in 0.3.6.

> You could try current develop, or run helgrind in your environment to check for race conditions (install valgrind and run your testcase as valgrind --tool=helgrind ./a.out)

Because it gives the same wrong result each time with OPENBLAS_NUM_THREADS="", I doubt that it's a race condition.
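
The "same wrong result each time" argument can be made a bit more systematic with a repeatability check. This is a rough sketch under assumptions, not something from the testcase: `is_bitwise_repeatable` is a made-up helper, and `compute` stands for whatever wraps the potrf/potri calls and returns the result as a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Runs `compute` `runs` times and reports whether every run produced output
// that is bit-identical to the first run. Hypothetical helper for this sketch.
bool is_bitwise_repeatable(const std::function<std::vector<double>()>& compute,
                           int runs) {
    const std::vector<double> first = compute();
    for (int r = 1; r < runs; ++r) {
        const std::vector<double> next = compute();
        if (next.size() != first.size()) {
            return false;
        }
        const std::size_t bytes = first.size() * sizeof(double);
        // Compare raw bytes so that even last-bit differences are caught.
        if (bytes != 0 && std::memcmp(first.data(), next.data(), bytes) != 0) {
            return false;
        }
    }
    return true;
}
```

Bit-identical output over many runs makes a race less likely but does not strictly rule one out, since thread scheduling can happen to be stable on a given machine; helgrind remains the more direct check.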

@martin-frbg
Collaborator

If it was a bug in 0.3.6 I would expect to see it on my system as well - basically the desktop version of your hardware, so same BLAS microkernels, same number of threads...

@brada4
Contributor

brada4 commented Jul 26, 2019

Does anything change if you try develop version?
Does it help to update kernel/ucode?

Does not reproduce for me either (on all sorts of earlier CPUs).

I will not have time until mid-September; I think the same goes for Martin.

@martin-frbg
Collaborator

I have now installed docker on my Haswell system and repeated the tests with the Ubuntu 19.04 image from hub.docker.io (actually both the stable 18.something and the "rolling" 19.04 as I made a mistake in the Dockerfile initially) - and both current develop and 0.3.6 always return 2.21785e-6 as they should.

@martin-frbg
Collaborator

@rnburn are you using Docker Desktop or the older, presumably Virtualbox-based Docker toolbox ?
Investigation of today's issue #2244 came to the conclusion that AVX2 support in the former is either broken or incomplete.

@rnburn
Author

rnburn commented Sep 1, 2019

I was using Docker Desktop community version 2.0.0.3

@brada4
Contributor

brada4 commented Sep 1, 2019

Does building with the NO_AVX2=1 flag fix the numeric issues, in line with the findings of #2244 ?
PS: no matter the version, anything 2.x running on AVX2 CPUs suffers from the same problem; the short code has been there for ages.
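
To see whether AVX2 kernels are even in play inside the container, it may help to check what the CPU advertises there. A minimal sketch, assuming GCC or Clang on x86 (where the `__builtin_cpu_init` and `__builtin_cpu_supports` builtins are available); the function name `cpu_reports_avx2` is made up for illustration:

```cpp
#include <cassert>

// Returns true if the (possibly virtualized) CPU advertises AVX2.
// On compilers or architectures without the builtins this sketch simply
// returns false rather than guessing.
bool cpu_reports_avx2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // harmless if the CPU info was already initialized
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}
```

Note that this only reports what CPUID advertises inside the VM or container. It cannot show whether the hypervisor actually handles AVX2 instructions correctly, which is the suspicion from #2244; comparing results between a default build and a NO_AVX2=1 build is still the real test.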

@martin-frbg added the "Bug in other software" label (Compiler, Virtual Machine, etc. bug affecting OpenBLAS) on Sep 5, 2019
@martin-frbg
Collaborator

Added a warning to the FAQ section in the wiki as there has been no activity on the xhyve issue tracker for the past 3 years
