18 "FATAL ERROR" messages compiling v0.3.7 in a Dockerfile #2244
Comments
What hardware and operating system is your Docker running on?
I'm running this on macOS Mojave 10.14.6 on a 2018 MacBook Pro. Docker is supposed to isolate programs from all of that, yes? Or are there holes in that abstraction?
Inside the container,
We have another open issue (#2194) where a Docker container under OSX behaves differently from
Oh, man, another leaky abstraction to contend with! It must be really hard maintaining OpenBLAS across hardware, OS, environment, and compiler variations. Props to the team!
Unfortunately I do not have a Mac to try and bisect this in case there was really some change in 0.3.6 responsible for this effect. The two OSX builds in our Travis CI setup do pass though...
Try setting the build variable NO_AVX2=1, and then NO_AVX=1. It could be that virtualisation between Linux and OSX breaks one of those (also check /proc/cpuinfo in the container if possible). EDIT: less mature virtualizers sometimes fail to mask off advanced ISA bits in a virtual machine. For example, the early Windows Subsystem for Linux had no AVX(1) support while it was still advertised in CPUID....
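The /proc/cpuinfo check suggested above can be scripted. This is a minimal sketch, assuming an x86 Linux container where /proc/cpuinfo carries a `flags` line per logical CPU; the helper name `cpu_flags` is hypothetical and not part of OpenBLAS.

```python
from pathlib import Path
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the union of the CPU feature flags listed in /proc/cpuinfo text."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux lists ISA extensions on lines like "flags : fpu sse avx ..."
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    # Outside Linux the file does not exist, so fall back to empty text.
    text = path.read_text() if path.exists() else ""
    found = cpu_flags(text)
    for isa in ("avx", "avx2"):
        print(f"{isa}: {'reported' if isa in found else 'not reported'}")
```

Keep in mind that this only shows what the kernel reports. If the virtualisation layer advertises an extension it does not execute correctly, a reported flag is not by itself proof that the instructions work.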
@brada4 not sure how that would explain 0.3.5 still working though?
It would be nice to get the compiler warnings out and check gfortran's side of the ABI breakage.... It is sort of addressed in 0.3.7.
BTW Docker is set to allot 6 of 12 hyperthreaded "CPUs" to a container.
Looks like NO_AVX2=1 will be needed; at least I could not find any sign of AVX2 support.
Experimental confirmation:
Thinking about @brada4's hypothesis and Docker allocating half of the hardware CPU threads, does Docker tie down a fixed group of cores or could there be faulty context switches between cores? Just brainstorming hypotheses here beyond my expertise.
Could you check (by passing NO_AVX2=1 to OSX docker) if....
Nope, that's at a lower level, in the kernel; OpenBLAS is a userspace library,
AVX2 is a newer instruction set introduced with Haswell CPUs, an improvement over the original AVX introduced with Sandy Bridge; it does not mean 2 cores with AVX.
The test result is: With (Nor any other errors that I can see. There are build warnings.)
Since we have no Mac,
Some small number is expected; you will probably see them in the same places in the CI logs.
I'm happy to do that but I'm not sure how to write a clear Issue. I'll at-reference you for more details, or alternatively you could file the Issue and at-reference me. Q. Do you know why v0.3.6 and v0.3.7 have this symptom but v0.3.5 doesn't?
Just link my comment #2244 (comment).
A1: Just a wild guess, suspecting the virtualisation layer on first encounter.
A2: If, from inside the Docker container, you can tell xhyve apart from a hypothetical later, fixed xhyve with AVX2 support and from Linux Docker, which has had AVX2 since inception, then yes. Otherwise, with a persistent option you lose 20-30% performance on real AVX2 CPUs, like all those made in the last five or so years.
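One way to avoid the persistent option is to disable AVX2 only when the build environment does not report it. This is a rough sketch, assuming the CPU flags have already been collected into a set (for example from /proc/cpuinfo inside the container). `openblas_make_args` is a hypothetical helper; only the NO_AVX and NO_AVX2 build variables come from this thread.

```python
from typing import Iterable, List


def openblas_make_args(flags: Iterable[str]) -> List[str]:
    """Build a make command line that disables only the ISA levels not reported."""
    present = set(flags)
    args = ["make"]
    if "avx" not in present:
        # Without AVX there can be no AVX2 either, so switch off both.
        args += ["NO_AVX=1", "NO_AVX2=1"]
    elif "avx2" not in present:
        # AVX is reported but AVX2 is not.
        args.append("NO_AVX2=1")
    return args


if __name__ == "__main__":
    print(" ".join(openblas_make_args({"sse2", "avx"})))
```

The choice is made at build time, so an image built this way carries the build host's restrictions to wherever it runs later. It also cannot help when a flag is reported but the instructions misbehave.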
Thanks, I added the missing detail.
In case the bug fixes matter, update OpenBLAS to 0.3.9 inside the Docker container and in the create-pyenv instructions.
I retested the problem with AVX2 instructions in Docker Desktop for Mac (OpenMathLib/OpenBLAS#2244) with the latest OpenBLAS and Docker Desktop, and this time filed the Issue in the Docker repo: docker/for-mac#4576.
On macOS outside of Docker, `brew install openblas` now installs 0.3.9, so we no longer need to compile it from source.
Use Python 3.8.3 in the test workflows.
After a year there are no comments or fixes for xhyve Issue 171 or docker-for-mac Issue 4576. Unexpectedly, building ( Curiouser, OpenBLAS 0.3.10 built with and without We run with Summary: We can install OpenBLAS
Any suggestions how to install OpenBLAS in a way that will get consistent results cross-platform rather than at least 7 equivalence classes of results? The installation instructions needn't be the same outside Docker vs. in the Dockerfile.
Added a warning to the FAQ section in the wiki, as there has been no activity on the xhyve issue tracker for the past 3 years.
Here's the first part of my Dockerfile:
Assuming you have Docker Desktop installed, run
Everything's great with openblas v0.3.5.
With v0.3.6 or v0.3.7, the tests print 18 "FATAL ERROR" messages. Excerpts: