Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create an openblas wheel #87

Merged
merged 23 commits into from
Sep 19, 2023
Merged

Create an openblas wheel #87

merged 23 commits into from
Sep 19, 2023

Conversation

mattip
Copy link
Collaborator

@mattip mattip commented Aug 3, 2022

Fixes #86.

There is room to bikeshed the package name: openblas, openblas_libs, ...?

The content will appear at site-packages/openblas/{lib, include}. The concept of installing under sys.prefix is in discussion. If it is critical, we could move to flit rather than setuptools. We would then need an agreed upon location under sys.prefix to put the include files, shared objects, link libraries, pkgconfig scripts, and cmake modules.

In future PRs:

  • python code that is shared between projects to properly set up the build (pkgconfig) and runtime shared object location.
  • telling auditwheel and delocate to add an RPATH directive to the shared objects in this package and not repackage them into the NumPy/SciPy wheel.

@mattip
Copy link
Collaborator Author

mattip commented Aug 3, 2022

The wheels are named openblas-0.3.20-py3-none-any.whl, they should have a platform-specific tag like openblas-0.3.20-py3-none-manylinux_2_17_x86_64.whl. Is there a way to tell pip wheel` about that?

@mattip
Copy link
Collaborator Author

mattip commented Oct 3, 2022

xref pypa/auditwheel#392 which will enable telling auditwheel "we know openblas is outside the manylinux spec, but package anyway"

@mattip
Copy link
Collaborator Author

mattip commented Oct 13, 2022

This is starting to work. After building OpenBLAS, do something like (from CI)

        mkdir -p local/openblas
        tar -C local/openblas --strip-components=2 -xf libs/openblas*.tar.gz
        # do not package the static libs, they are ~55MB
        rm local/openblas/*.a
        python -m pip wheel -w dist -vv .

or replace the last line with python -m pip install -e . --no-build-isolation. Then you can do

$ python -m openblas
OpenBLAS using 'OpenBLAS 0.3.20  USE64BITINT DYNAMIC_ARCH NO_AFFINITY Zen MAX_THREADS=64'

And using a modified numpy (to find openblas but not add -lopenblas), one can do (from the numpy source directory)

NPY_USE_BLAS_ILP64=1 python tools/openblas_support.py 
OPENBLAS64_=/tmp/openblas NPY_USE_BLAS_ILP64=1  python runtests.py --python
>>> import numpy
<error report, with the last line "Original error was: 
    /home/matti/oss/numpy/build/testenv/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so: 
    undefined symbol: cblas_caxpy64_
>>> import openblas
>>> openblas.open_so()
>>> import numpy
<succeeds>

@mattip
Copy link
Collaborator Author

mattip commented Oct 13, 2022

Note that removing the -lopenblas means that we no longer need support from auditwheel or deallocate delocate to produce a numpy wheel, since the openblas shared object is not written into the numpy c-extensions. On the other hand, we are now susceptible to all the problems using dlopen() entails: if someone else loads similar symbols into the global namespace before we do (with openblas.open_so() which calls dlopen(so_path, RTLD_GLOBAL | RTLD_NOW)), their symbols will be used. I will investigate using dlopen(so_path, RTLD_LOCAL | RTLD_NOW) inside the NumPy c-extension startup code to see if that way the symbols override the global ones.

The other direction, of mangling RPATH, LD_PRELOAD, LD_LIBPARARY_PATH, are all less attractive. They must be done at process startup, and way before any Python code is run. I don't think importing a package could modify any of that state.

@mattip
Copy link
Collaborator Author

mattip commented Oct 13, 2022

Here is the minimal diff to NumPy in order to remove the link with -lopenblas:

diff --git a/numpy/core/setup.py b/numpy/core/setup.py
index 17dc8438e..302e92c7e 100644
--- a/numpy/core/setup.py
+++ b/numpy/core/setup.py
@@ -15,6 +15,12 @@
 from numpy.compat import npy_load_module
 from setup_common import *  # noqa: F403
 
+try:
+    import openblas
+    importable_openblas = True
+except ModuleNotFoundError:
+    importable_openblas = False
+
 # Set to True to enable relaxed strides checking. This (mostly) means
 # that `strides[dim]` is ignored if `shape[dim] == 1` when setting flags.
 NPY_RELAXED_STRIDES_CHECKING = (os.environ.get('NPY_RELAXED_STRIDES_CHECKING', "1") != "0")
@@ -849,6 +855,10 @@ def get_mathlib_info(*args):
                           ])
     else:
         extra_info = {}
+    if importable_openblas:
+        # TODO: use a better API for importable openblas
+        extra_info.pop('libraries', None)
+        extra_info.pop('runtime_library_dirs', None)
 
     #######################################################################
     #             _multiarray_umath module - multiarray part              #

Edit: This is of course only a first cut. A real PR would avoid requiring successfully finding the openblas library, or enable finding it via this openblas project. Something similar must be done with linalg

Edit2: this change can be found on the mattip/dynamic-openblas branch of my fork

@mattip
Copy link
Collaborator Author

mattip commented Aug 15, 2023

Rebased and added commits from #89, thanks @mayeut. Also added a get_pkg_config() function that can be used until meson gets something like Conan's pkg_config_generator so that the meson build can generate its own pkgconfig files from a spec.

@mattip mattip force-pushed the wheel branch 2 times, most recently from 5bd6e59 to 12b7dbd Compare August 15, 2023 17:38
@mattip mattip force-pushed the wheel branch 6 times, most recently from 68c771a to 58550fa Compare August 16, 2023 16:53
@mattip
Copy link
Collaborator Author

mattip commented Sep 13, 2023

I don't really grok what is going wrong, but meson cannot use the auditwheel-repaired shared objects to build NumPy. Somehow when building the output.exe that links to the libopenblas_python.so, the linker fails to notice the libquadmath*.so that is next to the libgfortran.so. Looking at objdump, the libopenblas_python.so has an RPATH link to libgfortran, but libgfortran does not have an RPATH link to libquadmath.

When the CI builds with the older openblas.tar.gz it does not mess with this, rather it uses the system libgfortran, so maybe the problem is meson tries to build an exe rather than a shared object, which requires resolving all the references?

Maybe a more healthy approach would be to statically link libgfortran and libquadmath into the openblas shared object in the OpenBLAS build phase (like on windows)

@rgommers
Copy link
Collaborator

I'm slightly confused about the two mentions of .exe in a problem that I think is a Linux one?

I'm downloading the wheels from the last CI run to have a look. Do you have a numpy branch to build against an installed wheel to try?

but libgfortran does not have an RPATH link to libquadmath.

Ah, that may be a problem. Can you repair it by adding an extra RPATH entry for $ORIGIN to libgfortran.so with patchelf?

@mattip
Copy link
Collaborator Author

mattip commented Sep 13, 2023

I'm slightly confused about the two mentions of .exe

That is what I saw in the logs: the result of cc.links is called output.exe in my build log.

@mattip
Copy link
Collaborator Author

mattip commented Sep 13, 2023

Can you repair it by adding an extra RPATH entry for $ORIGIN to libgfortran.so with patchelf

Maybe. But that would be a little involved. After auditwheel repair packs the shared objects into the wheel, I would have to unzip the wheel, run patchelf, and re-zip the wheel. Not the end of the world.

@rgommers
Copy link
Collaborator

I meant just trying it locally first, to see if then everything works.

@mattip mattip force-pushed the wheel branch 2 times, most recently from ec4bbf1 to 8bd388a Compare September 13, 2023 21:06
@rgommers
Copy link
Collaborator

That is what I saw in the logs: the result of cc.links is called output.exe in my build log.

Ah okay. That makes sense, I've seen it before. IIRC that's just a file name that was easier to keep static. It has nothing to do with Windows executables.

Can you repair it by adding an extra RPATH entry for $ORIGIN to libgfortran.so with patchelf?

I quickly tried this. Here's what it looks like (note the "not found" issue) after unzipping the wheel:

$ ldd scipy_openblas32/lib/libopenblas_python.so 
        linux-vdso.so.1 (0x00007ffdfa341000)
        libm.so.6 => /usr/lib/libm.so.6 (0x00007fa566513000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007fa56872c000)
        libgfortran-040039e1.so.5.0.0 => /home/rgommers/Downloads/wheels/scipy_openblas32/lib/../../scipy_openblas32.libs/libgfortran-040039e1.so.5.0.0 (0x00007fa566000000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007fa565c00000)
        /usr/lib64/ld-linux-x86-64.so.2 (0x00007fa568755000)
        libquadmath-96973f99.so.0.0.0 => /home/rgommers/Downloads/wheels/scipy_openblas32/lib/../../scipy_openblas32.libs/libquadmath-96973f99.so.0.0.0 (0x00007fa565800000)
        libz.so.1 => /usr/lib/libz.so.1 (0x00007fa568710000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007fa5686eb000)

$ ldd scipy_openblas32.libs/libgfortran-040039e1.so.5.0.0 
        linux-vdso.so.1 (0x00007fff746b5000)
        libquadmath-96973f99.so.0.0.0 => not found
        libz.so.1 => /usr/lib/libz.so.1 (0x00007f1aefa9a000)
        libm.so.6 => /usr/lib/libm.so.6 (0x00007f1aef513000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f1aef4ee000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f1aef200000)
        /usr/lib64/ld-linux-x86-64.so.2 (0x00007f1aefad8000)

$ ldd scipy_openblas32.libs/libquadmath-96973f99.so.0.0.0 
        linux-vdso.so.1 (0x00007ffe09f4a000)
        libm.so.6 => /usr/lib/libm.so.6 (0x00007ffa80669000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007ffa80000000)
        /usr/lib64/ld-linux-x86-64.so.2 (0x00007ffa8077a000)

Adding in an $ORIGIN RPATH entry indeed fixes it:

$ patchelf --force-rpath --set-rpath '$ORIGIN' scipy_openblas32.libs/libgfortran-040039e1.so.5.0.0 
$ ldd scipy_openblas32.libs/libgfortran-040039e1.so.5.0.0         linux-vdso.so.1 (0x00007ffed55fe000)
        libquadmath-96973f99.so.0.0.0 => /home/rgommers/Downloads/wheels/scipy_openblas32.libs/libquadmath-96973f99.so.0.0.0 (0x00007f9683000000)
        libz.so.1 => /usr/lib/libz.so.1 (0x00007f96833c4000)
        libm.so.6 => /usr/lib/libm.so.6 (0x00007f96832d7000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f96832b2000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f9682c00000)
        /usr/lib64/ld-linux-x86-64.so.2 (0x00007f9683a1c000)

So I think that may be an auditwheel repair bug. If it vendors two shared libraries and it does detect that one depends on the other (which it does, because the version mangling is correct), it should automatically add the $ORIGIN entry.

@rgommers
Copy link
Collaborator

Binary size of libgfortran and libquadmath is small, and static linking may avoid a bunch of issues here. So that seem like a good way to go to me.

@rgommers
Copy link
Collaborator

You have to have all the pieces of the puzzle for that though. I don't have a libquadmath.a on my machine, and OpenMathLib/OpenBLAS#2759 (comment) suggests it may be a little untested.

@mattip
Copy link
Collaborator Author

mattip commented Sep 13, 2023

Thanks. I will play with this locally. It seems complicated to get OpenBLAS to link statically when making a shared object, I get an error (note the -static -shared which works with mingw, but that build has some other complicated additional steps).

cc -fvisibility=protected -O2 -DMAX_STACK_ALLOC=2048 -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DDYNAMIC_ARCH -DSMP_SERVER -DNO_WARMUP -DBUFFERSIZE=20 -DMAX_CPU_NUMBER=64 -DMAX_PARALLEL_NUMBER=1 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0.3.23.dev\" -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME= -DASMFNAME=_ -DNAME=_ -DCNAME= -DCHAR_NAME=\"_\" -DCHAR_CNAME=\"\" -DNO_AFFINITY -I. -O2 -DSMALL_MATRIX_OPT -DMAX_STACK_ALLOC=2048 -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DDYNAMIC_ARCH -DSMP_SERVER -DNO_WARMUP -DBUFFERSIZE=20 -DMAX_CPU_NUMBER=64 -DMAX_PARALLEL_NUMBER=1 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0.3.23.dev\" -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME= -DASMFNAME=_ -DNAME=_ -DCNAME= -DCHAR_NAME=\"_\" -DCHAR_CNAME=\"\" -DNO_AFFINITY -I.. -static -shared-libgcc -shared -o ../libopenblasp-r0.3.23.dev.so -Wl,--whole-archive ../libopenblasp-r0.3.23.dev.a -Wl,--no-whole-archive -Wl,-soname,libopenblas.so.0 -lm -lpthread -lgfortran -lm -lpthread -lgfortran

/opt/rh/devtoolset-10/root/usr/libexec/gcc/x86_64-redhat-linux/10/ld: /opt/rh/devtoolset-10/root/usr/lib/gcc/x86_64-redhat-linux/10/crtbeginT.o: relocation R_X86_64_32 against hidden symbol __TMC_END__ can not be used when making a shared object

collect2: error: ld returned 1 exit status

@mattip
Copy link
Collaborator Author

mattip commented Sep 13, 2023

fwiw, the command to build OpenBLAS in the manylinux2014 docker is

docker run --rm -e BUILD_PREFIX=/usr/local -e PLAT=x86_64 -e INTERFACE64=1 -e NIGHTLY=0 -e PYTHON_VERSION=3.9 -e MB_ML_VER=2014 -e MB_ML_LIBC=manylinux -v $PWD:/io quay.io/pypa/manylinux2014_x86_64 /io/travis-ci/docker_build_wrap.sh

and there is a libquadmath.a inside the gcc10-installation of the docker image.

@mattip
Copy link
Collaborator Author

mattip commented Sep 14, 2023

This seems to work now (in a numpy checkout, using the last commit here):

$ python3.11 -m venv /tmp/test
$ source /tmp/test/bin/activate
$ pip install -r test_requirements.txt -r build_requirements.txt
$ pip install <path-to-wheel>/scipy_openblas64-0.3.23.293-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
$ python -c "import scipy_openblas64 as openblas; print(openblas.get_pkg_config())" > /tmp/openblas.pc
$ python -c "import scipy_openblas64 as openblas; openblas.write__distributor_init('numpy')"
$ PKG_CONFIG_PATH=/tmp spin build -- -Dblas-symbol-suffix=64_
$ PKG_CONFIG_PATH=/tmp spin test

... which promtly segfaults. Investigating.

@mattip mattip force-pushed the wheel branch 2 times, most recently from be52cad to d6818bb Compare September 14, 2023 10:54
@mattip
Copy link
Collaborator Author

mattip commented Sep 19, 2023

I think this is ready, it seems to work when properly setting the meson build options. I have uploaded wheels to PyPI. I will self-merge and then submit a PR to numpy/numpy to use the wheels.

@mattip mattip merged commit b9a655f into MacPython:main Sep 19, 2023
@rgommers
Copy link
Collaborator

Awesome, nice work @mattip! I'm looking forward to using this:)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide a proper wheel instead of a tarball
3 participants