[BUG] Fatal Python error: PyMUTEX_LOCK(gil->mutex) failed #3081

Open
coreyjadams opened this issue Jul 7, 2021 · 21 comments

Comments

@coreyjadams

Hi,

I've got a package that uses pybind11 (it's awesome, by the way), and a few users have reported the following crash. I've been able to reproduce it myself as well. I asked on the gitter site and had a good discussion with @quantotto, but ultimately we only came down to speculation.

I've reduced the issue to a minimal reproducer, so hopefully this is possible to debug. It seems to be a somewhat hidden issue that doesn't appear with every version of Python or compiler.

Issue description

When importing a package built with pybind11, the python libraries fail to load with the following error:

Fatal Python error: PyMUTEX_LOCK(gil->mutex) failed
Python runtime state: unknown

Abort trap: 6

This only seems to appear when using conda's Python on macOS; I haven't reproduced it elsewhere. I suspect that Python is built with clang 10 by conda, while the Python package in question is built with clang 11, and some incompatibility arises.

Reproducible example code

I'm sorry I cannot give you a simpler example. I've stripped it down as far as I think I can while still reproducing this.

See the repository here: larcv3-pybind11-example
This uses scikit-build to call cmake and build a package including pybind11-generated python bindings.

Here's a list of instructions to reproduce this:

bash Miniconda3-latest-MacOSX-x86_64.sh #install a fresh conda
source miniconda3/bin/activate # activate it
conda install cmake # install cmake
pip install scikit-build # install scikit-build
git clone https://github.com/coreyjadams/larcv3-pybind11-example.git # clone the example
cd larcv3-pybind11-example/ 
git submodule update --init # clone pybind11 as a submodule
python setup.py build # compile
python setup.py install #install

Then, in a python interpreter you can do:

>>> from larcv import pylarcv
Fatal Python error: PyMUTEX_LOCK(gil->mutex) failed
Python runtime state: unknown

Abort trap: 6

This also appears to be related to this Stack Overflow question: https://stackoverflow.com/questions/66026520/fatal-python-error-pymutex-lock-pyruntime-ceval-gil-mutex-failed

@melMass

melMass commented Sep 5, 2021

Can you try otool -L larcv3.dylib and report?
I had the same issue when linking against the wrong Python Framework library

@Jean1995

Jean1995 commented Oct 4, 2021

Has there been any progress on this issue? I've been experiencing the same bug with our software when people tried to install it on macOS using anaconda (i.e. pip install proposal and import proposal).

@cqc-alec

cqc-alec commented Oct 4, 2021

I have also experienced this, building a pybind11 project on MacOS 11.6 with M1 (arm64) architecture and python built locally with pyenv (so no conda or anaconda). It's completely reproducible, but I need to try to reduce it to a minimal example...

@melMass

melMass commented Oct 4, 2021

> I have also experienced this, building a pybind11 project on MacOS 11.6 with M1 (arm64) architecture and python built locally with pyenv (so no conda or anaconda). It's completely reproducible, but I need to try to reduce it to a minimal example...

What does otool -L /path/to/lib return?

I had the issue that in some instances the python exe used for building was hard linked on Mac!

@coreyjadams
Author

Thanks @melMass for the reminder. Using the library I posted above which is a reproducer, I get this:

This one is the python bindings:

$ otool -L /Users/corey.adams/miniconda3/lib/python3.8/site-packages/pybind11_test_symbols-1.0.0-py3.8-macosx-10.9-x86_64.egg/larcv/pylarcv.cpython-38-darwin.so
/Users/corey.adams/miniconda3/lib/python3.8/site-packages/pybind11_test_symbols-1.0.0-py3.8-macosx-10.9-x86_64.egg/larcv/pylarcv.cpython-38-darwin.so:
	@rpath/pylarcv.cpython-38-darwin.so (compatibility version 0.0.0, current version 0.0.0)
	@rpath/liblarcv3.dylib (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libpython3.8.dylib (compatibility version 3.8.0, current version 3.8.0)
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 800.7.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)

And, this is the base C++ library that is getting bound to python:

$ otool -L /Users/corey.adams/miniconda3/lib/python3.8/site-packages/pybind11_test_symbols-1.0.0-py3.8-macosx-10.9-x86_64.egg/larcv/lib/liblarcv3.dylib
/Users/corey.adams/miniconda3/lib/python3.8/site-packages/pybind11_test_symbols-1.0.0-py3.8-macosx-10.9-x86_64.egg/larcv/lib/liblarcv3.dylib:
	@rpath/liblarcv3.dylib (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libpython3.8.dylib (compatibility version 3.8.0, current version 3.8.0)
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 800.7.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)

@cqc-alec

cqc-alec commented Oct 4, 2021

In my case:

alec@Mac-mini pytket % otool -L pytket/_tket/circuit.cpython-38-darwin.so
pytket/_tket/circuit.cpython-38-darwin.so:
	@loader_path/libtket.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 905.6.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.100.5)

and

alec@Mac-mini pytket % otool -L pytket/_tket/libtket.dylib               
pytket/_tket/libtket.dylib:
	@loader_path/libtket.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 905.6.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.100.5)
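
For anyone comparing these otool -L listings, here is a minimal Python sketch that picks out the entries referring to a Python runtime library. The function name and the regular expression are illustrative assumptions, not part of pybind11 or otool. Note that otool -L only lists dynamic libraries, so a statically linked libpython3.8.a will not show up in this output.

```python
import re
from typing import List

# Matches install names that point at a Python runtime library, e.g.
# "@rpath/libpython3.8.dylib" or ".../Python.framework/Versions/3.8/Python".
# (Hypothetical pattern, chosen for illustration.)
_LIBPYTHON = re.compile(r"(libpython\d+(\.\d+)?[a-z]*\.(dylib|so|a)|Python\.framework/)")


def libpython_dependencies(otool_output: str) -> List[str]:
    """Return the install names in `otool -L` output that refer to libpython."""
    hits = []
    # The first line is the path of the inspected file itself, so skip it.
    for line in otool_output.splitlines()[1:]:
        name = line.strip().split(" (compatibility version")[0]
        if name and _LIBPYTHON.search(name):
            hits.append(name)
    return hits


sample = """pylarcv.cpython-38-darwin.so:
\t@rpath/pylarcv.cpython-38-darwin.so (compatibility version 0.0.0, current version 0.0.0)
\t@rpath/liblarcv3.dylib (compatibility version 0.0.0, current version 0.0.0)
\t@rpath/libpython3.8.dylib (compatibility version 3.8.0, current version 3.8.0)
\t/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 800.7.0)
"""
print(libpython_dependencies(sample))  # → ['@rpath/libpython3.8.dylib']
```

An empty result means no dynamic dependency on libpython was recorded; it does not rule out a static link.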

@cqc-alec

cqc-alec commented Oct 5, 2021

So I have produced a very minimal example to reproduce this issue: https://github.com/cqc-alec/pybind11-3081
The C++ and binder code is utterly trivial.
The build commands are in the Makefile, which includes a hard-coded path to the pybind11 2.7.1 headers installed with conan on a Mac (M1).
The output of make test is:

/Library/Developer/CommandLineTools/usr/bin/c++ -I. -stdlib=libc++ -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -fPIC -std=c++2a -o A.cpp.o -c A.cpp
/Library/Developer/CommandLineTools/usr/bin/c++ -stdlib=libc++ -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -dynamiclib -Wl,-headerpad_max_install_names -o libA.dylib -install_name @loader_path/libA.dylib A.cpp.o
/Library/Developer/CommandLineTools/usr/bin/c++ -I. -isystem /Users/alec/.conan/data/pybind11/2.7.1/_/_/package/5ab84d6acfe1f23c4fae0ab88f26e3a396351ac9/include -isystem /Users/alec/.pyenv/versions/3.8.11/include/python3.8 -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -fPIC -fvisibility=hidden -std=c++2a -MD -MT binder.cpp.o -MF binder.cpp.o.d -o binder.cpp.o -c binder.cpp
/Library/Developer/CommandLineTools/usr/bin/c++ -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -bundle -Wl,-headerpad_max_install_names -Xlinker -undefined -Xlinker dynamic_lookup -o A.cpython-38-darwin.so binder.cpp.o -L. -lA  /Users/alec/.pyenv/versions/3.8.11/lib/libpython3.8.a
/Library/Developer/CommandLineTools/usr/bin/strip -x A.cpython-38-darwin.so
/Users/alec/.pyenv/versions/3.8.11/bin/python -c "from A import A"
Fatal Python error: PyMUTEX_LOCK(gil->mutex) failed
Python runtime state: unknown

make: *** [test] Abort trap: 6

(In order to construct this example I extracted these commands from a build system that uses conan and cmake under the hood.)

@cqc-alec

cqc-alec commented Oct 5, 2021

I see that the problem is caused by the hard linkage with libpython3.8.a. If I omit that, it works! So this is looking very much not like a problem with pybind11, but perhaps with either conan or cmake.
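
Since a static libpython does not appear in otool -L output, one way to catch this kind of linkage is to inspect the link command itself. The following is a rough sketch, assuming the link line is available as a string (for example from a verbose build log); the helper name and pattern are hypothetical.

```python
import re
import shlex
from typing import List

# Arguments that pull the Python runtime into the link: a library file named
# libpython* (static or shared) or a "-lpython3.x" flag.
# (Illustrative pattern, not an exhaustive check.)
_PY_LINK = re.compile(r"(^-lpython\d|(^|/)libpython\d[^/]*\.(a|dylib|so)$)")


def python_link_inputs(link_command: str) -> List[str]:
    """Return the arguments of a linker command line that link libpython."""
    return [tok for tok in shlex.split(link_command) if _PY_LINK.search(tok)]


cmd = (
    "c++ -bundle -Xlinker -undefined -Xlinker dynamic_lookup "
    "-o A.cpython-38-darwin.so binder.cpp.o -L. -lA "
    "/Users/alec/.pyenv/versions/3.8.11/lib/libpython3.8.a"
)
print(python_link_inputs(cmd))
# → ['/Users/alec/.pyenv/versions/3.8.11/lib/libpython3.8.a']
```

For an extension module loaded by an existing interpreter, this list would be expected to be empty when the build relies on -undefined dynamic_lookup.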

@coreyjadams
Author

Hey @cqc-alec thanks for digging into this too. Interesting result - though, I tried in my reproducer and it's not so trivial to remove the link to python: I have direct calls to pybind11 in my normal code (equivalent to A.hpp and A.cpp) and this has to link to python.

I note that your libraries aren't directly linked to python above either, is that deliberate?

Overall, I'm very confused by this crash.

@cqc-alec

cqc-alec commented Oct 6, 2021

@coreyjadams I guess my usage is different, in that my core C++ code knows nothing about python or pybind11.
Whatever the actual cause of this crash, I believe my real problem is conan-io/conan-center-index#6605 : the issue arose when I updated to the latest pybind11 conan package, which has the misfeature that it always links against the full list of targets -- including pybind11::embed which could explain the presence of this linkage.

@melMass

melMass commented Oct 6, 2021

> @coreyjadams I guess my usage is different, in that my core C++ code knows nothing about python or pybind11. Whatever the actual cause of this crash, I believe my real problem is conan-io/conan-center-index#6605 : the issue arose when I updated to the latest pybind11 conan package, which has the misfeature that it always links against the full list of targets -- including pybind11::embed which could explain the presence of this linkage.

I had the exact same setup when this error occurred for me.
I'll try to explain how I fixed it:

So I'm, like you, using conan to package all of my dependencies.
I'm also using a python virtual env (using poetry) to match the py version I want to target.

Using the classic find_package(pybind11 REQUIRED) and then pybind11_add_module(XX MODULE ${SRCS})
led to Python being hard-linked, causing the PyMUTEX_LOCK error.

To solve this, I added a custom CMake module (I'm not sure it's the best way, but it works):

FindPythonPyEnv.cmake
# Find information about the current Python environment.
# by melMass
#
# Finds the following:
#
# - PYTHON_EXECUTABLE
# - PYTHON_INCLUDE_DIR
# - PYTHON_LIBRARY
# - PYTHON_SITE
# - PYTHON_NUMPY_INCLUDE_DIR
#
# - PYTHONLIBS_VERSION_STRING (The full version id. ie "3.7.4")
# - PYTHON_VERSION_MAJOR
# - PYTHON_VERSION_MINOR
# - PYTHON_VERSION_PATCH
#
#

function(debug_message messages)
  # message(STATUS "")
  message(STATUS "🐍 ${messages}")
  message(STATUS "\n")
endfunction()

if (NOT DEFINED PYTHON_EXECUTABLE)
  execute_process(
    COMMAND which python
    OUTPUT_VARIABLE PYTHON_EXECUTABLE OUTPUT_STRIP_TRAILING_WHITESPACE
  )
endif()

execute_process(
  COMMAND ${PYTHON_EXECUTABLE} -c "from __future__ import print_function; from distutils.sysconfig import get_python_inc; print(get_python_inc())"
  OUTPUT_VARIABLE PYTHON_INCLUDE_DIR OUTPUT_STRIP_TRAILING_WHITESPACE ERROR_QUIET
)

if (NOT EXISTS "${PYTHON_INCLUDE_DIR}")
  message(FATAL_ERROR "Python include directory not found.")
endif()

execute_process(
  COMMAND ${PYTHON_EXECUTABLE} -c "from __future__ import print_function; import os, numpy.distutils; print(os.pathsep.join(numpy.distutils.misc_util.get_numpy_include_dirs()))"
  OUTPUT_VARIABLE PYTHON_NUMPY_INCLUDE_DIR OUTPUT_STRIP_TRAILING_WHITESPACE ERROR_QUIET
)

execute_process(
  COMMAND ${PYTHON_EXECUTABLE} -c "from __future__ import print_function; import distutils.sysconfig as sysconfig; print('-L' + sysconfig.get_config_var('LIBDIR') + '/' + sysconfig.get_config_var('LDLIBRARY'))"
  OUTPUT_VARIABLE PYTHON_LIBRARY OUTPUT_STRIP_TRAILING_WHITESPACE ERROR_QUIET
)

execute_process(
  COMMAND ${PYTHON_EXECUTABLE} -c "from __future__ import print_function; import platform; print(platform.python_version())"
  OUTPUT_VARIABLE PYTHONLIBS_VERSION_STRING OUTPUT_STRIP_TRAILING_WHITESPACE ERROR_QUIET
)

execute_process(
  COMMAND ${PYTHON_EXECUTABLE} -c "from __future__ import print_function; from distutils.sysconfig import get_python_lib; print(get_python_lib())"
  OUTPUT_VARIABLE PYTHON_SITE OUTPUT_STRIP_TRAILING_WHITESPACE ERROR_QUIET
)

set(PYTHON_VIRTUAL_ENV $ENV{VIRTUAL_ENV})
string(REPLACE "." ";" _VERSION_LIST ${PYTHONLIBS_VERSION_STRING})

list(GET _VERSION_LIST 0 PYTHON_VERSION_MAJOR)
list(GET _VERSION_LIST 1 PYTHON_VERSION_MINOR)
list(GET _VERSION_LIST 2 PYTHON_VERSION_PATCH)



debug_message("Found Python ${PYTHON_VERSION_MAJOR} (${PYTHONLIBS_VERSION_STRING})")
debug_message("PYTHON_EXECUTABLE: ${PYTHON_EXECUTABLE}")
debug_message("PYTHON_INCLUDE_DIR: ${PYTHON_INCLUDE_DIR}")
debug_message("PYTHON_LIBRARY: ${PYTHON_LIBRARY}")
debug_message("PYTHON_NUMPY_INCLUDE_DIR: ${PYTHON_NUMPY_INCLUDE_DIR}")

Assuming you put this CMake module in a cmake folder at the CMAKE_SOURCE_DIR root, this is how you make CMake aware of it:

list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake/")

To solve the issue, you need to include it before using find_package(pybind11 REQUIRED), like so:
[image: screenshot showing the module included before find_package(pybind11 REQUIRED)]

AFAIK this only happens on macOS.
I hope this helps.
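
One caveat for readers on newer interpreters: the module above queries distutils, which was deprecated in Python 3.10 and removed in Python 3.12. A minimal sketch of the same queries using only sysconfig and platform follows; the function name and dictionary keys are made up for illustration, and the NumPy include lookup is left out.

```python
import platform
import sysconfig


def python_build_info() -> dict:
    """Collect the values the CMake module queries, without using distutils."""
    paths = sysconfig.get_paths()
    return {
        "include_dir": paths["include"],        # cf. PYTHON_INCLUDE_DIR
        "site_packages": paths["purelib"],      # cf. PYTHON_SITE
        "libdir": sysconfig.get_config_var("LIBDIR"),
        "ldlibrary": sysconfig.get_config_var("LDLIBRARY"),
        "version": platform.python_version(),   # cf. PYTHONLIBS_VERSION_STRING
        # Truthy when the interpreter was built with a shared libpython.
        "enable_shared": sysconfig.get_config_var("Py_ENABLE_SHARED"),
    }


if __name__ == "__main__":
    for key, value in python_build_info().items():
        print(f"{key}: {value}")
```

Each value could be fetched from CMake with execute_process in the same way as the one-liners above.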

@melMass

melMass commented Oct 6, 2021

To be more complete here is how I run the build:

poetry run sh build_osx.sh path/to/installdir

where build_osx.sh is

mkdir -p "_build"
cd "_build"
conan install .. -s build_type=Release --build=missing 
cmake .. -G "Ninja" -DCMAKE_INSTALL_PREFIX="$1" -DCMAKE_BUILD_TYPE=Release
cmake --build . --config release
cmake --install .

@cqc-alec

cqc-alec commented Oct 6, 2021

Thank you @melMass , I will try this!

jcarpent added a commit to wxmerkt/pinocchio that referenced this issue Oct 30, 2021
@prncoprs

prncoprs commented Nov 10, 2021

Same issue here. I switched python3 from the conda one to the regular one in '/usr/local/bin/python3', and that fixed it. My laptop is an Intel Mac running macOS.

@HosikChae

HosikChae commented Mar 15, 2022

Same issue on an Intel Mac (10.15.7) with pybind11==2.9.1.
I tried adding the linker options -undefined dynamic_lookup, as this article and the solution commit referenced in this thread suggested, but it didn't work.

Just migrating to python3.9 solved the issue.

@RodenLuo

RodenLuo commented Sep 7, 2022

The minimal example given by @cqc-alec is fantastic: https://github.com/cqc-alec/pybind11-3081. I turned it into a CMake version and renamed the module to A_core for clarity. The contents of the new files (binder.cpp, CMakeLists.txt, test.py) are displayed with cat. Remember to also add the pybind11 folder.

~/demo ❯ ls                
A.cpp          CMakeLists.txt pybind11
A.hpp          binder.cpp     test.py
~/demo ❯ cat binder.cpp 
#include <pybind11/pybind11.h>
#include "A.hpp"

PYBIND11_MODULE(A_core, m) {
  pybind11::class_<A>(m, "A", "An A");
}
~/demo ❯ cat CMakeLists.txt
cmake_minimum_required(VERSION 3.4...3.18)
project(pybindtest)
set(CMAKE_BUILD_TYPE Debug)
add_subdirectory(pybind11)
pybind11_add_module(A_core binder.cpp)

add_library (libA A.cpp A.hpp)
TARGET_LINK_LIBRARIES(libA ${PYTHON_LIBRARIES})

target_link_libraries (A_core PRIVATE libA)

SET_TARGET_PROPERTIES(A_core
        PROPERTIES
                SUFFIX ".so"
)                                                                                
~/demo ❯ cat test.py       
from build import A_core                                                         
~/demo ❯ mkdir build       
~/demo ❯ cd build          
~/d/build ❯ cmake ..       
-- The C compiler identification is AppleClang 13.0.0.13000029
-- The CXX compiler identification is AppleClang 13.0.0.13000029
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- pybind11 v2.11.0 dev1
-- Found PythonInterp: /usr/local/anaconda3/envs/pybind/bin/python (found suitable version "3.10.4", minimum required is "3.6") 
-- Found PythonLibs: /usr/local/anaconda3/envs/pybind/lib/libpython3.10.dylib
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- Performing Test HAS_FLTO_THIN
-- Performing Test HAS_FLTO_THIN - Success
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/luod/bio_tools/oxDNA/oxpy_test/minimal_bug/demo/build
~/d/build ❯ make           
[ 25%] Building CXX object CMakeFiles/libA.dir/A.cpp.o
[ 50%] Linking CXX static library liblibA.a
[ 50%] Built target libA
[ 75%] Building CXX object CMakeFiles/A_core.dir/binder.cpp.o
[100%] Linking CXX shared module A_core.so
[100%] Built target A_core
~/d/build ❯ cd ..        
~/demo ❯ python test.py    
[1]    54424 segmentation fault  python test.py

One can debug this in VS Code using vscode-lldb with the following launch.json (remember to change the python path):
{
            "type": "lldb",
            "request": "launch",
            "name": "LLDB Python test.py",
            "program": "/usr/local/anaconda3/envs/pybind/bin/python", // <Change the python path>
            "args": [
                "${file}"
            ],
            "cwd": "${workspaceFolder}",
            "stopOnEntry": true,
            "env": {
                // "PYTHONPATH": "/Users/luod/bio_tools/oxDNA/build/oxpy_test/python" // set PYTHONPATH if necessary 
            }
        },

In the screenshot below, I highlighted the call stack, the source code that raises this error, and the source code path.

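[image: debugger screenshot showing the call stack, the source code that raises the error, and the source code path]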

The source code section is this:

// Ensure that the GIL is held since we will need to make Python calls.
// Cannot use py::gil_scoped_acquire here since that constructor calls get_internals.
struct gil_scoped_acquire_local {
    gil_scoped_acquire_local() : state(PyGILState_Ensure()) {}
    ~gil_scoped_acquire_local() { PyGILState_Release(state); }
    const PyGILState_STATE state;
} gil;

The full call chain leading to the code above is:

From the user's App (the minimal example):

// This line in binder.cpp
PYBIND11_MODULE(A_core, m) {

PYBIND11_ENSURE_INTERNALS_READY \

#define PYBIND11_ENSURE_INTERNALS_READY pybind11::detail::get_internals();

Solution for this minimal example

The magic in this minimal example is that you can safely remove TARGET_LINK_LIBRARIES(libA ${PYTHON_LIBRARIES})
from the CMakeLists.txt and everything works.

Solution for generic user Apps

Not really sure at the moment. Most likely, the user needs to link to ${PYTHON_LIBRARIES} for their custom c++ lib.

@RodenLuo

RodenLuo commented Sep 7, 2022

Interestingly enough, in my specific case (oxDNA repo issue #31), I can safely remove the linking (see below). I'm not sure why this doesn't cause any linking errors, or whether it applies to other, more generic cases.

Removing the linking

before

TARGET_LINK_LIBRARIES(_oxpy_lib ${PYTHON_LIBRARIES} common)

after

TARGET_LINK_LIBRARIES(_oxpy_lib common)

@SimonHeybrock

@RodenLuo I also experienced that I can remove some of the linking. I am using pybind11 via conan, which links to everything by default (see conan-io/conan-center-index#6605), and experimented with this modified recipe (based on some suggestions found in the linked conan issue): https://github.com/scipp/scipp/pull/2792/files.

I am not certain yet that this is correct, but it seemed to pass most of the relevant part of our CI/builds.

@planetmarshall

planetmarshall commented Oct 3, 2022

> @RodenLuo I also experienced that I can remove some of the linking. I am using pybind11 via conan, which links to everything by default

The conan targets don't really link to anything, as there are no binaries to link to (pybind11 being a header only library). They only supply the path to the pybind11 headers.

It's pybind's CMake scripts that provide the logic to link in the Python library (or not). It shouldn't be linking in the Python library at all for a module, but it clearly does so.

For example, from Pybind11 CMake helpers, this causes libpython to be linked in:

# pybind11 method:
pybind11_add_module(MyModule1 src1.cpp)

Whereas this does not:

# Python method:
Python_add_library(MyModule2 src2.cpp)
target_link_libraries(MyModule2 pybind11::headers)

I'm not convinced this is anything to do with conan, but I have some more investigation to do. A draft PR for the conan recipe is available at conan-io/conan-center-index#13283

@0x6e

0x6e commented Oct 12, 2022

I recently experienced this issue too. My issue was caused by linking against different python libraries. E.g. my application code is using find_package(Python3 ...) whilst the library using pybind11 was using find_package(Python ...). These found different Python libraries.

This discussion provides an example of how to force the different Python modules to find the same version: https://discourse.cmake.org/t/feature-request-setting-find-package-versions-via-env-cmake-variables/4661/4

My specific issue was caused by setting Python3_ROOT_DIR but not Python_ROOT_DIR. Setting both resolved my issue.
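
As a small illustration of setting both hints consistently, the sketch below builds the two -D flags from a single prefix. Python_ROOT_DIR and Python3_ROOT_DIR are the variables named above; the helper name is hypothetical, and defaulting to sys.prefix (the environment of the interpreter running the script) is an assumption that may not suit every setup.

```python
import sys
from typing import List


def cmake_python_hints(prefix: str = sys.prefix) -> List[str]:
    """Build -D flags that point both FindPython and FindPython3 at one prefix."""
    return [
        f"-DPython_ROOT_DIR={prefix}",
        f"-DPython3_ROOT_DIR={prefix}",
    ]


print(" ".join(cmake_python_hints("/opt/python")))
# → -DPython_ROOT_DIR=/opt/python -DPython3_ROOT_DIR=/opt/python
```

The resulting flags can be appended to the cmake configure command so that both find_package(Python ...) and find_package(Python3 ...) are steered toward the same installation.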

@technic

technic commented Jul 1, 2024

The issue has been open for 3 years with no progress. When conan cannot generate a correct cmake file, please at least do not remove the one shipped with pybind11. Just give up on the conan cmake generator; it is broken and boilerplate. CMake files by library authors are better.
