Segfault with Python #97

Closed
antoine-levitt opened this issue Dec 10, 2021 · 25 comments · Fixed by #104

Comments

@antoine-levitt

I've run into similar issues in the past; feel free to close this if it's already known. When calling into a Python package (pyscf) through PyCall, Julia segfaults with no further information, presumably when pyscf attempts to do linear algebra. I can provide more information if needed, but it really seems to boil down to: using MKL; using PyCall; do_something_with_linalg_from_python().

@antoine-levitt
Author

antoine-levitt commented Dec 13, 2021

I was able to get a stacktrace:

signal (11): Segmentation fault
in expression starting at /home/antoine/Dropbox/recherche/2021-06-GTO/test.jl:19
mkl_lapack_ps_avx2_xdlange at /home/antoine/.julia/artifacts/72d4adc3ef9236a92f4fefeb0291cb6e8aaae2d7/lib/libmkl_avx2.so.1 (unknown line)
mkl_lapack_dlange at /home/antoine/.julia/artifacts/72d4adc3ef9236a92f4fefeb0291cb6e8aaae2d7/lib/libmkl_intel_thread.so.1 (unknown line)
dlange_ at /home/antoine/.julia/artifacts/72d4adc3ef9236a92f4fefeb0291cb6e8aaae2d7/lib/libmkl_intel_ilp64.so.1 (unknown line)
unknown function (ip: 0x7fd23b1f6d79)
unknown function (ip: 0x7fd23b149b97)
_PyObject_MakeTpCall at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
unknown function (ip: 0x7fd1e7347df2)
_PyEval_EvalFrameDefault at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
_PyEval_EvalCodeWithName at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
_PyFunction_Vectorcall at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
unknown function (ip: 0x7fd1e7347d6c)
_PyEval_EvalFrameDefault at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
_PyEval_EvalCodeWithName at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
_PyFunction_Vectorcall at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
unknown function (ip: 0x7fd1e7347d6c)
_PyEval_EvalFrameDefault at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
unknown function (ip: 0x7fd1e735306a)
unknown function (ip: 0x7fd1e7347d6c)
_PyEval_EvalFrameDefault at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
_PyEval_EvalCodeWithName at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
_PyFunction_Vectorcall at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
unknown function (ip: 0x7fd1e7347d6c)
_PyEval_EvalFrameDefault at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
_PyEval_EvalCodeWithName at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
_PyFunction_Vectorcall at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
unknown function (ip: 0x7fd1e7347d6c)
_PyEval_EvalFrameDefault at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
_PyEval_EvalCodeWithName at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
_PyFunction_Vectorcall at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
unknown function (ip: 0x7fd1e7347d6c)
_PyEval_EvalFrameDefault at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
_PyEval_EvalCodeWithName at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
_PyFunction_Vectorcall at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
PyVectorcall_Call at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
_PyEval_EvalFrameDefault at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
_PyEval_EvalCodeWithName at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
_PyFunction_Vectorcall at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
unknown function (ip: 0x7fd1e757be0a)
PyVectorcall_Call at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
_PyEval_EvalFrameDefault at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
_PyEval_EvalCodeWithName at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
_PyFunction_Vectorcall at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
unknown function (ip: 0x7fd1e757be0a)
PyVectorcall_Call at /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (unknown line)
macro expansion at /home/antoine/.julia/packages/PyCall/3fwVL/src/exception.jl:95 [inlined]
#107 at /home/antoine/.julia/packages/PyCall/3fwVL/src/pyfncall.jl:43 [inlined]
disable_sigint at ./c.jl:458 [inlined]
__pycall! at /home/antoine/.julia/packages/PyCall/3fwVL/src/pyfncall.jl:42 [inlined]
_pycall! at /home/antoine/.julia/packages/PyCall/3fwVL/src/pyfncall.jl:29
_pycall! at /home/antoine/.julia/packages/PyCall/3fwVL/src/pyfncall.jl:11 [inlined]
#_#114 at /home/antoine/.julia/packages/PyCall/3fwVL/src/pyfncall.jl:86 [inlined]
PyObject at /home/antoine/.julia/packages/PyCall/3fwVL/src/pyfncall.jl:86
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:126
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:215
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:166 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:587
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:731
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:885
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:830
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:944
eval at ./boot.jl:373 [inlined]
include_string at ./loading.jl:1196
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
_include at ./loading.jl:1253
include at ./Base.jl:418
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
exec_options at ./client.jl:292
_start at ./client.jl:495
jfptr__start_43127.clone_1 at /home/antoine/julia/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
true_main at /buildworker/worker/package_linux64/build/src/jlapi.c:559
jl_repl_entrypoint at /buildworker/worker/package_linux64/build/src/jlapi.c:701
main at /buildworker/worker/package_linux64/build/cli/loader_exe.c:42
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at /home/antoine/julia/bin/julia (unknown line)
Allocations: 646536 (Pool: 646118; Big: 418); GC: 1
Segmentation fault (core dumped)

Reproducer: install pyscf in Python, then run:

using MKL
using PyCall

pyscf = pyimport("pyscf")
basis_label = "ccpvtz"
elems = ("H", "O")
atoms = "O 0 0 0; H 0 -2.757 2.587; H 0 2.757 2.587" # H2O

method = pyscf.scf.RHF
mol = pyscf.gto.M(atom = atoms, basis = basis_label)
rhf = method(mol)
rhf.kernel()

I've verified this under Julia 1.7 (Linux), with both my system-provided Python and the Python installed by Julia.

@ViralBShah
Contributor

@staticfloat I thought these issues would be behind us with LBT.

@jishnub
Member

jishnub commented Dec 31, 2021

I encounter something similar using PyPlot. Loading MKL before PyPlot leads to segfaults, while loading MKL afterwards works.

This works:

julia> using PyPlot
         
julia> using MKL

julia> plot(3:4)
1-element Vector{PyCall.PyObject}:
 PyObject <matplotlib.lines.Line2D object at 0x7f71beacf430>

This doesn't:

julia> using MKL

julia> using PyPlot

julia> plot(3:4)

signal (11): Segmentation fault
in expression starting at none:0

This is on

julia> versioninfo()
Julia Version 1.7.1
Commit ac5cc99908 (2021-12-22 19:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i3-5005U CPU @ 2.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, broadwell)
Environment:
  JULIA_EDITOR = vim

and MKL v0.4.2

@antoine-levitt
Author

This may be a different issue: my segfault occurs regardless of the order in which the packages are loaded.

@ViralBShah
Contributor

@staticfloat @giordano Isn't this the usual problem of ILP64 MKL from Julia and LP64 MKL from Python clashing? I believe the next MKL release, which supports the 64_ symbol suffixes, would solve this.
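To see which integer interface the Julia side is actually using, the BLAS configuration can be queried directly. This is a minimal sketch, not something posted in the thread, assuming Julia 1.7 or later, where LinearAlgebra routes BLAS/LAPACK calls through libblastrampoline (LBT). It only inspects Julia's own BLAS setup and says nothing about which MKL build the Python process has loaded.

```julia
# Sketch, assuming Julia >= 1.7 (BLAS/LAPACK calls go through libblastrampoline).
using LinearAlgebra

# Ask LBT which BLAS/LAPACK libraries are currently loaded behind it.
cfg = BLAS.get_config()

for lib in cfg.loaded_libs
    # `interface` is typically :ilp64 (64-bit integers) or :lp64 (32-bit integers);
    # `suffix` is the symbol suffix LBT resolved, e.g. "64_" for suffixed ILP64 symbols.
    println(basename(lib.libname), "  interface=", lib.interface,
            "  suffix=\"", lib.suffix, "\"")
end
```

Running this before and after `using MKL` shows whether the forwarded library changed and whether it exposes ILP64 or LP64 symbols.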

@ViralBShah
Contributor

We have worked with Intel MKL team and the fix has landed. This should hopefully give us what we need to fix all these issues.

JuliaLinearAlgebra/libblastrampoline#52

@giordano
Contributor

giordano commented Jan 1, 2022

I've run using MKL_jll; using PyPlot; plot(3:4) multiple times and never got a crash. I wonder whether the segfaults have something to do with MKL.jl itself? But I think on v1.7 this package doesn't do much apart from forwarding the BLAS calls through LBT?

@antoine-levitt
Author

I can confirm that using MKL_jll (and not using MKL) appears to fix both the MKL+PyPlot as well as the original bug report in this issue.

@ViralBShah
Contributor

Progress: JuliaLang/julia#43877

Next step would be to update MKL.jl as discussed in JuliaLinearAlgebra/libblastrampoline#54 (comment)

@ViralBShah
Contributor

The PyPlot example no longer crashes, irrespective of whether I now load MKL.jl before or after. If #104 is good and merges, this will conclude an extremely long project of working with Intel to bring 64_ suffixes to MKL for ILP64, and then rolling it all out in the Julia ecosystem all the way, enabled by all the work on LBT that @staticfloat and @giordano have done.

@ViralBShah
Contributor

This also paves the way for us to be able to link Ygg binaries to LBT - both ILP64 and LP64 BLAS - and have the ability to use MKL across the ecosystem.

@jishnub
Member

jishnub commented Jan 31, 2022

I seem to still encounter the PyPlot crash, am I using the correct version?

(@v1.8) pkg> st -m MKL
Status `~/.julia/environments/v1.8/Manifest.toml`
  [33e6dc65] MKL v0.4.4 `https://github.com/JuliaLinearAlgebra/MKL.jl#master`

(@v1.8) pkg> st -m PyPlot
Status `~/.julia/environments/v1.8/Manifest.toml`
  [d330b81b] PyPlot v2.10.0

julia> using MKL; using PyPlot; plot(3:4);


signal (11): Segmentation fault
in expression starting at none:0
[1]    29598 segmentation fault (core dumped)  julia-latest

@antoine-levitt
Author

I think @ViralBShah was referring to ongoing work in PRs that has not landed yet.

@ViralBShah
Contributor

ViralBShah commented Jan 31, 2022

That's right, #104 fixes it. Would be great if you can try it out. There was one small issue we found that needs a workaround (JuliaLinearAlgebra/libblastrampoline#56), and after that we can merge it all. Quite hopeful that it will happen by 1.8.

@jishnub
Member

jishnub commented Feb 3, 2022

Thanks, I can confirm that the PyPlot issue is fixed using that branch. Amazing work! Hoping that this makes it into 1.8.

@ViralBShah
Contributor

We expect it will make it to 1.8.

@antoine-levitt
Author

FWIW, I still get segfaults with julia master and MKL.jl 0.5. Reproducer is the same as the second post above.

@ViralBShah
Contributor

ViralBShah commented Mar 1, 2022

We haven't been able to merge my PR yet, because Intel gave us ILP64 mangling for fortran names but not CBLAS names, and we need CBLAS for dot. @staticfloat handled this in LBT 5 - but #104 is failing with LBT 5. I hope it will get fixed by the time 1.8 releases. Also, Intel is working on adding CBLAS name mangling for the next release.

@antoine-levitt
Author

@ViralBShah Sorry I didn't test at the time of your PR, but the MWE at #97 (comment) is still failing with Julia 1.8.2 and MKL 0.5.0.

@jishnub
Member

jishnub commented Oct 5, 2022

Could you test against master (MKL 0.6.0)?

@antoine-levitt
Author

Oh, sorry, I missed that. Awesome, it works. Can you guys tag a new release?

@antoine-levitt
Author

Bump?

@ViralBShah
Contributor

What needs to be done here? Didn't 0.6 work?

@jishnub
Member

jishnub commented Dec 13, 2022

I think the suggestion is that a release be tagged.

@ViralBShah
Contributor

I thought I tagged v0.6, but TagBot seems to have errored out, hence there are no tags or release notes in this repo. See the recent commits.

JuliaRegistries/General#70260


4 participants