Segfault if reading file while writing #862

Closed
dschwoerer opened this issue Nov 19, 2018 · 8 comments

Comments

@dschwoerer

I am writing a file in one process. If I try to read it in a second thread I get the attached segmentation fault.
I am trying to read a file, while it is being written on a different node.
I could not reproduce this on a single machine, so maybe the locking mechanism fails when different computers are involved?

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6d216c2 in __memmove_ssse3_back () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 glibc-2.17-106.el7_2.8.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.13.2-12.el7_2.x86_64 libcom_err-1.42.9-7.el7.x86_64 libselinux-2.2.2-6.el7.x86_64 openssl-libs-1.0.1e-51.el7_2.7.x86_64 pcre-8.32-15.el7_2.1.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64
(gdb) bt
#0  0x00007ffff6d216c2 in __memmove_ssse3_back () from /lib64/libc.so.6
#1  0x00007fffe4cd226a in nc4_get_vara () from $HOME/.local/lib/python3.5/site-packages/netCDF4/.libs/libnetcdf-556549db.so.11.0.4
#2  0x00007fffe4cdff97 in ?? () from $HOME/.local/lib/python3.5/site-packages/netCDF4/.libs/libnetcdf-556549db.so.11.0.4
#3  0x00007fffe4ce002d in NC4_get_vara () from $HOME/.local/lib/python3.5/site-packages/netCDF4/.libs/libnetcdf-556549db.so.11.0.4
#4  0x00007fffe4c4e45c in NC_get_vara () from $HOME/.local/lib/python3.5/site-packages/netCDF4/.libs/libnetcdf-556549db.so.11.0.4
#5  0x00007fffe4c4f52c in nc_get_vara () from $HOME/.local/lib/python3.5/site-packages/netCDF4/.libs/libnetcdf-556549db.so.11.0.4
#6  0x00007fffe80c9391 in ?? () from $HOME/.local/lib/python3.5/site-packages/netCDF4/_netCDF4.cpython-35m-x86_64-linux-gnu.so
#7  0x00007ffff7979da9 in PyCFunction_Call (func=0x7fffe37932d0, args=0x7fffe378ba68, kwds=<optimized out>) at Objects/methodobject.c:98
#8  0x00007fffe803e848 in ?? () from $HOME/.local/lib/python3.5/site-packages/netCDF4/_netCDF4.cpython-35m-x86_64-linux-gnu.so
#9  0x00007ffff7a0d585 in PyEval_EvalFrameEx (f=f@entry=0xea15f8, throwflag=throwflag@entry=0) at Python/ceval.c:1594
#10 0x00007ffff7a12c5e in fast_function (nk=<optimized out>, na=<optimized out>, n=2, pp_stack=0x7fffffff94b0, func=<optimized out>) at Python/ceval.c:4803
#11 call_function (oparg=<optimized out>, pp_stack=0x7fffffff94b0) at Python/ceval.c:4730
#12 PyEval_EvalFrameEx (f=f@entry=0x166be98, throwflag=throwflag@entry=0) at Python/ceval.c:3236
#13 0x00007ffff7a15086 in _PyEval_EvalCodeWithName (_co=0x7ffff041e540, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=15, kws=0xbab808, kwcount=0, 
    defs=0x7ffff01fc960, defcount=2, kwdefs=kwdefs@entry=0x0, closure=0x0, name=name@entry=0x7ffff04177f0, qualname=0x7ffff04177f0) at Python/ceval.c:4018
#14 0x00007ffff7a12133 in fast_function (nk=<optimized out>, na=<optimized out>, n=15, pp_stack=0x7fffffff96e0, func=<optimized out>) at Python/ceval.c:4813
#15 call_function (oparg=<optimized out>, pp_stack=0x7fffffff96e0) at Python/ceval.c:4730
#16 PyEval_EvalFrameEx (f=f@entry=0xbab588, throwflag=throwflag@entry=0) at Python/ceval.c:3236
#17 0x00007ffff7a12c5e in fast_function (nk=<optimized out>, na=<optimized out>, n=3, pp_stack=0x7fffffff9860, func=<optimized out>) at Python/ceval.c:4803
#18 call_function (oparg=<optimized out>, pp_stack=0x7fffffff9860) at Python/ceval.c:4730
#19 PyEval_EvalFrameEx (f=f@entry=0x72a7d8, throwflag=throwflag@entry=0) at Python/ceval.c:3236
#20 0x00007ffff7a15086 in _PyEval_EvalCodeWithName (_co=_co@entry=0x7ffff041e930, globals=globals@entry=0x7ffff7f4d508, locals=locals@entry=0x7ffff7f4d508, args=args@entry=0x0, 
    argcount=argcount@entry=0, kws=kws@entry=0x0, kwcount=kwcount@entry=0, defs=defs@entry=0x0, defcount=defcount@entry=0, kwdefs=kwdefs@entry=0x0, closure=closure@entry=0x0, 
    name=name@entry=0x0, qualname=qualname@entry=0x0) at Python/ceval.c:4018
#21 0x00007ffff7a15178 in PyEval_EvalCodeEx (_co=_co@entry=0x7ffff041e930, globals=globals@entry=0x7ffff7f4d508, locals=locals@entry=0x7ffff7f4d508, args=args@entry=0x0, 
    argcount=argcount@entry=0, kws=kws@entry=0x0, kwcount=kwcount@entry=0, defs=defs@entry=0x0, defcount=defcount@entry=0, kwdefs=kwdefs@entry=0x0, closure=closure@entry=0x0)
    at Python/ceval.c:4039
#22 0x00007ffff7a151bb in PyEval_EvalCode (co=co@entry=0x7ffff041e930, globals=globals@entry=0x7ffff7f4d508, locals=locals@entry=0x7ffff7f4d508) at Python/ceval.c:777
#23 0x00007ffff7a45262 in run_mod (arena=0x6fcc00, flags=0x7fffffff9b40, locals=0x7ffff7f4d508, globals=0x7ffff7f4d508, filename=0x7ffff0408540, mod=0x752cf0) at Python/pythonrun.c:976
#24 PyRun_FileExFlags (fp=fp@entry=0x69b360, filename_str=filename_str@entry=0x7ffff7f18050 "bin/plot_1d_slice.py", start=start@entry=257, globals=globals@entry=0x7ffff7f4d508, 
    locals=locals@entry=0x7ffff7f4d508, closeit=closeit@entry=1, flags=flags@entry=0x7fffffff9b40) at Python/pythonrun.c:929
#25 0x00007ffff7a453c7 in PyRun_SimpleFileExFlags (fp=fp@entry=0x69b360, filename=<optimized out>, closeit=closeit@entry=1, flags=flags@entry=0x7fffffff9b40) at Python/pythonrun.c:396
#26 0x00007ffff7a45863 in PyRun_AnyFileExFlags (fp=fp@entry=0x69b360, filename=<optimized out>, closeit=closeit@entry=1, flags=flags@entry=0x7fffffff9b40) at Python/pythonrun.c:80
#27 0x00007ffff7a624c4 in run_file (p_cf=0x7fffffff9b40, filename=0x604320 L"bin/plot_1d_slice.py", fp=0x69b360) at Modules/main.c:318
#28 Py_Main (argc=argc@entry=2, argv=argv@entry=0x603010) at Modules/main.c:768
#29 0x0000000000400ab4 in main (argc=2, argv=<optimized out>) at ./Programs/python.c:65
@jswhit
Collaborator

jswhit commented Nov 19, 2018

The netcdf-c library is not yet thread safe, but it is a high priority for the dev team. See #844.

@dschwoerer
Author

Sorry for the unclear language.
The two programs are running in different processes, even on physically different machines.
They only share the filesystem. Thus thread-safety cannot be the cause.

It might be related to #813, however.

@jswhit
Collaborator

jswhit commented Nov 19, 2018

I don't think you can read from a file that is being written to at the same time.

@dschwoerer
Author

I am perfectly happy not being able to read it.
It should, however, not segfault but raise a (Python) exception.

@jswhit
Collaborator

jswhit commented Nov 19, 2018

Python can't catch the error since the segfault is happening in the C library. It looks as if HDF5 1.10, at least, makes this possible (https://support.hdfgroup.org/HDF5/docNewFeatures/NewFeaturesSwmrDocs.html).
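
Since the crash happens below the Python layer, one possible workaround is to run the risky read in a separate helper process, so that a segfault kills only the helper and the caller gets an ordinary exception. The following is a minimal sketch under stated assumptions, not something netCDF4 provides: `run_isolated` and `ChildCrashed` are hypothetical names, the negative-return-code convention applies to POSIX systems, and the demo child simply sends itself SIGSEGV as a stand-in for the real read.

```python
import signal
import subprocess
import sys


class ChildCrashed(RuntimeError):
    """Hypothetical exception: the helper process was killed by a signal."""


def run_isolated(code, timeout=60):
    """Run `code` in a fresh Python interpreter and return its stdout.

    A crash inside a C extension terminates only the child process; the
    parent sees a negative return code and can raise a normal exception.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # On POSIX, a return code of -N means "terminated by signal N".
        name = signal.Signals(-proc.returncode).name
        raise ChildCrashed("child terminated by " + name)
    if proc.returncode != 0:
        raise RuntimeError("child failed: " + proc.stderr.strip())
    return proc.stdout


if __name__ == "__main__":
    # Stand-in for the real read: the child deliberately raises SIGSEGV.
    crash = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
    try:
        run_isolated(crash)
    except ChildCrashed as exc:
        print("caught:", exc)
```

In practice the child would open the file and print or pickle the slice it read. The cost is one interpreter start-up per call, and this does not make the concurrent read itself safe; it only turns the crash into something the caller can handle.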

@dschwoerer
Author

So should I report this segfault to netcdf?

@jswhit
Collaborator

jswhit commented Nov 19, 2018

Wouldn't hurt, but I suspect the issue is that, since there is currently no file-locking mechanism in the library, there is no way to prevent data (and memory) corruption when you attempt to read and write to the same file at the same time. Segfaults are inevitable as a result, until the SWMR features of HDF5 are enabled in netcdf-c. If you are using NETCDF3, opening the files with NC_SHARE might avoid this (mode='s'). According to the docs for nc_open:

"The NC_SHARE flag is only used for netCDF classic and 64-bit offset files. It is appropriate when one process may be writing the dataset and one or more other processes reading the dataset concurrently; it means that dataset accesses are not buffered and caching is limited. Since the buffering scheme is optimized for sequential access, programs that do not access data sequentially may see some performance improvement by setting the NC_SHARE flag."

@jswhit
Collaborator

jswhit commented Apr 2, 2019

closing now
