Segfault bisected to fix for #11715 #11945
Comments
Hi Michael, thanks for the report. It might be the case that someone steps in to help, but it definitely would be good if you can come up with and post a reduced test case. In particular, segfaults when interfacing with external libraries can be hard to debug (I'm working through one right now in Since you bisected, I'm assuming this happens on later versions than you have listed above (such as a commit from the last few days)? |
That's correct, this has only shown up recently. Earlier versions don't produce the segfault, later versions do. I'll give the reduced test case another shot tomorrow (I promise I tried really hard to get one already!). I'll also take another look at my own code. |
@mweastwood I couldn't get the build script to build all the libraries. (I get an error that |
Any idea what this GC verifier error message suggests?
JL_GC_ALLOC_PRINT=0:100 JL_GC_ALLOC_POOL=917800 JL_GC_ALLOC_OTHER=0 JULIA_LOAD_PATH=${PWD}/.. ~/projects/julia/gc-debug/julia -f test/runtests.jl
GC error (probable corruption) :
TypeName(name=Type, module=SimpleVector, names=SimpleVector, primary=Any, cache=Int32, linearcache=Bool, uid=140728650090000) |
OK somehow the type is overwritten by a
Old value = (void *) 0x7ffdf132c2b3
New value = (void *) 0x7ffdf132c2b1
restore () at gc.c:2126
2126 for(int i = 0; i < bits_save[b].len; i++) {
(gdb)
Continuing.
Hardware watchpoint 1: *(void**)0x7ffdf150c0e8
Old value = (void *) 0x7ffdf132c2b1
New value = (void *) 0x7ffdf132c200
0x00007ffdda13141b in ffc2s () from /usr/lib/libcfitsio.so.2
(gdb) |
Found the issue. The backtrace of the corruption is attached. From the source code, it seems that the I've never used any of the libraries so I don't know how you should change it. Closing since it is not a Julia issue.
|
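For readers following the watchpoint output above: the second change (0x7ffdf132c2b1 to 0x7ffdf132c200) only alters the lowest byte of the tag. Below is a minimal sketch of how a single out-of-bounds byte store can produce that pattern. It is an illustration under assumptions, not the cfitsio code: `copy_and_terminate`, `tag_after_overflow`, and the 8-byte payload layout are hypothetical, and the result assumes a little-endian machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { PAYLOAD_SIZE = 8 };

/* Hypothetical C routine: copies n bytes and then appends a NUL,
   so it writes n + 1 bytes in total. */
static void copy_and_terminate(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Models two neighbouring pool cells as one byte array: an 8-byte
   payload followed directly by the 8-byte tag of the next object.
   The caller hands over a buffer sized for the payload only, so the
   trailing NUL lands on the first byte of the neighbouring tag. */
uint64_t tag_after_overflow(uint64_t tag)
{
    unsigned char pool[PAYLOAD_SIZE + sizeof(uint64_t)];
    uint64_t result;

    memcpy(pool + PAYLOAD_SIZE, &tag, sizeof tag);
    copy_and_terminate((char *)pool, "ABCDEFGH", PAYLOAD_SIZE);
    memcpy(&result, pool + PAYLOAD_SIZE, sizeof result);
    return result;
}
```

On a little-endian machine, the first byte of the tag in memory is its least significant byte, so `tag_after_overflow(0x7ffdf132c2b1)` returns `0x7ffdf132c200`: the upper bytes survive and the value still looks like a plausible pointer.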
@yuyichao just for future reference, as you probably guessed by now, this error message appears when someone has trashed a tag. It often happens in pools with OOB stores. We may even want to enable this check in release builds, though I'm not sure. |
@carnaval Yeah, I knew that the error message means the tag is corrupted. I just wanted to see if you could tell anything from the value of the corrupted tag, but it's apparently unrelated. |
unrelated -> unrelated to julia |
The weird thing, that this tag was another valid Julia object, often happens when someone only touches the LSB of an existing thing. Unfortunately it's the only "downside" of having an almost too easy C FFI. We could have a debug mode that allocates every object with 8 or 16 bytes of canary space at the end, fills that space with random bytes before every ccall, and checks afterward that the ccall was honest. |
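The canary idea in the comment above can be sketched in plain C. This is only an illustration of the technique, assuming a hypothetical `guarded_buf` wrapper around `malloc`; it is not how Julia's GC allocates objects, and all names here are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum { CANARY_SIZE = 16 };

/* Hypothetical debug allocation: payload followed by canary bytes. */
typedef struct {
    unsigned char *data;                  /* start of the payload */
    size_t size;                          /* payload size in bytes */
    unsigned char expected[CANARY_SIZE];  /* copy of the current canary */
} guarded_buf;

/* Allocates size payload bytes plus the canary; returns 1 on success. */
int guarded_alloc(guarded_buf *g, size_t size)
{
    g->data = malloc(size + CANARY_SIZE);
    g->size = size;
    return g->data != NULL;
}

/* Call right before the foreign call: write fresh random canary bytes
   just past the payload and remember them. */
void guarded_arm(guarded_buf *g)
{
    assert(g->data != NULL);
    for (int i = 0; i < CANARY_SIZE; i++) {
        g->expected[i] = (unsigned char)(rand() & 0xff);
    }
    memcpy(g->data + g->size, g->expected, CANARY_SIZE);
}

/* Call right after the foreign call: returns 1 if the canary is
   intact, 0 if something wrote past the end of the payload. */
int guarded_check(const guarded_buf *g)
{
    return memcmp(g->data + g->size, g->expected, CANARY_SIZE) == 0;
}

void guarded_free(guarded_buf *g)
{
    free(g->data);
    g->data = NULL;
}
```

The intended use is `guarded_arm`, then the C call, then `guarded_check`. A random canary does not catch every stray store: a single overwritten byte goes unnoticed if the written value happens to equal the canary byte already there.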
That's exactly what happened. See the bit pattern changes above.
Sounds good. |
👍 for @carnaval's idea of the debug mode |
@yuyichao That should be built by this script which is run from here. Did the build script run without errors? I'm going to try to replace the external dependency with some dummy functions to see if I can still generate the segfault. |
Oops, didn't hit refresh! Apologies for the noise and thanks for all your efforts! |
@mweastwood And just FYI, the build error is because that stupid healpix build system doesn't want to link to a shared |
@yuyichao The healpix build system is indeed pretty awful. I'm sure you already noticed I needed to patch it in two places just to get things working on Travis.. |
@carnaval sounds a little like a "ccall sanitizer" |
@carnaval I've thought about the The issue I see is that the conversion from Julia object to C pointer is done in Julia code (except Another way to do this is to let codegen pass all pointer arguments to the GC and let the GC figure out whether each one points to GC-managed memory and sanitize it when necessary. This has the additional advantage that it can be used to check whether the object holding the memory is properly rooted. I remember you said that some code for the conservative scan is already there (to figure out whether a pointer is pointing inside a pool-allocated GC object). Is that code on master or on a branch? I might open a separate issue for this later... |
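As a rough illustration of the conservative check mentioned above (deciding whether an arbitrary pointer lands inside a pool-allocated object), here is a minimal sketch. It assumes a hypothetical pool made of one contiguous region of fixed-size cells; `pool_t` and `pool_find_cell` are invented for this example and do not reflect the data structures in Julia's gc.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool: one contiguous region split into equal cells. */
typedef struct {
    unsigned char *base;  /* address of the first cell */
    size_t cell_size;     /* size of each cell in bytes */
    size_t n_cells;       /* number of cells in the region */
} pool_t;

/* Returns the start of the cell that contains p, or NULL when p does
   not point into the pool at all. Interior pointers are accepted. */
void *pool_find_cell(const pool_t *pool, const void *p)
{
    assert(pool->cell_size > 0);

    uintptr_t addr = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)pool->base;
    uintptr_t hi = lo + pool->cell_size * pool->n_cells;

    if (addr < lo || addr >= hi) {
        return NULL;  /* not memory managed by this pool */
    }
    size_t index = (addr - lo) / pool->cell_size;
    return pool->base + index * pool->cell_size;
}
```

With the owning cell in hand, a sanitizer could look up that object's size and place or verify guard bytes around it before and after the call, and a rooting check could ask whether that object is reachable from a root.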
As of 63e5735, running the test script at HEALPix.jl (currently unregistered) generates a segfault 100% of the time on my machine (but not on Travis, apparently).
A reduced test case is eluding me right now because removing seemingly unrelated pieces of code makes the segfault go away. Apparently one condition for the segfault to appear is that there needs to be enough cruft in the surrounding code. The offending line of code appears to be a call to pointer_to_array here.
Hopefully this is enough information, but let me know if I can do anything else.