Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPC-GAP can crash upon deep recursion of GAP code #1838

Open
fingolfin opened this issue Nov 1, 2017 · 6 comments
Open

HPC-GAP can crash upon deep recursion of GAP code #1838

fingolfin opened this issue Nov 1, 2017 · 6 comments
Assignees
Labels
kind: bug: crash Issues describing bugs that cause GAP to crash, and PRs fixing them (used for release notes) kind: bug Issues describing general bugs, and PRs fixing them topic: HPC-GAP Issues and PRs related to HPC-GAP

Comments

@fingolfin
Copy link
Member

Normally, we employ a "recursion depth trap" to catch GAP code which accidentally performs an infinite recursion. I.e. if the recursion depth exceed a certain threshold, we show an error and let the user choose between aborting or resuming the computation.

In HPC-GAP this can fail, and we instead run into a segfault. Indeed:

gap> f:=0;; f:=x->f(x);; f(1);
Process 93553 stopped
* thread #1: tid = 0x9938c5, 0x0000000100251cdc gap`AllocateBagMemory(gc_type=-1, type=18, size=40) + 236 at boehm_gc.c:230, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x1d)
    frame #0: 0x0000000100251cdc gap`AllocateBagMemory(gc_type=-1, type=18, size=40) + 236 at boehm_gc.c:230
   227        if (!STATE(FreeList)[gc_type+1])
   228          STATE(FreeList)[gc_type+1] =
   229            GC_malloc(sizeof(void *) * TLAllocatorMaxSeg);
-> 230        if (!(result = STATE(FreeList)[gc_type+1][alloc_seg])) {
   231          if (gc_type < 0)
   232            STATE(FreeList)[0][alloc_seg] = GC_malloc_many(alloc_size);
   233          else
(lldb) bt
* thread #1: tid = 0x9938c5, 0x0000000100251cdc gap`AllocateBagMemory(gc_type=-1, type=18, size=40) + 236 at boehm_gc.c:230, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x1d)
  * frame #0: 0x0000000100251cdc gap`AllocateBagMemory(gc_type=-1, type=18, size=40) + 236 at boehm_gc.c:230
    frame #1: 0x0000000100251fe6 gap`NewBag(type=18, size=32) + 118 at boehm_gc.c:467
    frame #2: 0x000000010020ec8b gap`NewLVarsBag(slots=1) + 155 at vars.c:128
    frame #3: 0x00000001000ce29a gap`SwitchToNewLvars(func=0x000000010896e8d0, narg=1, nloc=0) + 74 at vars.h:212
    frame #4: 0x00000001000ce55d gap`DoExecFunc1args(func=0x000000010896e8d0, arg1=0x0000000000000005) + 93 at funcs.c:900
    frame #5: 0x00000001000c974e gap`EvalFunccall1args(call=120) + 1294 at funcs.c:516
    frame #6: 0x00000001001dae0b gap`ExecReturnObj(stat=144) + 507 at stats.c:1633
    frame #7: 0x00000001001d34d3 gap`EXEC_STAT(stat=144) + 99 at stats.h:54
    frame #8: 0x00000001001d342a gap`ExecSeqStat(stat=32) + 138 at stats.c:182
    frame #9: 0x00000001000c35a3 gap`EXEC_STAT(stat=32) + 99 at stats.h:54
    frame #10: 0x00000001000ce341 gap`ExecFuncHelper + 49 at funcs.c:855
    frame #11: 0x00000001000ce587 gap`DoExecFunc1args(func=0x000000010896e8d0, arg1=0x0000000000000005) + 135 at funcs.c:906
    frame #12: 0x00000001000c974e gap`EvalFunccall1args(call=120) + 1294 at funcs.c:516
    frame #13: 0x00000001001dae0b gap`ExecReturnObj(stat=144) + 507 at stats.c:1633
...

This seems to be a problem with how we use Boehm GC to maintain a pool / FreeList of small sized bags.

@fingolfin fingolfin added kind: bug Issues describing general bugs, and PRs fixing them kind: bug: crash Issues describing bugs that cause GAP to crash, and PRs fixing them (used for release notes) topic: HPC-GAP Issues and PRs related to HPC-GAP labels Nov 1, 2017
@rbehrends
Copy link
Contributor

I suspect that the actual underlying reason is that in HPC-GAP the stack size tends to be shorter and so we get an actual CPU stack overrun hitting the guard pages. If it's that, this should be relatively easy to fix by adjusting either the stack size or the maximum recursion depth.

@fingolfin
Copy link
Member Author

This still crashes, in more or less the same way:

gap> f:=0;; f:=x->f(x);; f(1);
Process 64450 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00000001001a0e19 gap`AllocateBagMemory(gc_type=-1, type=18, size=40) at boehm_gc.c:212:38
   209                                         &freeListForType[alloc_seg]);
   210              result = freeListForType[alloc_seg];
   211          }
-> 212          freeListForType[alloc_seg] = *(void **)result;
   213          memset(result, 0, alloc_size);
   214      }
   215      else {
Target 0: (gap) stopped.
(lldb) p alloc_seg
(UInt) $0 = 3
(lldb) p result
(void *) $1 = 0xfdf925dac67b009b
(lldb) p *result
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x00000001001a0e19 gap`AllocateBagMemory(gc_type=-1, type=18, size=40) at boehm_gc.c:212:38
    frame #1: 0x00000001001a0f90 gap`NewBag(type=18, size=32) at boehm_gc.c:418:9
    frame #2: 0x00000001001754d4 gap`NewLVarsBag(slots=1) at vars.c:129:12
    frame #3: 0x000000010008f064 gap`SWITCH_TO_NEW_LVARS(func=0x0000000115d0b480, narg=1, nloc=0) at vars.h:225:31
    frame #4: 0x000000010008dba6 gap`DoExecFunc1args [inlined] DoExecFunc(func=0x0000000115d0b480, narg=1, arg=0x00007ffeefa08370) at funcs.c:477:16
    frame #5: 0x000000010008db31 gap`DoExecFunc1args(func=0x0000000115d0b480, a1=0x0000000000000005) at funcs.c:505
    frame #6: 0x0000000100093953 gap`CALL_1ARGS(f=0x0000000115d0b480, a1=0x0000000000000005) at calls.h:311:12
    frame #7: 0x0000000100091df1 gap`EvalFunccall1args [inlined] EvalOrExecCall(ignoreResult=0, nr=1, call=128) at funcs.c:166:22
    frame #8: 0x0000000100091c0d gap`EvalFunccall1args(call=128) at funcs.c:319
    frame #17335: 0x00000001001abc96 gap`RunThreadedMain(mainFunction=(gap`realmain at gap.c:393), argc=2, argv=0x00007ffeefbfea80) at thread.c:335:5
    frame #17336: 0x000000010009474e gap`main(argc=2, argv=0x00007ffeefbfea80) at gap.c:431:3
    frame #17337: 0x00007fff7ce823d5 libdyld.dylib`start + 1
(lldb) p FuncsState()->RecursionDepth
(Int) $2 = 1441

So, we have 17337 stack frames; but GAP's recursion depth is "just" at 1441; division yields that for each recursion on the GAP level, there are 12 on the C level. That was in a debug build on macOS; in a non-debug build on the same system, I got 25972 resp. 2357 frames for a factor of 11

Limiting the GAP recursion level helps:

gap> SetRecursionTrapInterval(1000);
gap> f:=0;; f:=x->f(x);; f(1);
Error, recursion depth trap (1000) in
  f( x ) at stream:2 called from
f( x ) at stream:2 called from
f( x ) at stream:2 called from
f( x ) at stream:2 called from
f( x ) at stream:2 called from
f( x ) at stream:2 called from
...  at stream:2
you may 'return;'

Conversely, if I work in regular GAP and ignore the recursion depth warnings, I also get a segfault eventually:

gap> f:=0;; f:=x->f(x);; f(1);
Error, recursion depth trap (5000) in
  f( x ) at *stdin*:1 called from
f( x ) at *stdin*:1 called from
f( x ) at *stdin*:1 called from
f( x ) at *stdin*:1 called from
f( x ) at *stdin*:1 called from
f( x ) at *stdin*:1 called from
...  at *stdin*:1
you may 'return;'
brk> return;
Error, recursion depth trap (10000) in
  f( x ) at *stdin*:1 called from
f( x ) at *stdin*:1 called from
f( x ) at *stdin*:1 called from
f( x ) at *stdin*:1 called from
f( x ) at *stdin*:1 called from
f( x ) at *stdin*:1 called from
...  at *stdin*:1
you may 'return;'
brk> return;
Error, recursion depth trap (15000) in
  f( x ) at *stdin*:1 called from
f( x ) at *stdin*:1 called from
f( x ) at *stdin*:1 called from
f( x ) at *stdin*:1 called from
f( x ) at *stdin*:1 called from
f( x ) at *stdin*:1 called from
...  at *stdin*:1
you may 'return;'
brk> return;
Error, recursion depth trap (20000) in
  f( x ) at *stdin*:1 called from
f( x ) at *stdin*:1 called from
f( x ) at *stdin*:1 called from
f( x ) at *stdin*:1 called from
f( x ) at *stdin*:1 called from
f( x ) at *stdin*:1 called from
...  at *stdin*:1
you may 'return;'
brk> return;
Segmentation fault: 11

So, several thoughts:

  1. Apparently the HPC-GAP stack is smaller by default. Makes sense if you want multiple threads to run, although perhaps for the "main" thread we could still have a bigger limit matching regular GAP?
  2. Barring that, perhaps we should change the default recursion depth limit in HPC-GAP from 5000 to 1000
  3. I wonder if we can improve our recursion limiting code to detect the actual C stack limits and (if successful) use those to enforce an actual hard limit on the recursion depth?
  4. If we were using a bytecode interpreter, the overhead per recursion would be much smaller (instead of a factor 11-12, I suspect something like 1-2). Ah well.

@rbehrends
Copy link
Contributor

Yes, the stacks are smaller, because e.g. 1000 8MB thread stacks would consume 8GB. Currently, stack size defaults to 1MB on 64-bit architectures and to 256KB on 32-bit architectures (src/hpc/tlsconfig.h). What I'm a bit surprised about now that I think about it is that the main thread also appears to be affected, as we don't actually insert guard pages for that one. I'd have to look into why that happens.

C stack limits could in theory be detected, as we launch the threads ourselves and thus (usually) supply the stacks ourselves already. The exception would be using HPC-GAP as a library, but even then (e.g. Julia) we could probably figure them out. (And HPC-GAP as a library is still something that I haven't gotten around to finishing yet; initializing TLS is the biggest problem if you don't control thread creation yourself.)

The easiest short-term solution would probably involve reducing the recursion stack limit.

@ChrisJefferson
Copy link
Contributor

Do we expect 1,000 threads? Also, does 8GB of stacks matter, when unused memory won't be explicitly allocated until touched?

@rbehrends
Copy link
Contributor

I just checked and see why the main thread is affected, too. As we need to consistently have to align the stack on a boundary that is a multiple of some 2^k, this goes for the main stack, too. This actually means that we also risk that this won't work for a too large stack. I can set it to 4MB at most on macOS without a crash (barring using native TLS). This is because the default for the main thread there is an 8MB stack (or whatever getrlimit(RLIMIT_STACK, ...) gives us), and that 8MB stack may not be properly aligned. I need to check and see if you can actually call setrlimit(RLIMIT_STACK, ...) and have it work for the current process.

@rbehrends
Copy link
Contributor

I have pushed a new change to PR #2845 that bumps the default thread stack size to 8MB and resolves this issue. The PR also needs a rebase, as opers.c got renamed to opers.cc; I'll fix that later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: bug: crash Issues describing bugs that cause GAP to crash, and PRs fixing them (used for release notes) kind: bug Issues describing general bugs, and PRs fixing them topic: HPC-GAP Issues and PRs related to HPC-GAP
Projects
None yet
Development

No branches or pull requests

3 participants