HPC-GAP can crash upon deep recursion of GAP code #1838
I suspect that the actual underlying reason is that in HPC-GAP the stack size tends to be smaller, so we get an actual CPU stack overrun that hits the guard pages. If that is the case, this should be relatively easy to fix by adjusting either the stack size or the maximum recursion depth.
This still crashes, in more or less the same way:
So we have 17337 stack frames, but GAP's recursion depth is "just" 1441; dividing one by the other shows that each recursion on the GAP level accounts for about 12 on the C level. That was in a debug build on macOS; in a non-debug build on the same system, I got 25972 C stack frames at a GAP recursion depth of 2357, for a factor of 11. Limiting the GAP recursion level helps.
Conversely, if I work in regular GAP and ignore the recursion depth warnings, I also get a segfault eventually:
So, several thoughts:
Yes, the stacks are smaller, because e.g. 1000 8MB thread stacks would consume 8GB. Currently, the stack size defaults to 1MB on 64-bit architectures and to 256KB on 32-bit architectures. C stack limits could in theory be detected, as we launch the threads ourselves and thus (usually) supply the stacks ourselves already. The exception would be using HPC-GAP as a library, but even then (e.g. Julia) we could probably figure them out. (HPC-GAP as a library is still something I haven't gotten around to finishing; initializing TLS is the biggest problem if you don't control thread creation yourself.) The easiest short-term solution would probably be to reduce the recursion depth limit.
Do we expect 1,000 threads? Also, does 8GB of stacks matter, when unused memory won't actually be allocated until it is touched?
I just checked and see why the main thread is affected, too. As we need to consistently align the stack on a boundary that is a multiple of some
I have pushed a new change to PR #2845 that bumps the default thread stack size to 8MB and resolves this issue. The PR also needs a rebase, as
Normally, we employ a "recursion depth trap" to catch GAP code that accidentally performs an infinite recursion. That is, if the recursion depth exceeds a certain threshold, we show an error and let the user choose between aborting and resuming the computation.
In HPC-GAP this can fail, and we instead run into a segfault. Indeed:
This seems to be a problem with how we use Boehm GC to maintain a pool / FreeList of small-sized bags.