We use nbclient with papermill to run notebooks in Docker containers. If a notebook exceeds the container's allocated memory, only the notebook subprocess seems to be killed (or dies). nbclient does not seem to detect this, so the process runs indefinitely.
This issue can be reproduced in two ways, with the script above:
Running a Python debugger, setting a breakpoint at https://github.com/jupyter/nbclient/blob/master/nbclient/client.py#L521, killing the subprocess by hand using the system monitor, and then continuing execution. Note: This can create the same type of issue (stuck in an infinite loop), but the solution might not completely overlap with the issue we are experiencing.
Thanks for raising this, and sorry for the slow response. I've got this issue queued to look into. The recent async changes are likely the root cause, though I'm not sure off-hand exactly where it happens.
raises a DeadKernelError("Kernel died") exception.
I'm wondering whether we should regularly check that the kernel is alive during execution, not just whether the execution timed out.
@davidbrochart That's true, but it comes with a limitation: if the timeout is exceeded while the kernel is still running fine, it will raise a CellTimeoutError.
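To make the idea in the two comments above concrete, here is a rough sketch of a wait loop that polls for the reply in short intervals and checks kernel liveness between polls, so a dead kernel is noticed even when no cell timeout is set. This is not nbclient's actual code: `wait_for_reply`, `get_msg`, `is_alive` and `poll_interval` are hypothetical names, the `DeadKernelError` class is a local stand-in for the exception mentioned above, the built-in `TimeoutError` stands in for `CellTimeoutError`, and it assumes `get_msg` raises `queue.Empty` when nothing arrives in time.

```python
import queue
import time


class DeadKernelError(RuntimeError):
    """Local stand-in for the DeadKernelError mentioned in this thread."""


def wait_for_reply(get_msg, is_alive, timeout=None, poll_interval=1.0):
    """Wait for a reply, checking kernel liveness between short polls.

    get_msg:  hypothetical callable accepting timeout=<seconds> and raising
              queue.Empty if no message arrives in that time.
    is_alive: hypothetical callable returning False once the kernel is gone.
    timeout:  overall cell timeout in seconds, or None to wait indefinitely.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        wait = poll_interval
        if deadline is not None:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                # Kernel is still alive but the cell took too long.
                raise TimeoutError("Cell execution timed out")
            wait = min(wait, remaining)
        try:
            return get_msg(timeout=wait)
        except queue.Empty:
            # No reply yet: distinguish "still working" from "kernel gone".
            if not is_alive():
                raise DeadKernelError("Kernel died")
```

With `timeout=None` the loop never raises a timeout for a long-running but healthy kernel, yet it still stops within roughly one `poll_interval` of the kernel dying, which is the case this issue is about.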
This issue can be reproduced in two ways, with the script above:
First ensure that "cgroup swap limit capabilities" are enabled (if you have a lot of swap space). I used Ubuntu. See here for more info:
https://docs.docker.com/engine/install/linux-postinstall/#your-kernel-does-not-support-cgroup-swap-limit-capabilities
Create the following files
Dockerfile:
test_notebook.ipynb:
Build and run docker container (this will result in an infinite running process):
Run docker container without any issues:
The issue may be caused by a missing timeout on the following line, but adding a timeout does not work for me:
nbclient/nbclient/client.py
Line 627 in 4314a44