
Nbclient waits indefinitely on killed subprocesses #88

Closed
JoostvDoorn opened this issue Jun 30, 2020 · 4 comments · Fixed by #90
Assignees: davidbrochart
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@JoostvDoorn

JoostvDoorn commented Jun 30, 2020

We use nbclient with papermill to run notebooks in Docker containers. If a notebook exceeds the memory allocated to the container, only the notebook subprocess appears to be killed (or dies); nbclient does not detect this, resulting in a process that runs indefinitely.

import nbformat
from nbclient import NotebookClient

print("Start")
with open("test_notebook.ipynb") as f:
    nb = nbformat.read(f, as_version=4)

client = NotebookClient(nb, kernel_name="python3")
client.execute()

print("Done")

This issue can be reproduced in two ways with the script above:

  • Run a Python debugger, set a breakpoint here https://github.com/jupyter/nbclient/blob/master/nbclient/client.py#L521, kill the kernel subprocess by hand using the system monitor, then continue execution. Note: this creates the same type of issue (stuck in an infinite loop), but the solution might not completely overlap with the issue we are experiencing.
  • Or use the following steps with Docker:

First ensure that "cgroup swap limit capabilities" are enabled (if you have a lot of swap space). I used Ubuntu. See here for more info:
https://docs.docker.com/engine/install/linux-postinstall/#your-kernel-does-not-support-cgroup-swap-limit-capabilities

Create the following files (save the script above as test.py, since the Dockerfile's CMD runs it):

Dockerfile:

FROM python:3
WORKDIR /usr/src/app
RUN pip install numpy ipykernel nbclient
COPY . .
CMD [ "python", "./test.py" ]

test_notebook.ipynb (a notebook with a single code cell containing):

import numpy as np

test = np.random.random((110, 10000000))
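For convenience, the notebook can also be generated programmatically instead of hand-writing the .ipynb JSON. A minimal sketch using nbformat (not part of the original report; the cell source is the snippet above):

# make_notebook.py -- illustrative helper, assumes nbformat is installed
import nbformat
from nbformat.v4 import new_code_cell, new_notebook

# Single code cell allocating roughly 8.8 GB of float64s, enough to exceed
# the container's 1000m memory limit.
nb = new_notebook(cells=[
    new_code_cell(
        "import numpy as np\n\ntest = np.random.random((110, 10000000))"
    )
])

with open("test_notebook.ipynb", "w") as f:
    nbformat.write(nb, f)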

Build and run the Docker container (this results in a process that hangs indefinitely):

docker build -t my-python-app .
docker run -it --memory-swap 1000m --memory 1000m --rm --name my-running-app my-python-app

Run the Docker container without memory limits (no issue):

docker run -it --rm --name my-running-app my-python-app

The issue may be caused by a missing timeout on the following line, but adding a timeout does not work for me:

msg = await ensure_async(self.kc.iopub_channel.get_msg(timeout=None))

@MSeal
Contributor

MSeal commented Jul 6, 2020

Thanks for raising this, and sorry for not having a prompt response. I've got this issue queued up to look into. The recent async changes are likely the root cause, though I'm not sure exactly where that's happening off-hand.

@MSeal MSeal added bug Something isn't working help wanted Extra attention is needed labels Jul 6, 2020
@davidbrochart davidbrochart self-assigned this Jul 9, 2020
@davidbrochart
Member

@JoostvDoorn by default there is no timeout for cell execution. I tried your example and adding a timeout fixes the issue, e.g.:

client = NotebookClient(nb, kernel_name="python3", timeout=1)

raises a DeadKernelError("Kernel died") exception.
I'm wondering whether we should regularly check that the kernel is alive during execution, not just whether the execution timed out.

@JoostvDoorn
Author

@davidbrochart That's true, but it comes with the limitation that if the timeout is exceeded while the kernel is still running fine, it will raise a CellTimeoutError.
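To illustrate that trade-off, here is a sketch of what a caller ends up doing when only a timeout is available; the exception classes are the ones nbclient exposes in nbclient.exceptions, and the timeout value is arbitrary:

import nbformat
from nbclient import NotebookClient
from nbclient.exceptions import CellTimeoutError, DeadKernelError

with open("test_notebook.ipynb") as f:
    nb = nbformat.read(f, as_version=4)

# Workaround: rely on a cell timeout to surface a dead kernel.
client = NotebookClient(nb, kernel_name="python3", timeout=60)
try:
    client.execute()
except DeadKernelError:
    print("Kernel died (e.g. OOM-killed inside the container)")
except CellTimeoutError:
    # A slow but healthy cell lands here too, which is the limitation above.
    print("Cell exceeded the timeout; the kernel may still be fine")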

@davidbrochart
Member

You're right, so we should regularly check that the kernel is alive, independently of the timeout. I will open a PR.
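A minimal sketch of that idea (an illustration, not the actual PR): poll the iopub channel with a short timeout and, whenever the poll comes back empty, check that the kernel process is still alive. It assumes the kernel manager is available as self.km, that ensure_async comes from nbclient.util, and that get_msg raises queue.Empty when the timeout expires, as jupyter_client's channels do:

from queue import Empty

from nbclient.exceptions import DeadKernelError
from nbclient.util import ensure_async

async def _get_iopub_msg_with_liveness_check(self, poll_interval=5):
    """Wait for an iopub message, but never block forever on a dead kernel."""
    while True:
        try:
            # Short poll instead of timeout=None.
            return await ensure_async(
                self.kc.iopub_channel.get_msg(timeout=poll_interval)
            )
        except Empty:
            # No message within the poll interval: verify the kernel process
            # is still there before waiting again.
            if not await ensure_async(self.km.is_alive()):
                raise DeadKernelError("Kernel died")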
