Uncaught exception on GET /api/kernels AttributeError: 'RemoteKernelManager' object has no attribute 'last_activity' #1395

Open
bloomsa opened this issue Nov 27, 2024 · 0 comments
bloomsa commented Nov 27, 2024

Description

I have an instance of Enterprise Gateway (EG) deployed that launches YARN-based kernels, which generally take 3 to 5 minutes to fully start up and connect back to a JupyterLab client. While kernels are in the middle of starting, we often see the following uncaught exception from the GET /api/kernels endpoint:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/tornado/web.py", line 1713, in _execute
    result = await result
  File "/opt/conda/lib/python3.10/site-packages/enterprise_gateway/services/kernels/handlers.py", line 122, in get
    await super().get()
  File "/opt/conda/lib/python3.10/site-packages/jupyter_server/services/kernels/handlers.py", line 47, in get
    kernels = await ensure_async(km.list_kernels())
  File "/opt/conda/lib/python3.10/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 490, in list_kernels
    model = self.kernel_model(kernel_id)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 476, in kernel_model
    "last_activity": isoformat(kernel.last_activity),
AttributeError: 'RemoteKernelManager' object has no attribute 'last_activity'

I found a related issue, jupyter/notebook#5345, that led me to my working understanding of the problem:

  1. kernel_manager instances don't initialize a last_activity field in their constructor; it is only added later, in the kernel start method.
  2. Since our kernels start slowly, there is a longer window during which a kernel_manager exists without last_activity. If any concurrent GET /api/kernels request arrives in that window, EG does not handle the still-starting kernels gracefully.
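The suspected sequence can be sketched with a toy class. This is only an illustration of the pattern described above, not the real code: `ToyKernelManager`, `kernel_model`, and `demo` are hypothetical names, and the real `RemoteKernelManager` and jupyter_server code paths may differ.

```python
from datetime import datetime, timezone


class ToyKernelManager:
    """Hypothetical stand-in for a kernel manager (not the real RemoteKernelManager)."""

    def __init__(self):
        # last_activity is deliberately NOT set here, mirroring point 1.
        self.execution_state = "starting"

    def start(self):
        # In this toy, the attribute only appears once startup has finished.
        self.last_activity = datetime.now(timezone.utc)
        self.execution_state = "idle"


def kernel_model(kernel_id, kernel):
    """Build a model dict the way the traceback suggests: by reading last_activity."""
    return {
        "id": kernel_id,
        "last_activity": kernel.last_activity.isoformat(),
        "execution_state": kernel.execution_state,
    }


def demo():
    """Read the model before and after start(); return (error message, model)."""
    km = ToyKernelManager()
    try:
        kernel_model("k1", km)
        before = None
    except AttributeError as exc:
        before = str(exc)
    km.start()
    after = kernel_model("k1", km)
    return before, after
```

Calling `demo()` raises and captures an `AttributeError` for the not-yet-started manager, then succeeds once `start()` has run, which matches the window described in point 2.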

Can anyone with deeper understanding of the class hierarchies confirm this understanding?

Naively, I would think last_activity could be set to utcnow() in the constructor to get around this, but perhaps there is a valid reason for not doing so?
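As a rough sketch of that idea, assuming a toy base class rather than the real class hierarchy (`BaseToyManager` and `EagerActivityManager` are hypothetical names), the constructor could seed the attribute so it always exists:

```python
from datetime import datetime, timezone


class BaseToyManager:
    """Hypothetical base class that only sets last_activity in start()."""

    def __init__(self):
        self.execution_state = "starting"

    def start(self):
        self.last_activity = datetime.now(timezone.utc)
        self.execution_state = "idle"


class EagerActivityManager(BaseToyManager):
    """Variant that seeds last_activity in the constructor."""

    def __init__(self):
        super().__init__()
        # Seed with the creation time so readers never hit AttributeError.
        self.last_activity = datetime.now(timezone.utc)
```

One possible downside, if idle culling compares last_activity against a timeout: a kernel that takes several minutes to start would carry a timestamp from its creation and might look idle for that whole period. Whether that matters for the real classes is something a maintainer would need to confirm.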

Reproduce

  1. Connect JupyterLab (I've tested using 3.6.x) to an instance of Enterprise Gateway
  2. Start one or more kernels. These should be slow-starting kernels; adding a time.sleep() to the launch script should suffice, I think.
  3. While the kernels are starting, hammer the GET /api/kernels endpoint from a separate client.
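Step 3 could be scripted roughly as follows, using only the standard library. The gateway URL in the usage comment is an assumption for illustration, and the helper names (`http_status`, `hammer`, `count_server_errors`) are hypothetical. The sketch also assumes the uncaught exception surfaces to the client as a 5xx status.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code of a GET on url, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def hammer(fetch, url, attempts=100, delay=0.1):
    """Call fetch(url) repeatedly and collect the returned status codes."""
    statuses = []
    for _ in range(attempts):
        statuses.append(fetch(url))
        time.sleep(delay)
    return statuses


def count_server_errors(statuses):
    """Count responses in the 5xx range."""
    return sum(1 for status in statuses if status >= 500)


# Example usage (assumed URL; adjust host, port, and auth for your deployment):
# statuses = hammer(http_status, "http://localhost:8888/api/kernels")
# print(count_server_errors(statuses), "server errors out of", len(statuses))
```

Passing the fetch function in as a parameter keeps the loop testable without a running gateway.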

Expected behavior

I would expect EG either to return a list of kernels with starting kernels in some designated starting state, or to leave them out of the list of kernels in that API response.
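Both options can be sketched in a few lines. This is a toy model over plain objects, not a patch against jupyter_server or EG; `tolerant_kernel_model`, `list_kernels`, and the `include_starting` flag are hypothetical names.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def tolerant_kernel_model(kernel_id, kernel):
    """Build a model without assuming last_activity exists yet."""
    last_activity = getattr(kernel, "last_activity", None)
    return {
        "id": kernel_id,
        "last_activity": last_activity.isoformat() if last_activity else None,
        "execution_state": getattr(kernel, "execution_state", "starting"),
    }


def list_kernels(kernels, include_starting=True):
    """List models; optionally omit kernels that have no last_activity yet."""
    models = []
    for kernel_id, kernel in kernels.items():
        if not include_starting and not hasattr(kernel, "last_activity"):
            continue
        models.append(tolerant_kernel_model(kernel_id, kernel))
    return models


# Example data: one running kernel and one that is still starting.
kernels = {
    "ready": SimpleNamespace(
        last_activity=datetime.now(timezone.utc), execution_state="idle"
    ),
    "pending": SimpleNamespace(execution_state="starting"),
}
```

With `include_starting=True` the pending kernel is listed with a null last_activity and a "starting" state; with `include_starting=False` it is left out of the response.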

Context

  • Operating System and version: linux (containerized) for both EG and JupyterLab
  • Browser and version:
  • Jupyter Server version: Lab and EG both using jupyter_server 1.24.0
<details><summary>Troubleshoot Output</summary>
<pre>
$PATH:
/opt/conda/bin
/opt/conda/condabin
/opt/conda/bin
/usr/local/sbin
/usr/local/bin
/usr/sbin
/usr/bin
/sbin
/bin

sys.path:
/opt/conda/bin
/opt/conda/lib/python310.zip
/opt/conda/lib/python3.10
/opt/conda/lib/python3.10/lib-dynload
/opt/conda/lib/python3.10/site-packages

sys.executable:
/opt/conda/bin/python

sys.version:
3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0]

platform.platform():
Linux-6.1.82-talos-x86_64-with-glibc2.35

which -a jupyter:
/opt/conda/bin/jupyter
/opt/conda/bin/jupyter
</pre>
</details>
@bloomsa bloomsa added the bug label Nov 27, 2024