Kernel handshaking pattern proposal #66

Merged
merged 14 commits into from
Jun 26, 2023
72 changes: 72 additions & 0 deletions jupyter-handshaking/jupyter-handshaking.md
@@ -0,0 +1,72 @@
---
title: Kernel Handshaking pattern
authors: Johan Mabille (@JohanMabille)
issue-number:
pr-number: 66
date-started: 2021-01-05
---

# Kernel Handshaking pattern

## Problem

The current implementation of Jupyter client makes it responsible for finding available ports and passing them to a newly started kernel. The issue is that a new process can start using one of these ports before the kernel has started, resulting in a ZMQError when the kernel starts. This is even more problematic when spawning a lot of kernels in a short lapse of time, because the client may find available ports that have already been assigned to another kernel.

A workaround has been implemented for the latter case, but it does not solve the former one.

## Proposed Enhancement

We propose to implement a handshaking pattern: the client lets the kernel find free ports and communicate them back via a dedicated socket. It then connects to the kernel. More formally:

- The kernel launcher is responsible for opening a dedicated socket for receiving connection information from kernels (channel ports). This socket will be referred to as the **registration socket**.
Member

@echarles echarles Jun 14, 2023

@JohanMabille Would the port registration socket change on each kernel launch, or is it static and the socket remains the same for each kernel launch?

(in other words, do we have as many registration sockets as kernels, or only one registration socket for all the kernels)?

Member Author

The two options are theoretically possible (thus the wording "The kernel should not expect the registration socket to exist after it has received the acknowledge receipt (i.e. it can be closed)" in the JEP), although I guess that in most of the implementations we would keep a single socket as long as possible.

Member

The spec should be as clear as possible. If both options are possible, it should be stated clearly in words. Further questions:

  1. In your many registration socket option, what is the added value of this JEP? The initial issue was the availability of the 5 ports. If a registration port needs to be found on each kernel launch, we don't win a lot and are still subject to failure.
  2. In your single registration socket option, what process is responsible for maintaining that socket? Would it be some code in jupyter-client, or will each "client" (jupyter-server, jupyter-console...) be responsible for that? (I expect your answer to be "hey this is an implementation detail...", I hope it will not be)

Member Author

@JohanMabille JohanMabille Jun 14, 2023

> The spec should be as clear as possible. If both options are possible, it should be stated clearly in words.

This can be done in the PR to the doc of the protocol and in the protocol schema. Leaving both options open in the JEP should be enough.

> In your many registration socket option, what is the added value of this JEP? The initial issue was the availability of the 5 ports. If a registration port needs to be found on each kernel launch, we don't win a lot and are still subject to failure.

The current pattern is:
1. the client finds free ports by opening sockets on them
2. the client closes the sockets
3. the client communicates these ports to the kernel
4. the kernel tries to open sockets on these ports.

So the problem is that there is a delay between finding free ports and actually opening sockets for using them. With the handshake pattern, the situation is totally different, even in the "many registration socket option": the kernel finds the free ports by opening sockets on them and then does not close them.
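
To make the difference concrete, here is a minimal sketch that uses plain TCP sockets from Python's standard library instead of ZeroMQ. The helper names (`probe_then_release`, `bind_and_keep`, `can_bind`) are made up for this illustration and are not part of `jupyter_client`.

```python
import socket

LOCALHOST = "127.0.0.1"


def probe_then_release():
    """Current pattern: find a free port, then close the probing socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((LOCALHOST, 0))  # port 0 asks the OS for any free port
        port = probe.getsockname()[1]
    # The socket is closed here, so the port is no longer reserved.
    return port


def bind_and_keep():
    """Handshake pattern: bind to a free port and keep the socket open."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((LOCALHOST, 0))
    return sock, sock.getsockname()[1]


def can_bind(port):
    """Return True if some other socket is still able to bind this port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as other:
        try:
            other.bind((LOCALHOST, port))
        except OSError:
            return False
        return True
```

With the first helper, any process can claim the port between the probe and the kernel's own bind; with the second, the port stays held by the socket that found it.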

> In your single registration socket option, what process is responsible for maintaining that socket? Would it be some code in jupyter-client, or will each "client" (jupyter-server, jupyter-console...) be responsible for that? (I expect your answer to be "hey this is an implementation detail...", I hope it will not be)

The process responsible for maintaining that socket is the process that is responsible for launching the kernel, whatever it is, since it is the one passing the registration socket info to the kernel upon launch.

Member

I was about to write that registration socket lifetime should not be in the spec, as it's an implementation detail irrelevant to kernels, but then I thought about kernel restarts. Presumably restarting kernels need to re-register their (possibly new) sockets, so the registration socket should stay open (or reopen at the same port - not always available) IF kernel restarts are expected. For example:

The client MAY close the registration socket after completing a kernel's registration.
The client MAY use a single registration socket for multiple kernels.
To restart a kernel will require the registration socket again, so the client SHOULD keep the registration socket open while it expects restarts to be possible, OR open a new socket and pass the new registration socket URL to the new process.

Member Author

Hey sorry, I missed this one. I will update the JEP accordingly.

- When starting a new kernel, the launcher passes the connection information for this socket to the kernel.
- The kernel starts, finds free ports to bind the shell, control, stdin, heartbeat and iopub sockets. It then connects to the registration socket and sends the connection information to the registration socket.
Member

@martinRenou martinRenou Jun 19, 2023

Suggested change
- The kernel starts, finds free ports to bind the shell, control, stdin, heartbeat and iopub sockets. It then connects to the registration socket and sends the connection information to the registration socket.
- The kernel starts, finds free ports to bind the shell, control, stdin, heartbeat and iopub sockets. The connection ports information will be referred to as the **connection information**. It then connects to the registration socket and sends the connection information to the kernel launcher through that socket.

Member

@martinRenou martinRenou Jun 19, 2023

Should we formally define the socket message format for the ports information? Or is it irrelevant to a JEP?

Member Author

@JohanMabille JohanMabille Jun 19, 2023

Due to the length of the discussion here, it has been decided to discuss the format of the messages in the PR updating the protocol (similarly to what was done when adding the debugger, where we described the global approach in the JEP, and discussed the details in the PR to the protocol).

- Upon reception of the connection information, the launcher sends an acknowledgement of receipt to the kernel, and the client connects to the kernel.
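
The steps above can be sketched as follows. This is only an illustration under stated assumptions: it uses plain TCP sockets and a JSON line as stand-ins for the ZeroMQ sockets and the message format, which the proposal leaves to the protocol PR. The names `kernel_side`, `launcher_side`, `run_handshake` and the `b"ack"` reply are hypothetical.

```python
import json
import socket
import threading

LOCALHOST = "127.0.0.1"
CHANNELS = ("shell", "control", "stdin", "hb", "iopub")


def kernel_side(registration_addr, timeout=5.0):
    """Bind the channel sockets, then register their ports with the launcher."""
    channels = {}
    for name in CHANNELS:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind((LOCALHOST, 0))  # the OS picks a free port; socket stays open
        channels[name] = sock
    ports = {name: sock.getsockname()[1] for name, sock in channels.items()}
    try:
        with socket.create_connection(registration_addr, timeout=timeout) as reg:
            reg.sendall(json.dumps(ports).encode() + b"\n")
            ack = reg.recv(16)  # raises a timeout error if no reply arrives
        if ack != b"ack":
            raise RuntimeError("no acknowledgement from the launcher")
    except (OSError, RuntimeError):
        # Without an acknowledgement the kernel gives up and frees its ports.
        for sock in channels.values():
            sock.close()
        raise
    return channels, ports


def launcher_side(registration_sock):
    """Accept one registration, acknowledge it, and return the received ports."""
    conn, _ = registration_sock.accept()
    with conn, conn.makefile("r") as reader:
        ports = json.loads(reader.readline())
        conn.sendall(b"ack")
    return ports


def run_handshake():
    """Run both sides in one process and return (sent, received) ports."""
    registration = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    registration.bind((LOCALHOST, 0))
    registration.listen()
    received = {}
    launcher = threading.Thread(
        target=lambda: received.update(launcher_side(registration))
    )
    launcher.start()
    channels, sent = kernel_side(registration.getsockname())
    launcher.join()
    for sock in channels.values():
        sock.close()
    registration.close()
    return sent, received
```

In this sketch the kernel never releases its channel sockets between finding the ports and reporting them, which is the property the handshake relies on.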

The way the launcher passes the connection information for the registration socket to the kernel should be similar to that of passing the ports of the kernel socket in the current connection pattern: a connection file that can be read by local kernels or sent over the network for remote kernels (although this requires a custom kernel provisioner or "nanny"). This connection file should also contain the signature scheme and the key.

Regarding the registration socket lifetime:

- The kernel launcher MAY close the registration socket after completing a kernel's registration. Therefore, the kernel should disconnect from the registration socket right after it has received the acknowledgement of receipt. A kernel should shut itself down if it does not receive an acknowledgement of receipt after some time (the value of the time limit is left to the implementation).
- Restarting a kernel requires the registration socket again, so the kernel launcher SHOULD keep the registration socket open if it expects restarts to be possible, or open a new socket and pass the new registration socket URL to the new process.

The kernel should write its connection information in a connection file so that other clients can connect to it.

The kernel specifies whether it supports the handshake pattern via the "kernel_protocol_version" field in the kernelspec:
- if the field is missing, or if its value is less than 5.5, the kernel supports passing ports only.
- if the field value is >=5.5, the kernel supports both mechanisms.
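
A launcher could apply this rule with a check along the following lines. This is a sketch, not `jupyter_client` code: the function name `supports_handshake` is hypothetical, the kernelspec is assumed to be already loaded as a dictionary, and comparing the version component by component (so that 5.10 counts as newer than 5.5) is an assumption about how versions are ordered.

```python
def supports_handshake(kernelspec: dict) -> bool:
    """Return True if the kernelspec declares protocol version 5.5 or later."""
    version = kernelspec.get("kernel_protocol_version")
    if version is None:
        # Missing field: the kernel supports passing ports only.
        return False
    # Compare major and minor numerically, not as a float or a string.
    parts = tuple(int(piece) for piece in str(version).split(".")[:2])
    parts += (0,) * (2 - len(parts))  # treat "5" as 5.0
    return parts >= (5, 5)
```

A launcher would then use the handshake when the check passes and fall back to passing ports otherwise.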

### Remarks

This pattern is **NOT** a replacement for the current connection pattern. It is an additional one and kernels will have to implement both of them to be conformant to the Jupyter Kernel Protocol specification. Which pattern should be used for the connection is decided by the kernel launcher, depending on the information passed in the initial connection file.

Member

@JohanMabille Do we need to specify that launchers and kernels that do not both support the two patterns should be able to coexist? E.g. an old launcher (single-mode) should be able to launch new kernels (dual-mode).

Similarly, how would a dual-mode launcher know if the kernel is single- or dual-mode? Is there a "capabilities" JEP for that? I guess it is mandatory for the launcher to know if the kernel supports the new handshaking pattern.

Member Author

@JohanMabille JohanMabille Jun 15, 2023

As stated in the JEP:

The kernel specifies whether it supports the handshake pattern via the "kernel_protocol_version" field in the kernelspec:

  • if the field is missing, or if its value is less than 5.5, the kernel supports passing ports only.
  • if the field value is >=5.5, the kernel supports both mechanisms.

So old launchers can start new kernels, and new launchers will still be able to start old kernels. Reading the kernelspec is enough to know whether a kernel supports both mechanisms or the older one only.


A recommended implementation for a multi-kernel client (i.e. jupyter-server) is to have a single long-lived registration socket.

### Impact on existing implementations

Although this enhancement requires changing all the existing kernels, the impact should be limited. Indeed, most of the kernels are based on the kernel wrapper approach, or on xeus.

Most of the clients are based on `jupyter_client`. Therefore, the changes should only be limited to this repository or external kernel provisioners.
Member

Suggested change
Most of the clients are based on `jupyter_client`. Therefore, the changes should only be limited to this repository or external kernel provisioners.
Most of the clients are based on `jupyter_client`. Therefore, the changes should only be limited to this repository or external kernel provisioners and additional client libraries, e.g. in other languages.


## Relevant Resources (GitHub repositories, Issues, PRs)

### GitHub repositories

- Jupyter Client: https://github.com/jupyter/jupyter_client
The Jupyter protocol client APIs
- Voilà: https://github.com/voila-dashboards/voila
Voilà turns Jupyter notebooks into standalone web applications
- IPyKernel: https://github.com/ipython/ipykernel
IPython kernel for Jupyter
- Xeus: https://github.com/jupyter-xeus/xeus
The C++ implementation of the Jupyter kernel protocol

### GitHub Issues

- Spawning many kernels may result in ZMQError (https://github.com/jupyter/jupyter_client/issues/487)
- Spawning ~20 requests at a time results in a ZMQError (https://github.com/voila-dashboards/voila/issues/408#issuecomment-539968325)

### GitHub Pull Requests

- Prevent two kernels to have the same ports (https://github.com/jupyter/jupyter_client/pull/490)