Fix Jupyter connection race condition #563
Comments
I think this has also been an area of concern when running tests in parallel. This scheme should be implemented in the new kernel supervisor. We'd need some way for the kernel to advertise that it supports this mechanism (maybe a new field we pass when starting).
In the linked thread someone suggests adding a field
See also this JEP, which has been "Approved" and is therefore supposedly the recommended way to fix this problem: jupyter/enhancement-proposals#66. Nicely viewable at https://jupyter.org/enhancement-proposals/66-jupyter-handshaking/jupyter-handshaking.html
So it sounds like we'd add 1 new optional field, handshake_port, to ConnectionFile:
pub struct ConnectionFile {
    /// ZeroMQ port: Handshake channel
    pub handshake_port: Option<u16>,
    /// ZeroMQ port: Control channel (kernel interrupts)
    pub control_port: u16,
    /// ZeroMQ port: Shell channel (execution, completion)
    pub shell_port: u16,
    /// ZeroMQ port: Standard input channel (prompts)
    pub stdin_port: u16,
    /// ZeroMQ port: IOPub channel (broadcasts input/output)
    pub iopub_port: u16,
    /// ZeroMQ port: Heartbeat messages (echo)
    pub hb_port: u16,
    /// The transport type to use for ZeroMQ; generally "tcp"
    pub transport: String,
    /// The signature scheme to use for messages; generally "hmac-sha256"
    pub signature_scheme: String,
    /// The IP address to bind to
    pub ip: String,
    /// The HMAC-256 signing key, or an empty string for an unauthenticated
    /// connection
    pub key: String,
}
The client would fill out the connection file like so (note that signature_scheme and key are needed):
Ark would see that Internally, ark then finds 5 free ports and immediately binds to them (thus avoiding the race condition). Ark would then utilize the
This connection file would be written to disk somewhere. Ark would then send the client a new Jupyter Message type I call
pub struct ConnectionRequest {
    /// The path to the connection file created from the handshake
    pub file: String,
}
The client would get this
pub struct ConnectionReply {
}
I'm not sure if
According to the JEP, ark should also write a
@DavisVaughan is there anything in the JEP that suggests we need to write the connection information to disk and send a file path? It seems like it'd be more straightforward to just send a message that includes the connection data itself (plus, that would work in cross-machine configs).
Yes on writing to disk in general
But it's not specific on exactly how the kernel sends that info back to the client
So we could totally make
We're seeing issues in Windows CI that look like:
I can also reproduce it locally by running integration tests in a loop. It's a mystery why it happens so often on the Windows CI though.
This might be due to the "classic jupyter race condition": jupyter/jupyter_client#487. In Jupyter's connection scheme, the client searches for available ports, communicates those to the server which then tries to bind to them. This fails if any of the ports end up getting used up in the meantime.
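The race described above can be reproduced in a few lines. This is only a sketch, assuming plain `std::net` TCP listeners on the loopback interface as a stand-in for ZeroMQ sockets; `probe_free_port` and `kernel_loses_race` are hypothetical names used for illustration and do not come from Ark or Jupyter.

```rust
use std::io::ErrorKind;
use std::net::TcpListener;

/// Client-side "find a free port": bind to port 0, note the port the OS
/// assigned, then release it (hypothetical helper).
pub fn probe_free_port() -> std::io::Result<u16> {
    let probe = TcpListener::bind("127.0.0.1:0")?;
    let port = probe.local_addr()?.port();
    // `probe` is dropped on return, so the port is free again and anyone
    // may take it before the kernel gets around to binding.
    Ok(port)
}

/// Returns `Ok(true)` if the kernel's late bind fails because something
/// else took the advertised port first.
pub fn kernel_loses_race() -> std::io::Result<bool> {
    let port = probe_free_port()?;

    // Stand-in for an unrelated process grabbing the port in the gap
    // between the client's probe and the kernel's bind.
    let _intruder = TcpListener::bind(("127.0.0.1", port))?;

    // The kernel now tries to bind the port it was told to use.
    match TcpListener::bind(("127.0.0.1", port)) {
        Ok(_) => Ok(false),
        Err(e) if e.kind() == ErrorKind::AddrInUse => Ok(true),
        Err(e) => Err(e),
    }
}
```

In real runs the "intruder" is whatever other process (or parallel test) happens to ask the OS for a port in that window, which is why the failure is intermittent.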
There is no other solution than to let the kernel pick the ports. In the linked issue, they suggest implementing this scheme:
Essentially the client would pick a port for a handshake socket, bind to it, and send this connection info:
And the server would connect to the handshake socket and send back:
On the server side, it looks like we can use :* or :0 to let the OS pick a port: https://stackoverflow.com/questions/16699890/connect-to-first-free-port-with-tcp-using-0mq
Positron could also use this to make the initial connection to Ark more robust, cc @jmcphers.
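A minimal sketch of the handshake scheme, under loud assumptions: plain `std::net` TCP sockets stand in for ZeroMQ sockets, and a space-separated line stands in for a signed Jupyter message. `HandshakePorts`, `kernel_handshake`, and `client_receive` are hypothetical names, not part of Ark, Positron, or the JEP. The point it illustrates is that the kernel binds with port 0 and keeps the sockets open, so there is no window in which another process can take the ports.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};

/// Hypothetical handshake payload: the ports the kernel actually bound.
#[derive(Debug, PartialEq)]
pub struct HandshakePorts {
    pub control_port: u16,
    pub shell_port: u16,
    pub stdin_port: u16,
    pub iopub_port: u16,
    pub hb_port: u16,
}

impl HandshakePorts {
    /// Encode as one space-separated line (stand-in for a Jupyter message).
    pub fn encode(&self) -> String {
        format!(
            "{} {} {} {} {}\n",
            self.control_port, self.shell_port, self.stdin_port, self.iopub_port, self.hb_port
        )
    }

    /// Decode a line produced by `encode`; `None` if it is malformed.
    pub fn decode(line: &str) -> Option<Self> {
        let p = line
            .split_whitespace()
            .map(|s| s.parse().ok())
            .collect::<Option<Vec<u16>>>()?;
        if p.len() != 5 {
            return None;
        }
        Some(Self {
            control_port: p[0],
            shell_port: p[1],
            stdin_port: p[2],
            iopub_port: p[3],
            hb_port: p[4],
        })
    }
}

/// Kernel side: let the OS pick five ports (port 0), keep them bound, then
/// connect to the client's handshake socket and report what was chosen.
pub fn kernel_handshake(
    handshake_port: u16,
) -> std::io::Result<(Vec<TcpListener>, HandshakePorts)> {
    let listeners = (0..5)
        .map(|_| TcpListener::bind("127.0.0.1:0"))
        .collect::<std::io::Result<Vec<_>>>()?;
    let ports = listeners
        .iter()
        .map(|l| l.local_addr().map(|a| a.port()))
        .collect::<std::io::Result<Vec<u16>>>()?;
    let reply = HandshakePorts {
        control_port: ports[0],
        shell_port: ports[1],
        stdin_port: ports[2],
        iopub_port: ports[3],
        hb_port: ports[4],
    };
    let mut stream = TcpStream::connect(("127.0.0.1", handshake_port))?;
    stream.write_all(reply.encode().as_bytes())?;
    Ok((listeners, reply))
}

/// Client side: accept one connection on the handshake socket it already
/// bound, and read the ports the kernel reports.
pub fn client_receive(handshake: &TcpListener) -> std::io::Result<Option<HandshakePorts>> {
    let (stream, _) = handshake.accept()?;
    let mut line = String::new();
    BufReader::new(stream).read_line(&mut line)?;
    Ok(HandshakePorts::decode(&line))
}
```

The client would bind its own handshake listener (also with port 0), pass that port to the kernel at startup, and then call `client_receive`. The only port that travels from client to kernel is one the client is already holding open.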