Kernel nanny proposal #14
Not sure if I understand the details, but currently the notebook isn't very good at shutting down the R kernel on Windows, because the R kernel is not a single process, but more like R.exe -> cmd -> rterm.exe [see https://github.com/jupyter/jupyter_client/issues/104]. I'm not sure if the "nanny" can detect such a thing without a heartbeat? [Such things might happen even for Python kernels, if you use a batch file with
I suspect it won't make much of a difference either way in that situation. Both currently and with the kernel nanny, it will send Besides fiddling with the time we wait for the kernel to shut itself down, I'm not sure what we could do to improve that. |
instructing the nanny to shut down the kernel.
* A new message type on the control channel from the frontend to the nanny,
instructing the nanny to signal/interrupt the kernel. (*TODO: Expose all Unix
signals, or just SIGINT?*)
A `signal_request` message makes the most sense. I don't think there's a reason to limit to interrupt/term/kill, all of which we probably want.
For Unix systems that certainly makes sense. For Windows, should we just pick some numbers to refer to the available ways we have of interrupting/stopping the kernel process?
I think only one or two signals work on Windows reliably, but they are still integers, aren't they?
AIUI Windows doesn't really have signals at all, but Python exposes certain similar operations through the same interface it uses for signals on Windows. The description of os.kill has some useful info:
https://docs.python.org/3/library/os.html#os.kill
We could quite reasonably expose the same set of options with the same meanings as Python does, of course.
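A minimal sketch of what "the same set of options as Python" could look like on the nanny side. The action names (`interrupt`, `terminate`, `kill`) and both function names are hypothetical and not part of the proposal; only `subprocess.Popen.send_signal` and the `signal` constants are real Python APIs.

```python
import signal
import subprocess
import sys


def resolve_signal(action):
    """Map a hypothetical action name to a value Popen.send_signal accepts."""
    if sys.platform == "win32":
        # Windows has no real signals. Popen.send_signal treats SIGTERM as
        # terminate(), and CTRL_C_EVENT only reaches processes started with
        # creationflags that include CREATE_NEW_PROCESS_GROUP.
        table = {
            "interrupt": signal.CTRL_C_EVENT,
            "terminate": signal.SIGTERM,
            "kill": signal.SIGTERM,
        }
    else:
        table = {
            "interrupt": signal.SIGINT,
            "terminate": signal.SIGTERM,
            "kill": signal.SIGKILL,
        }
    return table[action]


def signal_kernel(proc, action):
    """Deliver the platform-appropriate signal to a kernel child process."""
    proc.send_signal(resolve_signal(action))
```

Keeping the platform mapping in the nanny would mean frontends never need to know which OS the kernel runs on, though whether the message should carry names or raw integers is exactly the open question in this thread.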
Given that the nanny process is going to run on the same machine as the kernel, it makes sense that the nanny process is asked to interrupt the kernel by means of a message similar to `shutdown_request`, then the nanny process interrupts the kernel process by sending the appropriate signal.
Right, that's exactly how this will work. We're just trying to work out what form the message will take. If all the world was Unix, we'd almost certainly just call it `signal_request`, and pass a signal number or name. But things get a bit more complicated when we consider kernels running on Windows.
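For concreteness, here is one possible shape for such a message, following the general header/parent_header/metadata/content layout of Jupyter messages. This is only an illustration: the `signum` content field, the function name, and the version string are assumptions, since the thread has not settled on any of them.

```python
import uuid
from datetime import datetime, timezone


def make_signal_request(session, signum, username="user"):
    """Build a hypothetical signal_request message as a plain dict."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session,
            "username": username,
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "signal_request",
            "version": "5.0",  # placeholder protocol version
        },
        "parent_header": {},
        "metadata": {},
        # Assumed field: a signal number. A name such as "SIGINT" would be
        # an equally plausible choice.
        "content": {"signum": signum},
    }
```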
For the Windows problems, see here: jupyter/jupyter_client#104
Thanks for writing this up, @takluyver!
When a frontend wants to start a kernel, it currently instantiates a `KernelManager`
object which reads the kernelspec to find how to start the kernel, writes a
connection file, and launches the kernel process. With this process, it will
Perhaps for clarity:
"With this proposed process, the frontend will..."
Thanks, tweaked
@takluyver Nicely designed and clearly written. I made a drawing by hand of the frontends, nanny, kernel, and channels; let me know if you would like a copy. 😄 |
- There will be a consistent way to start kernels without a frontend
(`jupyter kernel --kernel x`).
- Kernel stdout & stderr can be captured at the OS level, with real-time updates
of output.
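To illustrate what OS-level capture means here, a rough sketch follows: the parent owns the pipe attached to the child's file descriptors 1 and 2, so it also sees output written by code that bypasses the language-level `sys.stdout`. The function name `stream_output` is made up for this example and is not part of any Jupyter package.

```python
import subprocess
import sys


def stream_output(cmd):
    """Yield lines from a child's combined stdout/stderr as they arrive."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge fd 2 into the same pipe as fd 1
        text=True,
        bufsize=1,  # line-buffered on the reading side
    )
    with proc.stdout:
        for line in proc.stdout:
            yield line.rstrip("\n")
    proc.wait()
```

A nanny could forward each yielded line to frontends as it arrives, instead of relying on the kernel to redirect its own output.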
👍
Thanks all! @willingc, yes, it would be good to see your drawing, to check if the explanation conveyed what I was thinking clearly. |
@takluyver Here's the link to the drawing's folder on Dropbox: https://www.dropbox.com/sh/kzc9bom60c9e57x/AAAWcdlGo8RZB9cklEv7jC2ua?dl=0 |
Thanks, that looks good. |
@takluyver Great. You detailed things out very clearly 🔑 |
advantages over the current situation, including:

- Kernels will no longer need to implement the 'heartbeat' for frontends to
check that they are still alive.
How would the nanny process check the kernel is alive?
e.g. `subprocess.Popen.poll()`, but depending on how it's written, there may well be smarter ways. On Unix, the parent process is sent `SIGCHLD` when one of its children dies.
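A small sketch of the polling approach, assuming the nanny started the kernel itself with `subprocess.Popen`. The helper name and its interval and timeout defaults are illustrative only.

```python
import subprocess
import sys
import time


def wait_for_exit(proc, interval=0.1, timeout=10.0):
    """Poll a child until it exits; return its exit code, or None on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        code = proc.poll()  # None while the child is still running
        if code is not None:
            return code
        time.sleep(interval)
    return None
```

Because the check relies only on the parent-child relationship, the kernel does not have to implement anything, which is the point of dropping the heartbeat. A `SIGCHLD` handler would avoid the polling interval on Unix, at the cost of platform-specific code.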
I would suggest that this proposal is split into four proposals:
This is absolutely not specific to IPython - the proposal is for the nanny process to be used for all kernels. Your 1 & 2 don't really make sense without a nanny process. 3 is doable, but it's a more incidental benefit. I don't see the benefit of splitting this up into smaller pieces: it's one change to the architecture that lets us do a number of useful things, which seems like exactly the right scope for a JEP. This was also discussed at the in-person dev meeting, and while I don't want to suggest that it's closed for discussion, we did spend some time hashing out what we wanted, and I'd really hope that the remaining issues to work out are details, not the fundamental nature of the proposal. |
On 21/04/16 17:54, Thomas Kluyver wrote:
I think the kernel is in a better position to handle 1 and specially 2 than an agnostic nanny process:
@takluyver How about for kernels that are remote? I know this is not officially supported by the notebook server, but it is something that we've experimented with and could be a requirement in certain deployments. |
@n-riesco - the idea in the proposal is that rather than having every kernel implement the capturing and signal/interrupt logic, we'd implement it once outside of the kernel and everyone automatically benefits. As for capturing output, that's opt-in for a kernel, so a kernel absolutely can do their own input/output instead of having the nanny handle it. The nanny makes it much easier to have this automatically taken care of. |
Another concern brought up in the meeting was the latency introduced in forwarding messages through the nanny. Can you mention that in the proposal? I thought @minrk said he might run some tests to get some idea about how much the latency on messages would be impacted by this proposal. |
@takluyver I'm sorry for suggesting a proposal split. Here are 2 suggestions to the current proposal:
@minrk - how heavy-weight do you see the nanny being? I imagined either a python file, or a lightweight OS-specific C program with zeromq as a dependency. |
@lbustelo The idea is that the nanny and the kernel are always running together on the same system. They may both be remote from the frontend (e.g. the notebook server), and this will work much like it already does - zmq messages sent over the network. One of the key advantages of this is that it will allow interrupting remote kernels, which is currently impossible. @n-riesco @jasongrout I definitely see the nanny as being a lightweight thing with few dependencies. In the first instance, it will likely be written in Python, because that's what we can write and debug most effectively, but I may later use it as an excuse to brush up on a language like Rust or Go, which will make it even lighter.
The logging system is what you'll want to rely on to debug problems with the messaging, so I want it to be a) a separate mechanism, and b) as simple as possible, like 'open this file and write to it'. We can still arrange things so that the frontend can get the kernel logs by making the log file a named pipe, and having the frontend read it. I like this Unix-y approach here, because it provides a lot of flexibility while requiring very little complexity in the kernels. |
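The named-pipe idea can be sketched as follows, for Unix only (`os.mkfifo` is not available on Windows). The function names and the log file name are invented for the illustration. The kernel side just opens a path and writes, exactly as it would for a regular file, while the reader stands in for a frontend.

```python
import os
import tempfile
import threading


def kernel_side(path, lines):
    """Write log lines to a path, unaware of whether it is a file or a FIFO."""
    with open(path, "w") as log:
        for line in lines:
            log.write(line + "\n")


def frontend_side(path):
    """Read log lines until the writer closes its end of the pipe."""
    # Opening a FIFO for reading blocks until a writer has opened it too.
    with open(path) as log:
        return [line.rstrip("\n") for line in log]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "kernel.log")
        os.mkfifo(path)  # Unix only
        writer = threading.Thread(
            target=kernel_side, args=(path, ["starting", "ready"])
        )
        writer.start()
        received = frontend_side(path)
        writer.join()
        return received
```

Swapping the FIFO for an ordinary file requires no change in `kernel_side`, which is the flexibility being argued for.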
On 22/04/16 14:32, Thomas Kluyver wrote:
How would that work in the case of remote kernels? |
There are advantages either way; this way allows the nanny to be running as a service, so remote frontends have a standard way to connect, see which kernel specs are available, and start one. Neither of us felt strongly, but I want to pick something so I can go and prototype it. Even with nanny-per-system, I'd still like to provide an entry point to start the nanny and immediately start one kernel, for the cases where you're spinning up a VM or container to run a single kernel. |
Definitely +1! |
When thinking about remote frontends, it starts to feel like this may be something that belongs in kernel gateway. See https://github.com/jupyter/kernel_gateway_demos/tree/master/nb2kg. |
The proposed 'kernel nanny', now that we're leaning towards one nanny managing multiple kernels, is indeed quite like kernel gateway. The key difference is that this would expose a ZeroMQ interface, rather than an HTTP/websockets interface.
As a consumer, I find ZMQ and custom encryption to be a cost compared to WebSockets + SSL. |
Right, there are definitely situations where that's easier. But I could imagine other situations where dealing with SSL certificates is more complex than a simple shared secret for the connection. |
@takluyver Check out the support in kernel gateway for personalities. It might be possible to enhance it to expose ZMQ.
The kernel gateway is using the notebook server code as a package and so only knows how to communicate with the outside world using HTTP/Websocket based protocols. Adding a zmq personality would be quite an undertaking. Also, I'm not sure what the future holds for KG in general. Notebook has subsumed some of its features (e.g., token auth, kernel activity monitoring) and jupyter_server is on the enhancement proposal table for separating the frontend and backend bits of notebook. |
websockets definitely have some nice characteristics when talking over remote networks (dealing with all of the connections over a single port is a big one). I think supporting additional transports for remote kernels is a bit of a separate discussion, though. Adding the KernelNanny should make such a proposal easier to implement, since we would know that we always have a process running next to the kernel which could serve as that multiplexer. Part of the point of the nanny proposal is that it is transparent to both the kernel and the client. The most basic functionality that I want from KernelNanny is the original idea in IPEP 12:
Since the kernel client API is all zmq, it makes the most sense to me for these to be zmq request/reply messages like all the rest, rather than adding additional HTTP requests to an otherwise zmq API. If we went with HTTP, these would presumably be regular HTTP requests, though, not websockets, so the bar is lower for clients to talk to them. It does seem internally inconsistent to have some zmq requests and some HTTP requests for the same API, though. To me, starting remote kernels, which is solved by KG, is not part of the nanny proposal. But they do creep closer together when we make the nanny a singleton kernel-providing service rather than a simple manager of one kernel, since it has to gain start/list APIs in addition to the more basic stop/restart. |
I was roughly thinking that with local kernels, the server would integrate the nanny functionality (i.e. it would keep handling interrupts as it currently does), and the remote case would be handled by nb2kg (or a ZMQ equivalent if we wrote one). I suspect it would be no harder to write a KernelManager/KernelClient pair wrapping the HTTP/websockets interface than to write a ZMQ kernel gateway. Maybe this points back to the nanny being per kernel, if we've already got KG as the multiple kernel manager. I think I need to think more carefully about what situations we actually want remote kernels in, and how we manage them (e.g. putting kernels in docker lends itself to one kernel per container; does the nanny process need to be inside the container with the kernel? Or can it do what it needs from outside?). |
I think that makes sense. The QtConsole and jupyter_console and all other entrypoints would also need to run the nanny or talk to an existing one.
I think it can go either way. The simplest version is the nanny in the container with the kernel, where there's really no awareness that docker is involved. The more sophisticated version is a "DockerKernelNanny", akin to the DockerSpawner in JupyterHub, where 'raw' kernels are in containers and the nanny does all of its activities via the docker API. |
Hi all -- sorry to comment on an old thread, but I'm hoping to get clarification about something...what is the current status of the deprecation of kernel heartbeats? There are some mentions of deprecating the heartbeat in this proposal, but the latest version of the messaging spec that I can find still contains them. This is troubling me because I just noticed a kernel (gophernotes) which seems to have dropped heartbeat support in July 2017. I somehow missed the memo about this whole thing, so could someone let me know if heartbeats are dead and the nanny process is how things work now? Is there another source of documentation/truth besides http://jupyter-client.readthedocs.io ? Thanks! |
No problem! The Jupyter notebook doesn't rely on heartbeats, and since that's the interface most people use, some kernels only worry about supporting it. We could make all interfaces work without the heartbeat so long as the kernel runs on the same machine (which it normally does). I don't remember which ones already work like this. Getting rid of the heartbeat entirely was waiting on this proposal because it would have provided another way to tell if a remote kernel is still alive. |
Got it, thanks! Is there some official way to keep up-to-date on the latest version of the messaging spec? Speaking as someone working on an alternate Jupyter frontend, dropping heartbeats seems like a major change that would merit a new major version number on the "Messaging in Jupyter" page... |
If we ever do officially drop them, that would certainly be a new protocol version. For now, kernels are theoretically expected to implement it, but in typical cases there's no need, so many don't. We should probably clarify that in the messaging doc. |
Cool, thanks for the clarification. Tagging @dwhitena so he sees these comments. |
Thanks @thomasjm. We will work on supporting this in gophernotes. |
Hi @takluyver @jasongrout was this ever implemented? It looks like OS-level output is still suppressed by Jupyter, as evidenced by reticulate not displaying Python stdout in IRkernel. If not, do you have any suggestions for manually rerouting OS-level output to Jupyter frontend? |
There is an adjacent pre-proposal which addresses some of the same issues (but not IRkernel logging issue) over at #117. |
Hi all 👋, Zach here from the @jupyter/software-steering-council. We're working through old JEPs and closing proposals that are no longer active or may not be relevant anymore. Under Jupyter's new governance model, we have an active Software Steering Council that reviews JEPs weekly. We are catching up on the backlog now. Since there has been no active discussion on this JEP in a while, I'd propose we close it here (we'll leave it open for two more weeks in case you'd like to revive the conversation). If you would like to re-open the discussion after we close it, you are welcome to do that too.
As discussed at the dev meeting. There are a few TODOs which have not yet been decided. We can bikeshed about them, or whoever gets to implementing the relevant bits first can try their preferred option ;-).
Pinging @JanSchulz, who was interested in this for IRkernel logging.