Kernel subshells (JEP91) implementation #1249

Merged: 37 commits merged on Oct 3, 2024

@ianthomas23
Collaborator

ianthomas23 commented Jun 13, 2024

This is the implementation of the kernel subshells JEP (jupyter/enhancement-proposals#91). It follows the JEP's latest commit (1f1ad3d), with the addition of a %subshell magic command that is useful for debugging. To try it out, I have a JupyterLab branch that talks to this branch; the easiest way to use it is https://mybinder.org/v2/gh/ianthomas23/jupyterlab/jep91_demo?urlpath=lab. Once the mybinder instance has started, open subshell_demo_notebook.ipynb and follow the instructions therein.

The idea is that this is mergeable as it is now: it is backward compatible, in that it does not break any existing use of ipykernel (subject to CI confirmation). There are some ramifications of the protocol additions (outlined below) that will need addressing eventually, but I consider these future work for separate PRs.

Outline of changes

  1. The parent subshell (i.e. the main shell) runs in the main thread.
  2. Each new subshell runs in a separate thread.
  3. There is a new thread that deals with all communication on the shell channel; previously this was performed in the main thread.
  4. Communication between the shell channel thread and other threads is performed using ZMQ inproc pair sockets, which are essentially shared memory and avoid the use of thread synchronisation primitives.
  5. Incoming shell messages are handled by the shell channel thread which extracts the subshell_id from the message and passes it on to the correct subshell.
  6. Subshells are created and deleted via messages sent on the control channel. These are passed to the shell channel thread via inproc pair sockets so that the SubshellManager in the shell channel thread is responsible for subshell lifetimes.
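A minimal sketch of the routing in items 3 to 5, assuming pyzmq is installed. Everything here is hypothetical and simplified, not the PR's code: the function names, the inproc://subshell-<id> addresses, the JSON framing and the b"stop" sentinel are invented for illustration. The real kernel uses Jupyter message framing and async sockets, and runs the parent subshell in the main thread rather than in a worker thread.

```python
import json
import threading

import zmq


def subshell_worker(ctx: zmq.Context, subshell_id: str) -> None:
    """Hypothetical subshell thread: owns one end of an inproc PAIR and replies to requests."""
    sock = ctx.socket(zmq.PAIR)
    sock.connect(f"inproc://subshell-{subshell_id}")
    while True:
        frames = sock.recv_multipart()
        if frames == [b"stop"]:  # hypothetical shutdown sentinel
            break
        request = json.loads(frames[0])
        reply = {"subshell_id": subshell_id, "code": request["code"]}
        sock.send_multipart([json.dumps(reply).encode()])
    sock.close()


def route_requests(ctx: zmq.Context, subshell_ids: list, requests: list) -> list:
    """Hypothetical shell-channel-thread logic: route each request by its subshell_id."""
    sockets = {}
    threads = []
    for sid in subshell_ids:
        # Bind in the routing thread before the worker connects to the same inproc address.
        sock = ctx.socket(zmq.PAIR)
        sock.bind(f"inproc://subshell-{sid}")
        sockets[sid] = sock
        thread = threading.Thread(target=subshell_worker, args=(ctx, sid))
        thread.start()
        threads.append(thread)

    replies = []
    for request in requests:
        sock = sockets[request["subshell_id"]]
        sock.send_multipart([json.dumps(request).encode()])
        replies.append(json.loads(sock.recv_multipart()[0]))

    for sock in sockets.values():
        sock.send_multipart([b"stop"])
    for thread in threads:
        thread.join()
    for sock in sockets.values():
        sock.close()
    return replies


ctx = zmq.Context()
replies = route_requests(
    ctx,
    ["a", "b"],
    [{"subshell_id": "b", "code": "x = 1"}, {"subshell_id": "a", "code": "y = 2"}],
)
ctx.term()
print(replies)
```

Binding every endpoint before starting the worker that connects to it keeps the sketch safe on libzmq versions that require bind-before-connect for inproc. Each request is answered by the thread that owns the matching PAIR socket, and no lock is used anywhere.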

Example scenario

Here is an example of the communication between threads when a long task runs in the parent subshell (main thread) and, whilst it is running, a child subshell is created, used, and deleted.

sequenceDiagram
    participant client as Client
    participant control as Control thread
    participant shell as Shell channel thread
    participant main as Main thread

    client->>+shell: Execute request (main shell)
    shell->>-main: Execute request (inproc)
    activate main

    client->>+control: Create subshell request
    control->>-shell: Create subshell request (inproc)
    activate shell
    create participant subshell as Subshell thread
    shell-->>subshell: Create subshell thread

    shell->>control: Create subshell reply (inproc)
    deactivate shell
    activate control
    control->>-client: Create subshell reply

    client->>+shell: Execute request (subshell)
    shell->>-subshell: Execute request (inproc)
    activate subshell

    subshell->>shell: Execute reply (inproc)
    deactivate subshell
    activate shell
    shell->>-client: Execute reply (subshell)

    client->>+control: Delete subshell request
    control->>-shell: Delete subshell request (inproc)
    activate shell
    destroy subshell
    shell-->>subshell: Delete subshell thread

    shell->>control: Delete subshell reply (inproc)
    deactivate shell
    activate control
    control->>-client: Delete subshell reply

    main->>shell: Execute reply (inproc)
    deactivate main
    activate shell
    shell->>-client: Execute reply (main shell)

Future work

ipykernel

  1. The shell channel thread deserialises the whole of the message to get the subshell_id. Ideally it would only deserialise the header. This may need changes in Jupyter Client.
  2. Signalling a subshell to stop uses a threading.Event, following the existing anyio implementation; this requires an extra thread per Event. It would be nice to change this so that a subshell is a single thread, not two.
  3. Execution count. This should either be a separate count per subshell or a single count for the kernel. It needs a decision, and changes in IPython, as it is currently not atomic.
  4. History. Related to execution count (item 3 above).
  5. input() called on more than one subshell at the same time runs, but the results are not stored correctly.
  6. Debugger use needs investigating.
  7. Busy/idle status needs investigating. Should there be, as now, a separate status for each subshell, or a concept of kernel (i.e. any subshell) busy status? This issue is much wider than subshells, as it includes the status of the control channel and how Jupyter Server should track status (see "Improve the busy/idle execution state tracking for kernels", jupyter-server/jupyter_server#1429).
  8. Use of display hooks, e.g. for Matplotlib. Should these be on the parent subshell only, or on child subshells too?
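For item 3, if the decision were a single kernel-wide count, incrementing it must be atomic across subshell threads. A minimal sketch assuming a plain threading.Lock is acceptable; ExecutionCounter and run_cells are hypothetical names, and this is not how IPython tracks execution_count today.

```python
import threading


class ExecutionCounter:
    """Hypothetical kernel-wide execution count shared by all subshell threads."""

    def __init__(self) -> None:
        self._count = 0
        self._lock = threading.Lock()

    def increment(self) -> int:
        # Read, increment and return under one lock, so no two subshells get the same number.
        with self._lock:
            self._count += 1
            return self._count


counter = ExecutionCounter()
seen: list = []
seen_lock = threading.Lock()


def run_cells(n: int) -> None:
    """Simulate one subshell thread executing n cells."""
    for _ in range(n):
        number = counter.increment()
        with seen_lock:
            seen.append(number)


threads = [threading.Thread(target=run_cells, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With four simulated subshells running 1000 cells each, every number from 1 to 4000 is handed out exactly once. The per-subshell alternative would be one counter per subshell, each touched only by its own thread.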

JupyterLab

The JupyterLab branch I am using to demo this isn't really intended to be merged. But if it were, it would need:

  1. Check kernel_info to see if subshells are supported.
  2. Delete the subshell when a subshell's ConsolePanel is closed.
  3. Report subshell IDs in tree view?
  4. Display of subshell busy/idle status.

(Edited for clarity)

@ianthomas23
Collaborator Author

The kernel subshells JEP has been accepted and, as this is its reference implementation, I am asking for someone to review this.

This was referenced Sep 10, 2024
Comment on lines 128 to 134
while True:
    for socket, _ in self._poller.poll(0):
        msg = await socket.recv_multipart(copy=False)
        self._shell_socket.send_multipart(msg)

    # Yield to other tasks.
    await sleep(0)
Collaborator

This loop leads to 100% CPU usage; we need to find another way.
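The spin happens because poll(0) returns immediately whether or not a socket is ready, and sleep(0) only yields to the event loop without waiting, so the task is always runnable. A minimal sketch with asyncio standing in for anyio and an asyncio.Queue standing in for the sockets (both substitutions are assumptions for illustration) counts how often each pattern wakes up over the same 50 ms:

```python
import asyncio
import time


async def busy_wait(duration: float) -> int:
    """Mimic the poll(0) + sleep(0) pattern: check, find nothing, yield, repeat."""
    spins = 0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        spins += 1
        await asyncio.sleep(0)  # yields, but the task is immediately runnable again
    return spins


async def blocking_listener(queue: asyncio.Queue) -> int:
    """Await each message: the task stays suspended until something arrives."""
    wakeups = 0
    while True:
        msg = await queue.get()
        wakeups += 1
        if msg is None:  # hypothetical shutdown sentinel
            return wakeups


async def compare(duration: float = 0.05) -> tuple:
    queue: asyncio.Queue = asyncio.Queue()
    listener = asyncio.create_task(blocking_listener(queue))
    spins = await busy_wait(duration)  # the listener sits idle meanwhile
    await queue.put("execute_reply")
    await queue.put(None)
    wakeups = await listener
    return spins, wakeups


spins, wakeups = asyncio.run(compare())
print(f"busy loop iterations: {spins}, awaiting listener wake-ups: {wakeups}")
```

The awaiting listener wakes exactly twice (one reply plus the sentinel), while the busy loop iterates continuously for the whole period; that constant iteration is what shows up as a fully loaded core.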

Collaborator Author

Interesting. On my macOS dev machine this is fine, but I've confirmed that it is a problem on Linux.

Collaborator Author

@ianthomas23 Sep 12, 2024

I've changed the implementation in 97e3e91. I've removed the sleep(0) and poll(0) calls and now use an async zmq Poller in an anyio task. When a subshell is created or deleted this task is cancelled using an anyio.Event and rescheduled with the updated list of subshells. With this I no longer see 100% CPU load on Linux or macOS.

Collaborator Author

I've changed the implementation of this piece of code again. My use of await poller.poll() turned out to cause problems on Python < 3.10 and on Windows. Now I am avoiding zmq.Poller and instead using a separate anyio task for each subshell to listen for reply messages and send them out to the client via the shell channel. This seems to be more robust on older Python and on Windows. I've also replaced the anyio.Event with a memory object stream (essentially an anyio async queue).
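The design described here (one listener task per subshell, plus an async queue that tells the manager to spawn a listener when a subshell is created) can be sketched with asyncio as a stand-in for anyio: asyncio.Queue plays the role both of the anyio memory object stream and of each subshell's reply socket. Only listen_from_subshells and _create_subshell are names from the PR (they appear in the source comment quoted further down this thread); forward_replies, the None sentinels and the queue layout are hypothetical.

```python
import asyncio


async def forward_replies(subshell_id, reply_queue, shell_out):
    """One task per subshell: forward each reply to the shared shell channel."""
    while True:
        reply = await reply_queue.get()
        if reply is None:  # hypothetical sentinel: subshell deleted
            return
        await shell_out.put((subshell_id, reply))


async def listen_from_subshells(new_subshells, reply_queues, shell_out):
    """Spawn a forwarding task whenever a new subshell id arrives on the queue."""
    tasks = []
    while True:
        subshell_id = await new_subshells.get()
        if subshell_id is None:  # hypothetical sentinel: manager shutting down
            break
        tasks.append(asyncio.create_task(
            forward_replies(subshell_id, reply_queues[subshell_id], shell_out)))
    await asyncio.gather(*tasks)


async def demo():
    new_subshells = asyncio.Queue()  # stands in for the anyio memory object stream
    shell_out = asyncio.Queue()      # stands in for the shell channel socket
    reply_queues = {"a": asyncio.Queue(), "b": asyncio.Queue()}
    listener = asyncio.create_task(
        listen_from_subshells(new_subshells, reply_queues, shell_out))
    for subshell_id in reply_queues:
        await new_subshells.put(subshell_id)  # what _create_subshell would signal
    await reply_queues["a"].put("execute_reply from a")
    await reply_queues["b"].put("execute_reply from b")
    for queue in reply_queues.values():
        await queue.put(None)
    await new_subshells.put(None)
    await listener
    forwarded = []
    while not shell_out.empty():
        forwarded.append(shell_out.get_nowait())
    return sorted(forwarded)


forwarded = asyncio.run(demo())
print(forwarded)
```

Because each listener only awaits its own queue, an idle subshell costs nothing, and adding a subshell means starting one more task rather than rebuilding a shared poller.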

@ianthomas23
Collaborator Author

I had missed a direct use of the shell_socket to send an abort_reply message when it should have been sent via the SubshellManager. With that fixed, all the local CI is now passing.

Member

@Carreau left a comment

Thanks for your patience, I'm going to try reviewing this and pushing it forward.

I've done a partial read (not in depth) and added a few comments, which are not mandatory requests for changes but thoughts to myself for when I come back to it.

I have to run some errands, so I'm going to post this as is for now but will come back to it later.

ipykernel/thread.py (outdated, resolved)
ipykernel/thread.py (outdated, resolved)

def set_task(self, task):
self._task = task
super().__init__(name=CONTROL_THREAD_NAME, **kwargs)

def run(self):
"""Run the thread."""
self.name = CONTROL_THREAD_NAME
Member

@Carreau Oct 1, 2024

I think this is unnecessary now that Thread has a name setter/getter, and super().__init__ should set the private name directly.

Maybe try putting in an assert ==, and if the tests pass, just remove the override of the run method?

Collaborator Author

Removed the name duplication in b644f9e7 and 9109cc8 without causing any problems.
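The suggestion can be checked in isolation: passing name= to Thread.__init__ already sets the thread's name, so a run() override that reassigns it adds nothing. A minimal sketch; the value of CONTROL_THREAD_NAME is assumed here, and this ControlThread is a hypothetical reduction, not ipykernel's class.

```python
import threading

CONTROL_THREAD_NAME = "Control"  # assumed value, for illustration only


class ControlThread(threading.Thread):
    """Hypothetical reduction of the control thread: the name is set once, in __init__."""

    def __init__(self, **kwargs):
        super().__init__(name=CONTROL_THREAD_NAME, **kwargs)
        self.observed_name = None

    def run(self):
        # No reassignment of self.name: Thread.__init__ already stored it.
        self.observed_name = threading.current_thread().name


thread = ControlThread(daemon=True)
assert thread.name == CONTROL_THREAD_NAME  # the "assert ==" check suggested above
thread.start()
thread.join()
print(thread.observed_name)
```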

ipykernel/kernelbase.py (resolved)
ipykernel/kernelbase.py (resolved)
ipykernel/subshell.py (outdated, resolved)

self._context: zmq.asyncio.Context = context
self._shell_socket = shell_socket
self._cache: dict[str, Subshell] = {}
Member

Self curiosity:

Check the difference between attribute typing in __init__ and at class level.

Is mypy smart enough, or does it say the attribute may be unset if the type is set in __init__?
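The runtime side of this question is easy to show; the mypy side is stated from my understanding rather than tested here. As far as I know, mypy treats an annotated assignment in __init__ (self._cache: dict[str, Subshell] = {}) as declaring the instance attribute and does not report it as possibly unset. At runtime the only difference is whether the annotation is recorded on the class. The class names are hypothetical, and dict[str, int] replaces Subshell to keep the sketch self-contained (Python 3.9+).

```python
class ClassLevel:
    # Class-level annotation: recorded in ClassLevel.__annotations__.
    _cache: dict[str, int]

    def __init__(self) -> None:
        self._cache = {}


class InInit:
    def __init__(self) -> None:
        # Annotation on an attribute target inside __init__: type checkers read it
        # as the attribute's declared type, but Python does not record it on the class.
        self._cache: dict[str, int] = {}


class_level_hints = ClassLevel.__annotations__
in_init_hints = getattr(InInit, "__annotations__", {})
print(class_level_hints, in_init_hints)
```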

ipykernel/subshell_manager.py (outdated, resolved)
ipykernel/subshell_manager.py (outdated, resolved)
elif type == "list":
    reply["subshell_id"] = self.list_subshell()
else:
    msg = f"Unrecognised message type {type}"
Member

Suggested change
msg = f"Unrecognised message type {type}"
msg = f"Unrecognised message type {type!r}"

Collaborator Author

Fixed in d17c77f.
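A minimal, thread-free sketch of the dispatch around the suggested line, showing why !r helps: an empty or whitespace-padded type becomes visible in the error. The "list" branch and reply["subshell_id"] mirror the snippet above; the "create" and "delete" branches, SubshellRegistry, and the uuid-based ids are assumptions for illustration.

```python
import uuid
from typing import Optional


class SubshellRegistry:
    """Hypothetical, thread-free stand-in for the SubshellManager bookkeeping."""

    def __init__(self) -> None:
        self._cache: dict = {}

    def create_subshell(self) -> str:
        subshell_id = str(uuid.uuid4())
        self._cache[subshell_id] = object()  # placeholder for a real Subshell
        return subshell_id

    def delete_subshell(self, subshell_id: Optional[str]) -> None:
        del self._cache[subshell_id]

    def list_subshell(self) -> list:
        return list(self._cache)

    def handle(self, msg_type: str, subshell_id: Optional[str] = None) -> dict:
        reply: dict = {"status": "ok"}
        try:
            if msg_type == "create":
                reply["subshell_id"] = self.create_subshell()
            elif msg_type == "delete":
                self.delete_subshell(subshell_id)
            elif msg_type == "list":
                reply["subshell_id"] = self.list_subshell()
            else:
                # !r quotes the value, so "", "list " or None stand out in the error.
                raise ValueError(f"Unrecognised message type {msg_type!r}")
        except Exception as err:
            reply = {"status": "error", "evalue": str(err)}
        return reply


registry = SubshellRegistry()
created = registry.handle("create")
print(registry.handle("list "))
```

The sketch catches Exception rather than the BaseException used in the PR's error path, so KeyboardInterrupt and SystemExit still propagate; which behaviour suits the kernel is a separate decision.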

ipykernel/kernelbase.py (outdated, resolved)
ipykernel/kernelbase.py (outdated, resolved)
ipykernel/thread.py (outdated, resolved)
ipykernel/thread.py (outdated, resolved)
ipykernel/thread.py (outdated, resolved)
ipykernel/zmqshell.py (outdated, resolved)
ipykernel/zmqshell.py (resolved)
tests/utils.py (resolved)
@Carreau
Member

Carreau commented Oct 2, 2024

I'm +1 for this. I have a couple of stylistic notes; if you agree, I'm happy to push some updates, but I don't want to do so without your approval, as I know it can get unwieldy to have another maintainer push changes to your branch on big PRs.

I think we can merge this as "experimental" and then work on the related issues as these features get used in various projects (IPython history etc.).

@ianthomas23
Collaborator Author

@Carreau Thanks for the review and for the offer of pushing changes. I'd rather make the changes myself, and I'll try to group them sensibly to avoid too much CI churn.

@davidbrochart
Collaborator

It seems that you use zmq.PAIR sockets for inter-thread communication (sub-shell thread <---> main-shell thread, sub-shell thread <--> control thread), are those real sockets? If so, sockets are a scarce resource, and AnyIO has nice facilities for inter-thread communication, that could be used instead?
I see that you use AnyIO memory object streams in the sub-shell manager, could you explain your use of them, in particular if they are used for inter-thread communication?

@ianthomas23
Collaborator Author

It seems that you use zmq.PAIR sockets for inter-thread communication (sub-shell thread <---> main-shell thread, sub-shell thread <--> control thread), are those real sockets?

A ZMQ inproc pair is an area of memory shared between two threads and wrapped to look like ZMQ sockets.

I see that you use AnyIO memory object streams in the sub-shell manager, could you explain your use of them, in particular if they are used for inter-thread communication?

Here is where they are created in the source code which includes some explanation:

# anyio memory object stream for async queue-like communication between tasks.
# Used by _create_subshell to tell listen_from_subshells to spawn a new task.
self._send_stream, self._receive_stream = create_memory_object_stream[str]()

The communication is between two different async tasks running in the same thread.

Member

@Carreau left a comment

Some notes for myself and history, but all good on my side. Let's get that in and iterate if necessary.

# socket=None is valid if kernel subshells are not supported.
try:
    while True:
        await self.process_shell_message(socket=socket)
Member

Not for this PR:

  • I don't like that breaking out of this while loop requires an exception that is hard to find in the implementation of process_shell_message; we should likely have an explicit exit point.
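One shape the explicit exit point could take, as a hypothetical sketch (ShellLoop and request_stop are invented names, and an asyncio.Queue stands in for the shell socket): the loop condition states when the loop ends, so no exception has to escape process_shell_message.

```python
import asyncio


class ShellLoop:
    """Hypothetical shell loop whose exit condition is visible in the loop itself."""

    def __init__(self, queue: asyncio.Queue) -> None:
        self._queue = queue
        self._stop_requested = False
        self.handled: list = []

    def request_stop(self) -> None:
        self._stop_requested = True

    async def process_shell_message(self) -> None:
        msg = await self._queue.get()
        if msg == "shutdown":
            self.request_stop()
        else:
            self.handled.append(msg)

    async def run(self) -> None:
        # Explicit exit point: the loop checks a flag instead of relying on an exception.
        while not self._stop_requested:
            await self.process_shell_message()


async def drive_loop() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    for msg in ("execute_request", "inspect_request", "shutdown", "never_handled"):
        queue.put_nowait(msg)
    shell_loop = ShellLoop(queue)
    await shell_loop.run()
    return shell_loop.handled


handled = asyncio.run(drive_loop())
print(handled)
```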

Comment on lines 500 to 501
if inspect.isawaitable(result):
    await result
Member

Not for this PR:

  • I think we should move to handlers always returning an awaitable, or more likely handlers always being coroutine functions.
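The current pattern and the suggested direction side by side, as a hypothetical sketch (dispatch and ensure_coroutine_function are invented names): adapting plain functions once, at registration, would let the dispatcher always await without an inspect.isawaitable check.

```python
import asyncio
import inspect


async def dispatch(handler, *args):
    """Current pattern: a handler may be a plain function or a coroutine function."""
    result = handler(*args)
    if inspect.isawaitable(result):
        result = await result
    return result


def ensure_coroutine_function(handler):
    """Hypothetical registration-time adapter so every handler can simply be awaited."""
    if inspect.iscoroutinefunction(handler):
        return handler

    async def wrapper(*args):
        return handler(*args)

    return wrapper


def sync_handler(x):
    return x + 1


async def async_handler(x):
    return x * 2


print(asyncio.run(dispatch(sync_handler, 1)), asyncio.run(dispatch(async_handler, 3)))
```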

"protocol_version": kernel_protocol_version,
"implementation": self.implementation,
"implementation_version": self.implementation_version,
"language_info": self.language_info,
"banner": self.banner,
"help_links": self.help_links,
"supported_features": [],
Member

👍
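With "supported_features" in kernel_info_reply, a frontend (compare the JupyterLab item "Check kernel_info to see if subshells are supported") could detect subshell support as sketched below. This assumes, per my reading of JEP 91, that a supporting kernel lists the string "kernel subshells" there; supports_kernel_subshells is a hypothetical helper.

```python
def supports_kernel_subshells(kernel_info_content: dict) -> bool:
    """Return True if a kernel_info_reply content advertises subshell support.

    Assumes the feature string is "kernel subshells"; older kernels may omit
    "supported_features" entirely, which is treated as "not supported".
    """
    return "kernel subshells" in kernel_info_content.get("supported_features", [])


print(supports_kernel_subshells({"supported_features": ["kernel subshells"]}))
```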

return
if not self._supports_kernel_subshells:
self.log.error("Subshells are not supported by this kernel")
return
Member

And, following the discussion below, maybe later think about returning something that indicates why the subshell was not created?

Member

Or maybe not as that would change the message spec.

I think I understand this better now: these handler return values likely go nowhere, which is why the handlers log and return rather than raise an exception. Let's leave making this clearer to a subsequent refactor/maintenance.

except BaseException as err:
    reply = {
        "status": "error",
        "evalue": str(err),
Member

Not for this PR:

  • Can this contain information that should not be sent to the frontend?
  • I don't believe so, as the user is the one running the kernel anyway, but we have had cases where tracebacks contained sensitive values. In other apps, such as web apps, this would likely be an issue (e.g. revealing paths and filenames on a server), but I don't think it's an issue here.

@Carreau
Member

Carreau commented Oct 3, 2024

I haven't merged anything on this repo in quite some time. I would prefer a rebase-and-merge, as you took time to craft the commits individually, but I think the policy here is squash-merge.

Any objections to squash-merge ?

@ianthomas23
Collaborator Author

Any objections to squash-merge ?

A squash-merge is fine by me.

Collaborator

@davidbrochart left a comment

Thanks @ianthomas23 !

3 participants