pex compatibility for ipykernel launch #2636

Open
kwlzn opened this issue Jul 6, 2017 · 6 comments

@kwlzn
Contributor

kwlzn commented Jul 6, 2017

at Twitter, our data science and machine learning teams are attempting to package up Jupyter notebook as a self-contained pex for easier distribution and compatibility with our internal build and execution environments.

presently, attempting to create a new notebook while running jupyter notebook from a pex results in a failure to launch the kernel:

[omerta show]$ wget -q https://github.com/pantsbuild/pex/releases/download/v1.2.7/pex27
[omerta show]$ chmod 700 pex27 && ./pex27 --version
pex27 1.2.7
[omerta show]$ pex "ipython<6.0" jupyter -e notebook.notebookapp:main -o ./jupyter_notebook.pex
[omerta show]$ ./jupyter_notebook.pex 
[I 16:51:46.583 NotebookApp] Serving notebooks from local directory: /private/tmp/show
[I 16:51:46.583 NotebookApp] 0 active kernels 
[I 16:51:46.583 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/?token=11cd08f0df7e40e7bbe0cbf4b9fcfef57ef975f375ea8142
[I 16:51:46.583 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 16:51:46.584 NotebookApp] 
[I 16:51:49.448 NotebookApp] 302 GET / (::1) 0.48ms
[I 16:51:53.755 NotebookApp] Creating new notebook in 
[I 16:51:54.315 NotebookApp] Kernel started: ee79965c-40c8-4f2b-ab3b-0493f3bbe2cb
/opt/twitter_mde/package/python2.7/current/bin/python2.7: No module named ipykernel_launcher
[W 16:51:54.332 NotebookApp] 404 GET /nbextensions/widgets/notebook/js/extension.js?v=20170705165146 (::1) 12.14ms referer=http://localhost:8888/notebooks/Untitled.ipynb?kernel_name=python2
[I 16:51:57.316 NotebookApp] KernelRestarter: restarting kernel (1/5)
/opt/twitter_mde/package/python2.7/current/bin/python2.7: No module named ipykernel_launcher
[I 16:52:00.323 NotebookApp] KernelRestarter: restarting kernel (2/5)
/opt/twitter_mde/package/python2.7/current/bin/python2.7: No module named ipykernel_launcher
[I 16:52:03.329 NotebookApp] KernelRestarter: restarting kernel (3/5)
/opt/twitter_mde/package/python2.7/current/bin/python2.7: No module named ipykernel_launcher
[W 16:52:04.342 NotebookApp] Timeout waiting for kernel_info reply from ee79965c-40c8-4f2b-ab3b-0493f3bbe2cb
[I 16:52:06.339 NotebookApp] KernelRestarter: restarting kernel (4/5)
WARNING:root:kernel ee79965c-40c8-4f2b-ab3b-0493f3bbe2cb restarted
[E 16:52:06.339 NotebookApp] KernelRestarter: restart callback <bound method ZMQChannelsHandler.on_kernel_restarted of ZMQChannelsHandler(ee79965c-40c8-4f2b-ab3b-0493f3bbe2cb)> failed
    Traceback (most recent call last):
      File "/Users/kwilson/.pex/install/jupyter_client-5.1.0-py2.py3-none-any.whl.f35d5547733e40a744cea53c79345f75f659643d/jupyter_client-5.1.0-py2.py3-none-any.whl/jupyter_client/restarter.py", line 81, in _fire_callbacks
        callback()
      File "/Users/kwilson/.pex/install/notebook-5.0.0-py2.py3-none-any.whl.6e81571f8e672c859f4e9d322ebd477865a3f9b3/notebook-5.0.0-py2.py3-none-any.whl/notebook/services/kernels/handlers.py", line 435, in on_kernel_restarted
        self._send_status_message('restarting')
      File "/Users/kwilson/.pex/install/notebook-5.0.0-py2.py3-none-any.whl.6e81571f8e672c859f4e9d322ebd477865a3f9b3/notebook-5.0.0-py2.py3-none-any.whl/notebook/services/kernels/handlers.py", line 431, in _send_status_message
        self.write_message(json.dumps(msg, default=date_default))
      File "/Users/kwilson/.pex/install/tornado-4.5.1-cp27-cp27m-macosx_10_4_x86_64.whl.5c5ad8a4cbaf171bde97e76048ae70bd52a42971/tornado-4.5.1-cp27-cp27m-macosx_10_4_x86_64.whl/tornado/websocket.py", line 249, in write_message
        raise WebSocketClosedError()
    WebSocketClosedError
/opt/twitter_mde/package/python2.7/current/bin/python2.7: No module named ipykernel_launcher
[W 16:52:09.347 NotebookApp] KernelRestarter: restart failed
[W 16:52:09.347 NotebookApp] Kernel ee79965c-40c8-4f2b-ab3b-0493f3bbe2cb died, removing from map.
ERROR:root:kernel ee79965c-40c8-4f2b-ab3b-0493f3bbe2cb restarted failed!
[E 16:52:09.348 NotebookApp] KernelRestarter: dead callback <bound method ZMQChannelsHandler.on_restart_failed of ZMQChannelsHandler(ee79965c-40c8-4f2b-ab3b-0493f3bbe2cb)> failed
    Traceback (most recent call last):
      File "/Users/kwilson/.pex/install/jupyter_client-5.1.0-py2.py3-none-any.whl.f35d5547733e40a744cea53c79345f75f659643d/jupyter_client-5.1.0-py2.py3-none-any.whl/jupyter_client/restarter.py", line 81, in _fire_callbacks
        callback()
      File "/Users/kwilson/.pex/install/notebook-5.0.0-py2.py3-none-any.whl.6e81571f8e672c859f4e9d322ebd477865a3f9b3/notebook-5.0.0-py2.py3-none-any.whl/notebook/services/kernels/handlers.py", line 439, in on_restart_failed
        self._send_status_message('dead')
      File "/Users/kwilson/.pex/install/notebook-5.0.0-py2.py3-none-any.whl.6e81571f8e672c859f4e9d322ebd477865a3f9b3/notebook-5.0.0-py2.py3-none-any.whl/notebook/services/kernels/handlers.py", line 431, in _send_status_message
        self.write_message(json.dumps(msg, default=date_default))
      File "/Users/kwilson/.pex/install/tornado-4.5.1-cp27-cp27m-macosx_10_4_x86_64.whl.5c5ad8a4cbaf171bde97e76048ae70bd52a42971/tornado-4.5.1-cp27-cp27m-macosx_10_4_x86_64.whl/tornado/websocket.py", line 249, in write_message
        raise WebSocketClosedError()
    WebSocketClosedError
^C[I 16:52:10.041 NotebookApp] interrupted
Serving notebooks from local directory: /private/tmp/show
0 active kernels 
The Jupyter Notebook is running at: http://localhost:8888/?token=11cd08f0df7e40e7bbe0cbf4b9fcfef57ef975f375ea8142
Shutdown this notebook server (y/[n])? y
[C 16:52:11.840 NotebookApp] Shutdown confirmed
[I 16:52:11.841 NotebookApp] Shutting down kernels

the key output here being:

/opt/twitter_mde/package/python2.7/current/bin/python2.7: No module named ipykernel_launcher

which seems to indicate that jupyter is attempting to launch the kernel with the equivalent of python -m ipykernel_launcher ... using the plain system interpreter. this is confirmed by looking at the kernel.json for Python 2:

{
 "display_name": "Python 2",
 "language": "python",
 "argv": [
  "python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ]
}

in the pex context, all transitive dependencies needed for execution are self-contained within the pex, as opposed to being sourced from a traditional python environment (e.g. the interpreter's site-packages or an outer venv). you can think of it kind of like a zipped, executable virtualenv without any externalized environmental setup. so when the kernel is launched as above, the pex context is lost and the vanilla python interpreter fails to locate the ipykernel_launcher module on its own import path.
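
a quick way to check this diagnosis is to ask a given interpreter whether it can import the module on its own, outside the pex. the sketch below is only an illustration: `importable_in` is a hypothetical helper name, and it assumes a Python 3.5+ interpreter runs the check itself (for `subprocess.run`), even if the interpreter being probed is python2.7.

```python
import subprocess
import sys


def importable_in(interpreter, module):
    """Return True if `interpreter` can import `module` in a fresh process.

    hypothetical helper: it spawns the interpreter the same way a kernel
    launch would, so nothing from the current (pex) process leaks in via
    sys.path; only the child's own environment is used.
    """
    result = subprocess.run(
        [interpreter, "-c", "import {}".format(module)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# a stdlib module is always importable; ipykernel_launcher is only
# importable if it is installed in that interpreter's own environment
print(importable_in(sys.executable, "json"))
print(importable_in(sys.executable, "ipykernel_launcher"))
```

pointing `interpreter` at the python that appears in the error output would be expected to report False for ipykernel_launcher in the situation described here.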

so in order to properly launch an ipykernel from within a pex, we'd need to be self-referential: re-invoke the pex itself with an environment variable that selects the kernel launcher as the entry point. from the CLI, that would look like something along the lines of:

$ PEX_MODULE=ipykernel_launcher <sys.argv[0]> ...
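
in Python terms, the same invocation could be assembled roughly as follows. this is a minimal sketch: `pex_kernel_invocation` is a hypothetical name, and it assumes the pex file is directly executable and that `pex_path` is what `sys.argv[0]` resolved to in the running server.

```python
import os


def pex_kernel_invocation(pex_path, connection_file, base_env=None):
    """Build (argv, env) for re-running the pex as a kernel launcher.

    hypothetical helper. PEX_MODULE is the pex runtime variable that
    overrides the entry point; everything else is copied from base_env
    (or from os.environ when base_env is not given).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PEX_MODULE"] = "ipykernel_launcher"
    argv = [pex_path, "-f", connection_file]
    return argv, env


argv, env = pex_kernel_invocation(
    "/path/to/jupyter_notebook.pex", "/tmp/connection.json", {"PATH": "/usr/bin"}
)
print(argv)
print(env["PEX_MODULE"])
```

the pair can then be handed to something like `subprocess.Popen(argv, env=env)`. the variable is set only for the child process, so it does not change the entry point of the server that is already running.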

FWICT, it seems like at least one way to accomplish this would be to overload/hijack the main notebook server entrypoint and spit out a custom "kernel spec" prior to server launch that would essentially look like:

{
  "display_name": "Python 2/<path_to_the.pex>",
  "env": {"PEX_MODULE": "ipykernel_launcher"},
  "language": "python",
  "argv": ["python2", "<path_to_the.pex>", "-f", "{connection_file}"]
}
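
generating that spec from the running process could look something like the sketch below. `pex_kernel_spec` is a hypothetical name, and the default interpreter name "python2" is simply copied from the JSON above; in practice it would more likely come from `sys.executable`.

```python
import json


def pex_kernel_spec(pex_path, interpreter="python2"):
    """Return a kernel spec dict that re-enters the pex as a kernel.

    hypothetical helper mirroring the JSON above. "{connection_file}" is
    left as a literal placeholder for jupyter to fill in at launch time.
    """
    return {
        "display_name": "Python 2/{}".format(pex_path),
        "env": {"PEX_MODULE": "ipykernel_launcher"},
        "language": "python",
        "argv": [interpreter, pex_path, "-f", "{connection_file}"],
    }


print(json.dumps(pex_kernel_spec("/path/to/jupyter_notebook.pex"), indent=2))
```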

however, it'd be great to avoid hacks like this in favor of a more first class support model for pex.

if anyone has any better solutions or a high level strategy on how to go about adding better first class support for pex in Jupyter, I'd be all ears - and more than willing to contribute the necessary PRs to realize that. thanks in advance!

@kwlzn
Contributor Author

kwlzn commented Jul 7, 2017

at least one semi-reasonable strategy here that I can see would be to compose a shim/surrogate entrypoint that wraps the notebook launcher in the pex context that would:

  1. create a temporary dir
  2. emit the kernel.json as described above to the tmp dir under kernels/<id>/kernel.json
  3. add the temporary dir to an exported JUPYTER_PATH
  4. invoke the notebook server runner
  5. cleanup the temporary dir in a finally block

this helps isolate the configuration to a per-run instance vs stashing keyed, per-run copies in e.g. ~/.jupyter.
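
a rough sketch of those five steps, under some assumptions: `run_with_pex_kernel` and the kernel name "pex_python" are hypothetical, `run_server` stands in for the real notebook entry point (e.g. `notebook.notebookapp.main`), and the interpreter defaults to `sys.executable`.

```python
import json
import os
import shutil
import sys
import tempfile


def run_with_pex_kernel(pex_path, run_server, kernel_name="pex_python",
                        interpreter=None):
    """Run `run_server` with a temporary, pex-aware kernel spec installed."""
    interpreter = interpreter or sys.executable
    tmp_dir = tempfile.mkdtemp(prefix="pex_jupyter_")       # step 1
    previous = os.environ.get("JUPYTER_PATH")
    try:
        kernel_dir = os.path.join(tmp_dir, "kernels", kernel_name)
        os.makedirs(kernel_dir)
        spec = {
            "display_name": "Python/{}".format(pex_path),
            "env": {"PEX_MODULE": "ipykernel_launcher"},
            "language": "python",
            "argv": [interpreter, pex_path, "-f", "{connection_file}"],
        }
        with open(os.path.join(kernel_dir, "kernel.json"), "w") as f:
            json.dump(spec, f, indent=2)                      # step 2
        # step 3: put the temp dir first so its kernels/ dir is searched
        paths = [tmp_dir] + ([previous] if previous else [])
        os.environ["JUPYTER_PATH"] = os.pathsep.join(paths)
        return run_server()                                   # step 4
    finally:
        # step 5: restore the environment and remove the temp dir
        if previous is None:
            os.environ.pop("JUPYTER_PATH", None)
        else:
            os.environ["JUPYTER_PATH"] = previous
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

as the pex entry point this would be called along the lines of `run_with_pex_kernel(os.path.abspath(sys.argv[0]), main)`, with `main` imported from `notebook.notebookapp`.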

I'm planning to run with this model now for the purposes of experimentation, but open to better strategies here if anyone has ideas.

@rolweber
Contributor

If I interpret this line in launcher.py (jupyter_client) correctly, the kernel will inherit the notebook server's environment, unless the kernel spec defines an environment. So, if your kernel specs don't set any environment variables, you could provide what you need to the notebook server, and it will be available to the kernels.

If your kernel specs do set some environment variables, you could customize the launcher to pass selected environment variables from the notebook server to the kernels. Or you could customize the kernel manager to always pass an environment definition to the launcher. If you get the list of environment variables to be propagated from the configuration, you could create a PR and maybe get your changes merged.
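
To make the second option concrete, here is a minimal sketch of the selection logic. It models the behavior as described in this comment and is not jupyter_client's actual code; `build_kernel_env` and the `propagate` parameter are hypothetical names.

```python
def build_kernel_env(server_env, spec_env=None, propagate=()):
    """Compute the environment for a kernel process.

    Hypothetical helper, modeled on the description above:
    - no spec env: the kernel inherits the server environment as-is
    - spec env present: start from the selected server variables only,
      then let the spec's own values take precedence
    """
    if not spec_env:
        return dict(server_env)
    env = {name: server_env[name] for name in propagate if name in server_env}
    env.update(spec_env)
    return env


server = {"PATH": "/usr/bin", "HOME": "/home/me", "SECRET": "x"}
print(build_kernel_env(server))
print(build_kernel_env(server, {"PEX_MODULE": "ipykernel_launcher"}, ["PATH"]))
```

The list passed as `propagate` is the part that would come from configuration if this were turned into a PR.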

@kwlzn
Contributor Author

kwlzn commented Jul 17, 2017

the env var that needs to be set would specify the entrypoint of the kernel launcher, so in terms of concerns it'd be part of the "kernel configuration" (i.e. something we set only at kernel launch time vs something we'd want as a static env var in the parent, which in theory could potentially leak into other non-desired contexts or kernel launches). tho it seems already possible to embed a static env var like this directly into a kernel.json - so really the remaining gap is the self-reference bit (i.e. understanding and being able to parameterize the values of sys.executable and sys.argv[0] from the running notebook server context).

so afaict, to make this all first class it seems like jupyter would need a way to specify kernel configuration in a plugin type model (i.e. executable python code vs json). it might also be cool to use a registry/discovery type pattern against the installed plugins so that just e.g. their presence in the python environment could enable them for use. this would make it as easy as a pip install to add new kernel types.

fwiw, I've posted an initial implementation of the surrogate shim approach described above here which is working well for the moment.

@minrk
Member

minrk commented Jul 19, 2017

> to make this all first class it seems like jupyter would need a way to specify kernel configuration

The KernelSpecManager and KernelManager classes are the implementations of finding and launching kernels, respectively. These are swappable for alternate implementations via the kernel_manager_class and kernel_spec_manager_class configurables on NotebookApp.
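
As a configuration fragment, swapping in an alternate spec manager might look like the following. This is a sketch only: the module and class name `my_pex_kernels.PexKernelSpecManager` are hypothetical placeholders for whatever implementation you provide, and `get_config()` is only available when the file is loaded by Jupyter's config machinery.

```python
# jupyter_notebook_config.py
c = get_config()  # injected by the Jupyter config loader, not importable

# hypothetical dotted path to a KernelSpecManager subclass of your own
c.NotebookApp.kernel_spec_manager_class = "my_pex_kernels.PexKernelSpecManager"
```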

I just put together pexnb which provides a KernelSpecManager that works with PEX and tells the notebook server to use it by default.

You should be able to build a notebook env with pex via:

pex notebook pexnb -m pexnb -o ./jupyter_notebook.pex
$PWD/jupyter_notebook.pex

It has the assumptions:

  1. pex file is launched via absolute path (otherwise the subprocess cannot find the executable, since pex appears to throw away this information by changing to a temporary directory)
  2. you don't want other kernels available (misses some of the point of Jupyter, but if you are going for a single isolated env, this seems to be the right thing to do).

@takluyver
Member

Quick reminder: I'm planning a revamp of the kernel finding machinery, described here: jupyter/jupyter_client#261

@kwlzn
Contributor Author

kwlzn commented Jul 21, 2017

thanks for the pointers and reference implementation @minrk - very helpful!
