STATUS ☑️ Package building works, and you can launch the server using `systemctl start jupyterhub`, or via the provided `Dockerfile.run`.
It uses PAM authentication, i.e. it starts the notebook servers in a local user's context.
The package is tested on Ubuntu Bionic, on Ubuntu Xenial (with Python 3.8.1 from the Deadsnakes PPA), and on Debian Stretch. You can also use a Docker container (see `Dockerfile.run` / user: `admin` / pwd: `test1234`).
Contents
- What is this?
- Customizing the Package Contents
- “Devops Intelligence” showcase
- How to build and install the package
- Trouble-Shooting
- Updating requirements
- How to set up a simple service instance
- Securing your JupyterHub web service with an SSL off-loader
- Changing the Service Unit Configuration
- Configuration Files
- Data Directories
- References
This project provides packaging of the core JupyterHub components, so they can be easily installed on Debian-like target hosts. This makes life-cycle management on production hosts a lot easier, and avoids common drawbacks of ‘from source’ installs, like needing build tools and direct internet access in production environments.
The Debian packaging metadata in `debian` puts the `jupyterhub` Python package and its dependencies as released on PyPI into a DEB package, using dh-virtualenv.
The resulting omnibus package is thus easily installed to and removed from a machine, but is not a ‘normal’ Debian `python-*` package. If you want that, look elsewhere.
Since the dynamic router of JupyterHub is a Node.js application, the package also has a dependency on `nodejs`, limited to the current LTS version range (that is 12.x or 10.x as of this writing).
In practice, that means you should use the NodeSource packages to get Node.js, because the native Debian ones are typically dated (Stretch comes with `4.8.2~dfsg-1`).
Adapt the `debian/control` file if your requirements are different.
To add any plugins or other optional Python dependencies, list them in `install_requires` in `setup.py` as usual, but only use versioned dependencies so package builds are reproducible.
These packages are then visible in the default Python3 kernel.
Or add a `requirements.txt` file, which has the advantage that you don't need to change any git-controlled files.
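Before a build, it can help to check that every added requirement really carries a version. The following is a minimal sketch, assuming that "versioned" means an exact `==` pin; the helper name `unpinned` and the sample entries are hypothetical, and environment markers are not handled.

```python
import re

# Matches "name==version", optionally with extras like "name[extra]==1.0".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?==[^=\s]+$")


def unpinned(requirements):
    """Return the requirement strings that lack an exact '==' version pin."""
    return [req for req in requirements if not PINNED.match(req.strip())]


# Hypothetical sample list, the names and versions are illustrative only.
sample = ["example-plugin==1.2.3", "other-plugin>=2.0", "bare-name"]
print(unpinned(sample))
```

An empty result means that all listed requirements are pinned in this strict sense.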
Some standard extensions are already contained in `setup.py` as setuptools extras.
The `viz` extra installs `seaborn` and `holoviews`, which in turn pull in large parts of the usual data science stack, including `numpy`, `scipy`, `pandas`, and `matplotlib`.
The related `vizjs` extra adds several Javascript-based frameworks like `bokeh`, and image rendering support for SVG/PNG writing.
Activating extras increases the package size by tens or even hundreds of MiB, so keep an eye on it.
Activate the `spark` extra to get PySpark and related utilities.
The systemd unit already includes support for auto-detection or explicit configuration of an installed JVM.
To activate extras, you need `dh-virtualenv` v1.1 or later, which supports the `--extras` option.
That option is used as part of the `EXTRA_REQUIREMENTS` variable in `debian/rules`; add or remove extras there as you see fit.
There are two special extras named `default` and `full`: the `DEFAULT_EXTRAS` are listed in `setup.py`, and `full` is simply everything.
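How the two special extras relate to the named ones can be sketched as follows. This is an illustration only, not the project's actual `setup.py`: the contents of the table, the choice of defaults, and the helper `with_special_extras` are assumptions, and version pins are left out for brevity.

```python
# Hypothetical extras table, modelled on the extras named above;
# version pins are omitted and the real setup.py may differ.
extras_require = {
    "viz": ["seaborn", "holoviews"],
    "vizjs": ["bokeh"],
    "spark": ["pyspark"],
}

# Assumption: which extras count as defaults is made up for this sketch.
DEFAULT_EXTRAS = ["viz"]


def with_special_extras(extras, default_names):
    """Return a copy of 'extras' with the 'default' and 'full' groups added."""
    result = dict(extras)
    result["default"] = sorted({req for name in default_names for req in extras[name]})
    result["full"] = sorted({req for reqs in extras.values() for req in reqs})
    return result


print(with_special_extras(extras_require, DEFAULT_EXTRAS)["full"])
```

Deriving `full` from the other groups keeps it in sync whenever an extra is added or removed.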
Here is an example of what you can do using this package, without any great investment of effort or capital. Within a simple setup adding a single JupyterHub host, you can use the built-in Python3 kernel to access existing internal data sources (see figure below).
Such a setup supports risk analysis and decision making within development and operations processes: typical business intelligence / data science procedures can be applied to the ‘business of making and running software’. The idea is to create feedback loops, and to facilitate human decision making by automatically providing reliable input in the form of up-to-date facts. After all, development is our business, so let's have KPIs for developing, releasing, and operating software.
See this notebook or this blog post for more details and a concrete example of how to use such a setup.
Packages are built in Docker using the `Dockerfile.build` file.
That way you do not need to install tooling and build dependencies on your machine, and the package always gets built in a pristine environment.
The only thing you need on your workstation is a `docker-ce` installation of version 17.06 or higher (either on Debian or on Ubuntu).
After initializing your work environment with the command `. .env --yes`, call `./build.sh debian:buster` to build the package for Debian Buster.
Building for Ubuntu Bionic with `./build.sh ubuntu:bionic` is also supported, as are the old-stable releases, but those aren't regularly tested and might fail.
See Building Debian Packages in Docker for more details.
Generated package files are placed in the `dist/` directory.
You can upload them to a Debian package repository via e.g. `dput`, see here for a hassle-free solution that works with Artifactory and Bintray.
To test the resulting package, read the comments at the start of `Dockerfile.run`.
Or install the package locally into `/opt/venvs/jupyterhub/`, using `dpkg -i …`.
sudo dpkg -i $PWD/dist/jupyterhub_*.deb
/usr/sbin/jupyterhub --version # ensure it basically works
To list the installed version of `jupyterhub` and all its dependencies (around 150 in the default configuration), call this:
/opt/venvs/jupyterhub/bin/pip freeze | column
While installing the `configurable-http-proxy` Javascript module, you might get errors like `npm ERR! code E403`.
That specific error means you have to provide authorization with your Node.js registry.
`npm` uses a configuration file which can provide both a local registry URL and the credentials for it.
Create a `.npmrc` file in the root of your git working directory, otherwise `~/.npmrc` is used.
Example ‘.npmrc’ file:
_auth = xyzb64…E=
always-auth = true
email = joe.schmoe@example.com
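The `_auth` value in such a file is the Base64 encoding of `username:password`. A minimal sketch for producing it, assuming your registry accepts this basic-auth form; the helper name and the credentials are hypothetical:

```python
import base64


def npm_auth_token(username, password):
    """Encode 'username:password' as Base64 text for the '_auth' setting."""
    raw = f"{username}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Hypothetical credentials, replace them with your registry account.
print("_auth =", npm_auth_token("joe.schmoe", "s3cret"))
```

Since Base64 is an encoding and not encryption, treat the resulting `.npmrc` like a password file.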
See the related section in the dh-virtualenv manual.
This package needs a reasonably recent `pip` for building.
To upgrade `pip` (which makes sense anyway if your system is still on the ancient version 1.5.6), call `sudo python3 -m pip install -U pip`.
When using `dh-virtualenv` 1.1 or later releases, this problem should not appear anymore.
This appears in the service logs (`journalctl`) when you use the provided systemd unit files on older systems (e.g. Xenial). They're just warnings, and can be safely ignored.
As previously mentioned, additional packages are listed in `setup.py`.
General dependencies can be found in `install_requires`, while groups of optional extensions are part of `extras_require`.
To assist upgrading to newer versions, call these commands in the project workdir:
./setup.py egg_info
pip-upgrade --skip-package-installation --skip-virtualenv-check debianized_jupyterhub.egg-info/requires.txt <<<"q"
This lists any newer version numbers that are available, which you can then edit into `setup.py`.
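To see which pins belong to which group before editing `setup.py`, the generated `requires.txt` can be split by section. This is a rough sketch, assuming the usual setuptools layout of unconditional requirements followed by `[extra]` sections; the function name and the sample content are made up.

```python
def parse_requires(text):
    """Group the lines of a setuptools 'requires.txt' by extras section."""
    groups = {"": []}  # the empty key holds the unconditional requirements
    section = ""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            groups.setdefault(section, [])
        else:
            groups[section].append(line)
    return groups


# Made-up sample content, names and versions are placeholders.
sample = """\
example-core==1.0.0

[viz]
example-plots==2.3.4
"""
for section, reqs in parse_requires(sample).items():
    print(section or "(install_requires)", reqs)
```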
After installing the package, JupyterHub is launched by default and available at http://127.0.0.1:8000/.
The same is true when you used the `docker run` command as mentioned in `Dockerfile.run`.
The commands found in `Dockerfile.run` also give you a detailed recipe for a manual install, in case you cannot use Docker for any reason. The only difference is process control, read on for that.
The package contains a `systemd` unit for the service, and you start it via `systemctl`:
sudo systemctl enable jupyterhub
sudo systemctl start jupyterhub
# This should show the service in state "active (running)"
systemctl status 'jupyterhub' | grep -B2 Active:
The service runs as `jupyterhub.daemon`.
Note that the `jupyterhub` user is not removed when purging the package, but the `/var/{log,opt,run}/jupyterhub` directories and the configuration are.
By default, the `sudospawner` is used to start a user's notebook process. For that purpose, the included `/etc/sudoers.d/jupyterhub` configuration allows the `jupyterhub` system user to create these on behalf of any user listed in the `JUPYTER_USERS` alias. Unless you change it, that means all accounts in the `users` group.
In case you want to enable a specific user group for the sudo spawner, change the sudoers file like this:
sed -i.orig~ -e s/%users/%jhub-users/ /etc/sudoers.d/jupyterhub
If you want certain users to have admin access, add them to the set named `c.Authenticator.admin_users` in `/etc/jupyterhub/jupyterhub_config.py`.
After an upgrade, the service restarts automatically by default. You can change that using the `JUPYTERHUB_AUTO_RESTART` variable in `/etc/default/jupyterhub`.
In case of errors or other trouble, look into the service's journal with…
journalctl -eu jupyterhub
To identify your instance, and help users use the right login credentials, add something similar to this to your `/etc/jupyterhub/jupyterhub_config.py` (see this issue for details):
# Adjacent string literals are concatenated into one value per key.
c.JupyterHub.template_vars = dict(
    announcement=
        '<a href="https://confluence.example.com/x/123456" target="_blank">'
        "<h1>DevOps Intelligence Platform</h1></a>",
    announcement_login=
        '<a href="https://confluence.example.com/x/123456" target="_blank">'
        "<h1>DevOps Intelligence Platform</h1></a>"
        "<big>🔒 <b>Use your company LDAP credentials!</b></big>",
)
If you add a PNG image at `/etc/jupyterhub/banner.png`, it is used instead of the original banner image (sized 208 × 56 px). Note that this is done via a `postinst` script, so you must call `dpkg-reconfigure jupyterhub` if you change or add such an image after the package installation.
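If you want the replacement to match the original banner's dimensions, you can check the file before reconfiguring the package. The sketch below reads the size from the PNG header using only the standard library; that a matching size is desirable is an assumption, and both helper names are hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Return (width, height) from the IHDR chunk at the start of PNG data."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are big-endian 32-bit integers right after the chunk type.
    return struct.unpack(">II", data[16:24])


def check_banner(path="/etc/jupyterhub/banner.png", expected=(208, 56)):
    """Compare the banner's pixel size with the original banner's size."""
    with open(path, "rb") as handle:
        return png_size(handle.read(24)) == expected
```

Calling `check_banner()` on the target host returns `True` when the image is 208 × 56 px.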
Note that JupyterHub can directly offer an SSL endpoint, but there are a few reasons to do that via a local proxy:
- JupyterHub needs no special configuration to open a low port (remember, we do not run it as `root`).
- Often there are already configuration management systems in place that, for commodity web servers and proxies, seamlessly handle certificate management and other complexities.
- You can protect sensitive endpoints (e.g. metrics) against unauthorized access using the built-in mechanisms of the chosen SSL off-loader.
To hide the HTTP endpoint from the outside world, change the bind URL in `/etc/default/jupyterhub` as follows:
# Bind to 127.0.0.1 only
sed -i.orig~ -e s~//:8000~//127.0.0.1:8000~ /etc/default/jupyterhub
Restart the service and check that port 8000 is bound to localhost only:
systemctl restart jupyterhub.service
netstat -tulpn | grep :8000
Then install your chosen webserver / proxy for SSL off-loading, listening on port 443 and forwarding to port 8000. Typical candidates are NginX, Apache httpd, or Envoy. For an internet-facing service, consider https-portal, which is an NginX Docker image with easy configuration and built-in Let's Encrypt support.
Otherwise, install the Debian `nginx-full` package and copy `docs/examples/nginx-jhub.conf` to the `/etc/nginx/sites-enabled/default` file (or another path depending on your server setup).
Make sure to read through the file. Most likely you have to adapt the certificate paths in `ssl_certificate` and `ssl_certificate_key` (and create a certificate, e.g. a self-signed one).
You also need to create Diffie-Hellman parameters using the following command, which can take several minutes to finish:
openssl dhparam -out /etc/ssl/private/dhparam.pem 4096
Then (re-)start the `nginx` service and try to log in.
The best way to change or augment the configuration of a systemd service is to use a ‘drop-in’ file.
For example, to increase the limit for open file handles above the default of 8192, use this in a `root` shell:
unit='jupyterhub'
# Change max. number of open files for ‘$unit’…
mkdir -p /etc/systemd/system/$unit.service.d
cat >/etc/systemd/system/$unit.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=16384
EOF
systemctl daemon-reload
systemctl restart $unit
# Check that the changes are effective…
systemctl cat $unit
let $(systemctl show $unit -p MainPID)
grep -E 'Limit|files' "/proc/$MainPID/limits"
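The final check can also be scripted, for example in a monitoring job. Here is a small sketch that extracts the soft and hard values from the text of `/proc/<pid>/limits`; the function names are hypothetical, and the values are returned as strings because a limit may read `unlimited`.

```python
def open_files_limit(limits_text):
    """Return (soft, hard) for 'Max open files' from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line[len("Max open files"):].split()[:2]
            return soft, hard
    raise ValueError("no 'Max open files' entry found")


def read_open_files_limit(pid):
    """Read the limit of a running process, e.g. the unit's MainPID."""
    with open(f"/proc/{pid}/limits") as handle:
        return open_files_limit(handle.read())
```

After applying the drop-in above, both values should equal the configured `LimitNOFILE`.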
- `/etc/default/jupyterhub` – Operational parameters like log levels and port bindings.
- `/etc/jupyterhub/jupyterhub_config.py` – The service's configuration.
A few configuration parameters are set in the `/usr/sbin/jupyterhub-launcher` script and thus override any values provided by `jupyterhub_config.py`.
ℹ️ Please note that the files in `/etc/jupyterhub` are not world-readable, since they might contain passwords.
- `/var/log/jupyterhub` – Extra log files.
- `/var/opt/jupyterhub` – Data files created during runtime (`jupyterhub_cookie_secret`, `jupyterhub.sqlite`, …).
- `/run/jupyterhub` – PID file.
You should stick to these locations, because the maintainer scripts have special handling for them. If you need to relocate, consider using symbolic links to point to the physical location.
These links point to parts of the documentation especially useful for operating a JupyterHub installation.
- Springerle/debianized-pypi-mold – Cookiecutter that was used to create this project.
- Tutorial: Getting Started with JupyterHub
- https://github.com/jupyterhub/the-littlest-jupyterhub
- Notebook culling: jupyterhub/jupyterhub#2032
- https://github.com/jupyterhub/systemdspawner
- As for “Simple sudo rules do not help, since unrestricted access to systemd-run is equivalent to root”, sudo command patterns or a wrapper script could probably fix that.
- Dockerize and Kerberize Notebook for Yarn and HDFS [YouTube]
- bloomberg/jupyterhub-kdcauthenticator – A Kerberos authenticator module for the JupyterHub platform.
- jupyter-incubator/sparkmagic – Jupyter magics and kernels for working with remote Spark clusters.
- Apache Livy – An open source REST interface for interacting with Apache Spark from anywhere.
- jupyter/docker-stacks – Ready-to-run Docker images containing Jupyter applications.
- jupyter/repo2docker – Turn git repositories into Jupyter-enabled Docker Images.
- vatlab/SoS – Workflow system designed for daily data analysis.
- sparklingpandas/sparklingpandas – SparklingPandas builds on Spark's DataFrame class to give you a polished, pythonic, and Pandas-like API.
- data-8/nbzip – Zips and downloads all the contents of a Jupyter notebook.
- data-8/nbgitpuller – One-way git pull with auto-merging, most suited for classroom settings.