JupyterHub Idle Culler Service

jupyterhub-idle-culler provides a JupyterHub service that identifies and shuts down idle or long-running Jupyter Notebook servers. The exact actions performed depend on the spawner used for the Jupyter Notebook server (e.g. the default LocalProcessSpawner, kubespawner, or dockerspawner). In addition, if explicitly requested, all users whose Jupyter Notebook servers have been shut down this way are deleted as JupyterHub users from the internal database. This does not affect the authentication method, which continues to allow those users to log in, nor does it delete persisted user data (e.g. stored in docker volumes for dockerspawner or in persistent volumes for kubespawner).

Setup

Installation

pip install jupyterhub-idle-culler

As a hub managed service

In jupyterhub_config.py, add the following dictionary for the idle-culler Service to the c.JupyterHub.services list:

import sys

c.JupyterHub.services = [
    {
        'name': 'idle-culler',
        'admin': True,
        'command': [
            sys.executable,
            '-m', 'jupyterhub_idle_culler',
            '--timeout=3600'
        ],
    }
]

where:

  • 'admin': True indicates that the Service requires admin permissions so it can shut down arbitrary user notebooks, and
  • 'command' indicates that the Service will be managed by the Hub.

As a standalone script

jupyterhub-idle-culler can also be run as a standalone script that accesses the Hub's API with a service token. The service token must have admin privileges.

Generate an API token and store it in the JUPYTERHUB_API_TOKEN environment variable, then start jupyterhub-idle-culler manually:

export JUPYTERHUB_API_TOKEN=$(jupyterhub token)
python3 -m jupyterhub_idle_culler [--timeout=900] [--url=http://localhost:8081/hub/api]
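
To illustrate what accessing the Hub's API with a service token can look like, the sketch below builds (but does not send) an authenticated request for the Hub's user list. It assumes the token is sent in an `Authorization: token <value>` header, which the JupyterHub REST API accepts. The helper name `build_users_request` and the placeholder fallback token are hypothetical and are not part of jupyterhub-idle-culler.

```python
import os
import urllib.request


def build_users_request(api_url, token):
    """Build a GET request for the Hub's user list (hypothetical helper)."""
    return urllib.request.Request(
        api_url.rstrip("/") + "/users",
        headers={"Authorization": f"token {token}"},
    )


# Placeholder fallback so the sketch runs without a real token (assumption).
token = os.environ.get("JUPYTERHUB_API_TOKEN", "example-token")
request = build_users_request("http://localhost:8081/hub/api", token)
print(request.full_url)
```

Nothing is sent over the network here; passing `request` to `urllib.request.urlopen` would perform the call against a running Hub.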

The command line interface also gives a quick overview of the different options for configuration.

  --concurrency                    Limit the number of concurrent requests made
                                   to the Hub.  Deleting a lot of users at the
                                   same time can slow down the Hub, so limit
                                   the number of API requests we have
                                   outstanding at any given time. (default 10)
  --cull-every                     The interval (in seconds) for checking for
                                   idle servers to cull. (default 0)
  --cull-users                     Cull users in addition to servers.  This is
                                   for use in temporary-user cases such as
                                   tmpnb. (default False)
  --internal-certs-location        The location of generated internal-ssl
                                   certificates (only needed with --ssl-
                                   enabled=true). (default internal-ssl)
  --max-age                        The maximum age (in seconds) of servers that
                                   should be culled even if they are active.
                                   (default 0)
  --remove-named-servers           Remove named servers in addition to stopping
                                   them.  This is useful for a BinderHub that
                                   uses authentication and named servers.
                                   (default False)
  --ssl-enabled                    Whether the Jupyter API endpoint has TLS
                                   enabled. (default False)
  --timeout                        The idle timeout (in seconds). (default 600)
  --url                            The JupyterHub API URL.

Caveats

  1. last_activity is not updated with high frequency, so the cull timeout should be greater than the sum of:

    • single-user websocket ping interval (default: 30 seconds)
    • JupyterHub.last_activity_interval (default: 5 minutes)

  2. The same --timeout and --max-age values are used to cull both users and users' servers. If you want different values for users and servers, add this script to the services list twice, with different names and different values, and pass the --cull-users option to one of them.

  3. By default, HTTP requests to the hub time out after 60 seconds. This can be changed by setting the JUPYTERHUB_REQUEST_TIMEOUT environment variable.
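
The first two caveats can be combined into a small configuration sketch. With the default intervals above, the timeout should exceed 30 + 300 = 330 seconds. The helper `culler_service`, the service names, and the timeout values are hypothetical choices made for illustration; only the module name and the --timeout and --cull-users options come from this README.

```python
import sys

PING_INTERVAL = 30            # single-user websocket ping interval (seconds)
LAST_ACTIVITY_INTERVAL = 300  # JupyterHub.last_activity_interval (seconds)
MIN_TIMEOUT = PING_INTERVAL + LAST_ACTIVITY_INTERVAL


def culler_service(name, timeout, cull_users=False):
    """Build one idle-culler service entry (hypothetical helper)."""
    if timeout <= MIN_TIMEOUT:
        raise ValueError(f"timeout should be greater than {MIN_TIMEOUT} seconds")
    command = [
        sys.executable,
        "-m", "jupyterhub_idle_culler",
        f"--timeout={timeout}",
    ]
    if cull_users:
        command.append("--cull-users")
    return {"name": name, "admin": True, "command": command}


# Two entries with different names and values; only one culls users.
services = [
    culler_service("idle-culler-servers", timeout=3600),
    culler_service("idle-culler-users", timeout=86400, cull_users=True),
]
# In jupyterhub_config.py this list would be assigned with:
#     c.JupyterHub.services = services
```

Building the entries through a helper keeps the two services consistent and rejects a timeout that is too short for last_activity to be updated reliably.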
