Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parameterized kernel specs proposal #87

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
# Jupyter Parameterized Kernel Specs

## Problem

When creating a new kernel, we need to define a different "specs" file for each possible parameter combination used by the executable. For example, the xeus-cling kernel supports three different versions of the `C++` language, `C++11`, `C++14`, and `C++17`, which is specified by the `--std` command-line option.

When installing xeus-cling, we install three almost identical kernel specs. The only difference is the last parameter of the execution command `--std=c++X`.

When more than one parameter is available, the number of possible combinations grows in a combinatorics fashion.

Besides, some kernels would benefit from being configurable when launching them. For example, the connection information to the database could be specified through kernel parameters rather than kernel magics.

Sometimes when installing multiple kernels, the JupyterLab launcher panel is crowded with too many options to create a notebook or console with a specific kernel. The kernels that users see in the launcher are just multiple kernel specs for one kernel. We could avoid having that many kernels displayed, adding parameters to the kernel specs, showing only one option per kernel, and offering a modal dialog with the different options the user can choose from when selecting a specific kernel.

## Proposed Enhancement

The solution we are proposing consists of adding parameters to the kernel specs file in the form of a JSON Schema that would be added to the specs metadata. These parameters are then used to populate the `argv` and `env` lists (respectively the command-line arguments and environment variables).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Pointed out by @vidartf):

the phrase "the argv and env lists" is incorrect because env is a dict. We should fix that sentense.


Upon starting a new kernel instance, a front-end form generated from the JSON schema is prompted to the user to fill the parameter values. Many tools are available to generate such forms, such as react-jsonschema-form.

Some of the chosen parameter values can be saved in e.g. the notebook metadata so that they don't have to be specified every time one opens the notebook.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having stored arguments in the notebook is potentially a security issue. For arbitrary input fields (e.g. any non-enum field) this could potentially be used as a way to inject string, e.g. for shell-injections. So if we want to include this feature it would need to be fleshed out in more detail to ensure it is secure (or at least can be disabled).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I'm talking here about notebooks that can come from untrusted sources, vs general user controlled arguments.


## Detailed Explanation

As described in previous sections, we propose to parameterize the kernel specs file. In the example shown below, we can see the kernel specs file from the kernel xeus-cling. We suggest changing the last parameter of the execution command `-std=c++11` to have a variable `-std=${parameters.cpp_version}` and adding a new object `parameters` to the metadata of the kernel specs.

```=json
{
"display_name": "C++11",
"argv": [
"/home/user/micromamba/envs/kernel_spec/bin/xcpp",
"-f",
"{connection_file}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This format for string templates needs to be fleshed out. Above it was listed as ${}, some places as {}. Also, if I need to use curly braces in my command, I need to be able to escape it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to remove the $ from the text. It is clearly just a curly bracket.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martinRenou mentioned that he might have an example of a case where literal curly braces might be needed, just to highlight what I mean with an escape mechanism being needed, and how it will be strictly a breaking change, but unlikely to actually affect anyone (as long as unrecognized parameter names are left as they are, and don't cause errors).

Copy link
Member

@martinRenou martinRenou Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned I had an example of a kernel spec with ${} in it, but that was my memory tricking me.

Actually we should fix the mention of -std=${cpp_version} line 33 with -std={cpp_version} to stay coherent with the current {connection_file}.

"-std=c++11"
],
env: [
"XEUS_LOGLEVEL=ERROR"
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these should be dicts?

"language": "C++11"
}
```
```=json
{
"display_name": "C++",
"argv": [
"/home/user/micromamba/envs/kernel_spec/bin/xcpp",
"-f",
"{connection_file}",
"-std={parameters.cpp_version}"
],
env: [
"XEUS_LOGLEVEL={parameters.xeus_log_level}"
],
"language": "C++"
"metadata": {
"parameters": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these will include provisioner-relative parameters, I think it would be good to have separate kernel_parameters and provisioner_parameters stanzas - both for end-user applications and programmatic processing.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Cdsdashboards does provisioning atop JupyterHub (~BinderHub but with parameters)

https://github.com/ideonate/cdsdashboards/blob/3d6cb7d4f60f82a7d366a8f439a57ab6f2479070/cdsdashboards/builder/processbuilder.py#L46-L91

README:

  • User sees a safe user-friendly version of the original notebook - served by Voilà, Streamlit, Dash, Bokeh, Panel, R Shiny etc.
    All of this works through a new Dashboards menu item added to JupyterHub's header.

Copy link

@westurner westurner Feb 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these will include provisioner-relative parameters, I think it would be good to have separate kernel_parameters and provisioner_parameters stanzas - both for end-user applications and programmatic processing.

It is okay for me to distinguish between kernel and provisioner. Nevertheless, I believe we should not distinguish them because provisioner parameters will get a form for free once we implement the form for choosing kernel parameters.

For example, in JupyterLab, when a user selects the kernel, if there is a custom provisioned, they will see and be able to configure parameters like (CPU, GPU, and memory) from the same form.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevertheless, I believe we should not distinguish them because provisioner parameters will get a form for free once we implement the form for choosing kernel parameters.

For example, in JupyterLab, when a user selects the kernel, if there is a custom provisioned, they will see and be able to configure parameters like (CPU, GPU, and memory) from the same form.

I understand, but users should know that certain parameters are relative to the provisioned environment while others are relative to the kernel. In addition, their metadata specifications in the schemas will be separated, so their values should remain separated as well.

If they don't separate them, then users will see completely different sets of parameters for the "same kernel" depending on the provisioner and (I believe) it makes sense logically to separate the two. That said, the UX can choose how these should be organized but at least they'd have that option if the two are separated in their schemas and submission values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that we need to give a specific semantics to the new kernel parameters, since they are completely generic and can have very different semantics depending on the use case (language version, connexion parameters, options, or anything else).

"cpp_version": {
"type": "string",
"default": "C++14",
"enum": ["C++11", "C++14", "C++17"],
"save": true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have a format_string meta-property so software that launches the kernel can simply build the rest of the argv stanza on the fly? Otherwise, the baked-in argv in the kernel.json dictates what parameters will be used, yet users may want to include others that are defined in the schema. For example, in this example, parameters xeus_log_level will never be included despite the user wanting to enable TRACE output.

Suggested change
"save": true
"save": true,
"format_string": "-std={cpp_version}"

Then, only those parameters that have been provided by the user (or are required) would be included in the finalized argv list.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The xeus_log_level parameter is used in the environment variable:


env: [
    "XEUS_LOGLEVEL={parameters.xeus_log_level}"
  ],

That's why we are trying to make it as generic as possible so we can use these parameters anywhere.

Maybe the format_string is helpful for flags or optional parameters. I don't have an example right now but imagine a kernel with an optional flag to activate or deactivate LSP (Language Server Protocol) features.

{
  "display_name": "C++",
  "argv": [
      "/home/user/micromamba/envs/kernel_spec/bin/xcpp",
      "-f",
      "{connection_file}",
      "{parameters.lsp}"
  ],
  env: [
    "XEUS_LOGLEVEL={parameters.xeus_log_level}"
  ],
  "language": "C++"
  "metadata": {
    "parameters": {
      "lsp": {
        "type": "string",
        "default": "True",
        "format_string": "--lsp",
        "save": true
      }
    }
  },
}

To be honest, I don't have a strong opinion on this. More people could chime in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The xeus_log_level parameter is used in the environment variable:

Ah - I'm sorry, I missed that. I figured environment variables would be classified in a different manner. In that vein,
I think there should be a means of adding environmental variables, free form, and those that are specified in the schema should be classified as environment variables to assist in UX. These are the kinds of things that integrations typically need and we should enable the ability to add any environment variable to the env of the kernel.

},
"xeus_log_level": {
"type": "string",
"default": "ERROR",
"enum": ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"],
"save": true
}
}
},
}
```

The JSON schema specification is augmented with a new `save` attribute used to specify whether the parameter is to be saved in e.g. the notebook metadata for future runs.

Note: Using the JSON Schema, we can automate how front-end forms are created directly from the parameters allowing kernels' authors to decide which parameters are necessary and how to validate them. (Note that JupyterLab already makes use of react-jsonschema-form in other parts of its UI).

In the following screenshots, you can see a small demo of how we envision the UI changes in JupyterLab.

Jupyterlab Launcher | Select c++ version
:-------------------------:|:-------------------------:
![](./launcher.png) | ![](./launcher-select-c-version.png)


![](./notebook-select-kernel.gif)




## Pros and Cons

Pros:

- A less crowded list of kernels specs

Cons:

- Changes are required in multiple components of the stack, from the protocol specification to the front-end.
- Unless we require default values for all parameters, this would be a backward-incompatible change.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, at a minimum, we need to require default values for all required parameters and the "source" (i.e., kernel or kernel provisioner) should have a reasonable default for others. Of course, having a default for each parameter would be a "best practice".

Where the default values probably fall down are for things like credentials, but I would argue those kinds of things are probably best retained in a backing store (e.g, configuration file or database), where any inputted values would override the persisted values.

I guess the point is that we don't want to abandon all the jupyter_client-based applications that don't have the ability to present a JSON schema for parameter selection - so some form of reasonable defaulting is necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As much as possible we need to be backwards compatible with existing kernel spec users so I'm in agreement with @kevin-bates on default values. Maybe it's fine to let the kernel fail if it's missing a parameter (or if the parameter is invalid).

Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added jupyter-parameterized-kernel-spec/launcher.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.