Kernelspec JSON schema #105

Merged
merged 8 commits into from
Jun 10, 2024
29 changes: 29 additions & 0 deletions kernelspec-spec/kernelspec-spec.md
@@ -0,0 +1,29 @@
---
title: kernelspec specification
authors: Johan Mabille
issue-number: XX
pr-number: XX
date-started: "2023-04-19"
---

# Specification of the kernelspec

## Problem

The kernelspec configuration file is documented alongside the kernel protocol documentation, but it is not
*specified*. Besides, it might be unclear whether the kernelspec is part of the kernel protocol or
independent of it.

## Proposed Enhancement

We propose to specify the kernelspec with the JSON schema included in this PR. The specification conforms
to [the current description of the kernelspec](https://jupyter-client.readthedocs.io/en/stable/kernels.html#kernel-specs),
and adds an optional `kernel_protocol_version` field.

[A dedicated repo](https://github.com/jupyter-standards/kernelspec) for the specification and the documentation of


I'm not sold yet on repo-per-spec, and think the "big tent" repo, with common tooling and interlinked documentation will be more sustainable.

Member Author

I agree that a repo-per-spec might be overkill. However, I would like to have some separation of concerns. The repo for the kernel protocol will also hold the roadmap and additional documentation about the "release" process of the protocol. Also, the kernel protocol and the widget protocol are somewhat orthogonal; I would find it confusing to have them in the same repo.

Member

> I'm not sold yet on repo-per-spec, and think the "big tent" repo, with common tooling and interlinked documentation will be more sustainable.

I would agree with this. I think it would be more sustainable and easier for folks to follow if we had a single "Jupyter specs" repo.

Member Author

@JohanMabille JohanMabille Apr 19, 2023

What if we gather specs by topics or "thematics"? Specifications for the kernel protocol, the kernelspec and the connection file could live together in the kernel_protocol repo. Everything related to the widgets specification (for instance) could live in another one. This way we avoid the multiplication of specs repos, while keeping some separation of concerns. This would also be consistent with the current situation: a kernel has to implement the kernel protocol, provide a kernelspec and handle a connection file, while supporting the widget protocol is optional.


Sure, widgets are maybe an exception... but the knowledge of how comm works is not. At the end of the chain, something like the Jupyter Server API references almost everything else... kernelspec just happens to be very low in the order, with only perhaps a known-mimetypes.json below it, as that is not a schema-level construct like uri.

When changes occur, often these things will reference each other, and documentation and generated type packages should reflect the compatibility of them.

Member Author

How comm works is part of the Kernel protocol, so I agree that the Widgets protocol references the Kernel protocol, but not vice versa. I took the Widget protocol as an example, but we can consider other things, like the LSP, which is totally unrelated. I see this as a software distribution: upper layers are built on lower layers, and there should not be circular dependencies. The Jupyter Server API you mentioned is a very high layer that references everything; lower layers, however, should not reference it.

the kernelspec has been created.
Member

@krassowski krassowski Apr 19, 2023

In the Notebook format, precisely nbformat v4, in metadata there is a kernelspec property, with schema defined here. Previously (in nbformat v3) this was named as kernel_info (side note: format description is outdated as it still mentions kernel_info).

Should the JEP also give a mandate to update the nbformat schema so that it refers to this new kernelspec as a source of truth (using a definition link)? In nbformat v4 language is not required.

Can we describe advantages of moving kernelspec schema out of nbformat in the JEP to explain the reasoning?

Member Author

> Should the JEP also give a mandate to update the nbformat schema so that it refers to this new kernelspec as a source of truth (using a definition link)?

I don't know, actually. Would it make sense to store the arguments used to launch the kernel in the nbformat, for instance? If all the mandatory fields of the kernelspec make sense for the notebook format, then we should probably update it so that it refers to this new kernelspec. Otherwise, maybe the name could be reverted to kernel_info or something else to avoid confusion? But that would be out of the scope of this JEP.

> Can we describe advantages of moving kernelspec schema out of nbformat in the JEP to explain the reasoning?

The kernelspec is really about describing a kernel, and it is totally orthogonal to the notebook format. Kernel authors should not have to refer to the notebook format spec to know how to implement a kernelspec.

Member

I think the "kernelspec" entry in nbformat.v4 is a misnomer. It is not a kernelspec—sure, it shares some of the same fields, but there is a subtle difference here.

The kernelspec file does not have a "name" field. The (locally) unique name for a kernelspec comes from the directory on disk where the kernel.json file is located—it is not stored in the kernelspec file.

In the nbformat, this "name" field is necessary to locate the kernelspec on disk (this has always been brittle, because it means this information becomes useless the moment the notebook leaves the current context).

It really should be called something like "kernelspec_info".
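To make the distinction concrete, here is a minimal sketch of how a client could derive a kernel's name from the directory that holds `kernel.json` rather than from the file's contents. The helper name `kernel_name_from_path` and the example path are hypothetical and only illustrate the point above; this is not how any particular client is implemented.

```python
from pathlib import PurePosixPath


def kernel_name_from_path(kernel_json_path: str) -> str:
    """Return the kernel name implied by the location of a kernel.json file.

    Hypothetical helper: the name is the parent directory of kernel.json,
    not a field stored inside the file.
    """
    path = PurePosixPath(kernel_json_path)
    if path.name != "kernel.json":
        raise ValueError(f"expected a path to kernel.json, got {path.name!r}")
    return path.parent.name


# Example path is an assumption chosen for illustration.
print(kernel_name_from_path("/usr/share/jupyter/kernels/python3/kernel.json"))
```

A notebook that records only this name can locate the kernelspec again only on a machine where a directory with the same name exists, which is the brittleness described above.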

> Should the JEP also give a mandate to update the nbformat schema so that it refers to this new kernelspec as a source of truth (using a definition link)? In nbformat v4 language is not required.

This would be great for notebooks that always run in the same Python/conda/virtual environment (this field isn't helpful for notebooks that move around, but that's a separate issue). I would be in favor of changing this field, but I think this is out-of-scope for this JEP. We should make a follow-up JEP to update nbformat if we'd like to pursue that.

> Can we describe advantages of moving kernelspec schema out of nbformat in the JEP to explain the reasoning?

> The kernelspec is really about describing a kernel, and it is totally orthogonal to the notebook format. Kernel authors should not have to refer to the notebook format spec to know how to implement a kernelspec.

I believe this is out-of-scope for the current JEP.

That said, they aren't entirely orthogonal. Yes, the kernelspec shouldn't depend on the nbformat, but the nbformat loosely depends on the kernelspec today. I think that's what @krassowski is alluding to... if we can drop nbformat's current conflicting definition of the kernelspec and point at this definition, we gain some standardization.


### Impact on existing implementations

None, this JEP only adds an optional field in the kernelspec.

44 changes: 44 additions & 0 deletions kernelspec-spec/kernelspec.schema.json
@@ -0,0 +1,44 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://standards.jupyter.org/kernelspec.schema.json",
"title": "Kernelspec",
"description": "Specification of the kernelspec file",
"type": "object",
"properties": {


I recommend adding some top-level definitions to make individual pieces easier to reference in other schemas.

Member Author

@JohanMabille JohanMabille Apr 20, 2023

I can't find a way of gathering pieces; they all appear independent to me. Having a definition per field would just add complexity and not make them easier to reference in other schemas, right? Sorry for the naive question, but I'm not used to JSON schemas.


The kernelspec, and specific fields within it, need to be referenced by other schemas.

Have a look at, e.g., nbformat. If the kernelspec is formalized, then conventionally specs would want to point to specific meanings without pointing at concrete locations:

$ref = "https://schema.jupyter.org/kernelspec/v1/schema.json#/definitions/display-name"

vs

$ref = "https://schema.jupyter.org/kernelspec/v1/schema.json#/properties/display_name"

This becomes more relevant, once more complex patterns are used such as additionalProperties or patternProperties (useful for e.g. mimetypes).
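As a rough illustration of the difference, the sketch below resolves both kinds of reference against a toy schema. The schema fragment, the `definitions/display-name` layout, and the helper `resolve_pointer` are assumptions made for this example, not part of the proposed schema; note also that in draft 2020-12 the conventional keyword for such a section is `$defs`.

```python
from typing import Any

# Hypothetical schema fragment: the "definitions" layout is an assumption.
SCHEMA = {
    "definitions": {
        "display-name": {"type": "string"},
    },
    "properties": {
        # The property points at the named definition instead of inlining it.
        "display_name": {"$ref": "#/definitions/display-name"},
    },
}


def resolve_pointer(document: Any, ref: str) -> Any:
    """Resolve a same-document reference such as '#/definitions/display-name'."""
    if not ref.startswith("#/"):
        raise ValueError("only same-document '#/...' references are handled")
    node = document
    for token in ref[2:].split("/"):
        # Undo JSON Pointer escaping: "~1" stands for "/", "~0" for "~".
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


# A reference to the definition names a meaning and survives restructuring.
print(resolve_pointer(SCHEMA, "#/definitions/display-name"))
# A reference to the property names a location inside this particular schema.
print(resolve_pointer(SCHEMA, "#/properties/display_name"))
```

With this layout, another schema can reuse the meaning of `display-name` even if the kernelspec later moves or renames the property that uses it.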

Member Author

Thanks for the detailed explanation, I will try to make the appropriate changes.

"argv": {
"description": "A list of command line arguments used to start the kernel. The text {connection_file} in any argument will be replaced with the path to the connection file.",
"type": "array",
"items": {
"type": "string"
},
"minItems": 3
},
"display_name": {
"description": "The kernel’s name as it should be displayed in the UI. Unlike the kernel name used in the API, this can contain arbitrary unicode characters.",
"type": "string"
},
"language": {
"description": "The name of the language of the kernel. When loading notebooks, if no matching kernelspec key (may differ across machines) is found, a kernel with a matching language will be used. This allows a notebook written on any Python or Julia kernel to be properly associated with the user’s Python or Julia kernel, even if they aren’t listed under the same name as the author’s.",
"type": "string"
},
"kernel_protocol_version": {
"description": "The version of protocol this kernel implements. If not specified, the client will assume the version is <5.5 until it can get it via the kernel_info request.",
"type": "string"


If this stays, it should have some pattern. But as mentioned, hoisting this to an $id might be more robust over time.

Member

I don't think this should be included in the first version of the kernelspec.

I believe the first published version should only include the spec currently documented in the protocol documentation.

Then, we can quickly publish a second version with kernel_protocol_version added as a key. That way, we don't immediately invalidate everyone's kernelspecs.

Also, this creates a circular dependency on JEP #66.

Member

> That way, we don't immediately invalidate everyone's kernelspecs.

I realize this actually isn't true, since "kernel_protocol_version" wouldn't be a required field. That said, we would still need to settle the naming in #66 to merge this. I think "kernel_protocol_version" is a fine name for the field, so maybe this isn't an issue.

Member Author

@JohanMabille JohanMabille Apr 19, 2023

Actually let's say that #66 depends on this one and will be updated accordingly (if needed) when this one is accepted.

},
"interrupt_mode": {
"description": "May be either signal or message and specifies how a client is supposed to interrupt cell execution on this kernel, either by sending an interrupt signal via the operating system’s signalling facilities (e.g. SIGINT on POSIX systems), or by sending an interrupt_request message on the control channel (see Kernel interrupt). If this is not specified the client will default to signal mode.",
"type": "string",
"enum": ["signal", "message"]
},
"env": {
"description": "A dictionary of environment variables to set for the kernel. These will be added to the current environment variables before the kernel is started. Existing environment variables can be referenced using ${<ENV_VAR>} and will be substituted with the corresponding value. Administrators should note that use of ${<ENV_VAR>} can expose sensitive variables and should use only in controlled circumstances.",
"type": "object",
"additionalProperties": {"type": "string" }
},
"metadata": {
"description": "A dictionary of additional attributes about this kernel; used by clients to aid in kernel selection. Metadata added here should be namespaced for the tool reading and writing that metadata.",
"type": "object"


As this is not widely-used yet, could this already be further constrained to enforce namespacing:

[definitions.metadata]
type = "object"

[definitions.metadata.additionalProperties]
type = "object"
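A small sketch of what that constraint would mean in practice: every top-level key of `metadata` would have to map to an object, so each tool gets its own namespace. The helper `non_namespaced_keys` and the sample metadata are hypothetical and only illustrate the suggested rule.

```python
def non_namespaced_keys(metadata: dict) -> list:
    """Return the top-level metadata keys whose value is not an object (dict).

    Under the suggested constraint, these keys would fail validation.
    """
    return sorted(key for key, value in metadata.items() if not isinstance(value, dict))


# Hypothetical metadata: "my_tool" is namespaced, "debugger" is a bare value.
sample = {"debugger": True, "my_tool": {"priority": 1}}
print(non_namespaced_keys(sample))
```

Any existing kernelspec that stores a bare value directly under `metadata` would be rejected by such a rule, which is worth weighing before tightening the schema.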

}
},
"required": ["argv", "display_name", "language"]
}
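
The following is a minimal sketch, using only the Python standard library, of how a client might consume a kernelspec that follows this schema: check the required keys, substitute `{connection_file}` in `argv`, and fall back to `signal` when `interrupt_mode` is absent. The sample kernelspec, the connection file path, and the helper names are assumptions for illustration; a real client would more likely validate against the schema itself with a JSON Schema library.

```python
import json

# Sample kernelspec contents; the values are illustrative only.
KERNEL_JSON = """
{
  "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
  "display_name": "Python 3",
  "language": "python"
}
"""

# Mirrors the "required" list of the schema above.
REQUIRED = ("argv", "display_name", "language")


def missing_required(spec: dict) -> list:
    """Return the required keys that are absent from the kernelspec."""
    return [key for key in REQUIRED if key not in spec]


def build_command(spec: dict, connection_file: str) -> list:
    """Replace the text {connection_file} in every argv entry."""
    return [arg.replace("{connection_file}", connection_file) for arg in spec["argv"]]


def interrupt_mode(spec: dict) -> str:
    """Return the interrupt mode, defaulting to 'signal' as the schema describes."""
    return spec.get("interrupt_mode", "signal")


spec = json.loads(KERNEL_JSON)
print(missing_required(spec))
# The connection file path below is a hypothetical example.
print(build_command(spec, "/tmp/kernel-1234.json"))
print(interrupt_mode(spec))
```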