pre-proposal: add extraSchemas to notebook format #96
As discussed in the workshop, we might need to do some more research into what existing standards exist for saying a document must conform to multiple schemas. My cursory check of the JSON Schema spec didn't turn up anything (`$schema` must always be a single URI), but there may be other specs of interest. The first one that came to mind was the widely used (but still maligned for some nits like author order) Dublin Core Metadata. It includes a `conformsTo` term but makes no finer-grained claims, e.g. "the syntax conforms to" or "the underlying content conforms to". If something authoritative (and already implemented) can't be found, we might also consider just making this a "well-known"

Indeed, one of the discussed points was reusing the schema terminology directly, e.g.

```json
{
  "$schema": ...,
  "metadata": {
    "extraSchema": {
      "allOf": [
        {"$ref": "https://some/other/schema"}
      ]
    }
  },
  "cells": ...
}
```

But, again, this puts an important member back inside a list, which raises the addressability concerns brought up elsewhere.

Another aspect did not come up much in the workshop, because the focus was mostly on the data model. That aspect is how various clients would report schema violations. The schema could constrain any part of the document, even parts a client doesn't render, so this would probably need to be fleshed out.
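To make the reporting question a little more concrete, consider how JSON Schema validators usually identify a failing value: they give a location in the document, which can be written as a JSON Pointer (RFC 6901). A client could then map each pointer either to a specific cell, so the error is shown next to it, or to the notebook as a whole. The sketch below assumes violations arrive as JSON Pointer strings. The names `ViolationLocation` and `locate_violation` are hypothetical and not part of any Jupyter API. Cells are addressed by list index, which illustrates the addressability concern mentioned above: the same pointer names a different cell once cells are inserted or reordered.

```python
from typing import NamedTuple, Optional


class ViolationLocation(NamedTuple):
    """Where a client might surface a schema violation (hypothetical structure)."""
    scope: str                 # "cell" or "notebook"
    cell_index: Optional[int]  # None for notebook-level violations
    path_in_scope: str         # remaining JSON Pointer inside that scope


def parse_pointer(pointer: str) -> list[str]:
    """Split an RFC 6901 JSON Pointer into unescaped reference tokens."""
    if pointer == "":
        return []  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    # RFC 6901: decode "~1" to "/" first, then "~0" to "~".
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def to_pointer(tokens: list[str]) -> str:
    """Re-encode tokens as a JSON Pointer (inverse of parse_pointer)."""
    return "".join("/" + t.replace("~", "~0").replace("/", "~1") for t in tokens)


def locate_violation(pointer: str) -> ViolationLocation:
    """Attach a violation to a cell when its pointer goes through /cells/<n>."""
    tokens = parse_pointer(pointer)
    if len(tokens) >= 2 and tokens[0] == "cells" and tokens[1].isdecimal():
        return ViolationLocation("cell", int(tokens[1]), to_pointer(tokens[2:]))
    # Everything else ($schema, top-level metadata, ...) is reported
    # notebook-wide, even if the client never renders that part.
    return ViolationLocation("notebook", None, to_pointer(tokens))


print(locate_violation("/cells/3/metadata/slide_type"))
# ViolationLocation(scope='cell', cell_index=3, path_in_scope='/metadata/slide_type')
```

A client could group these by `cell_index` for inline display and show the `None` group in a notebook-level panel. The sketch says nothing about how the pointer itself is produced, because that depends on the validator in use.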
@bollwyvl both good points. I think you're recorded as planning to attend the meeting in 10 minutes, so let's discuss it there and report back the findings!
i spent a little time thinking about a few different kinds of extra schema we could define. these are just some use cases for reference or later discussion. the schemas are written in TOML for density; they get unwieldy when we are deep in the schema.

specific source patterns

constrain that a document can't be saved with a blank cell. ideally, we'd want to have a nice

```toml
description = "require all cells are non-empty"
[properties.cells.items.properties.source.if]
type = "string"
[properties.cells.items.properties.source.then]
"$anchor" = "non-empty-string"
minLength = 1
pattern = '^\s*\S'  # literal string: "\s" is not a valid escape in a TOML basic string
[properties.cells.items.properties.source.else]
type = "array"
minItems = 1
contains = {"$ref" = "#non-empty-string"}
```

notebook metadata extensions

as @agoose77 described above, we might want to extend the notebook-level metadata. in this example, we imagine `kernelspec` extracted to its own schema:

```toml
[properties.metadata]
required = ["kernelspec"]
[properties.metadata.properties.kernelspec]
"$ref" = "https://github.com/jupyter/nbformat/blob/main/nbformat/v4/nbformat.kernelspec.v4.5.schema.json"
```

cell metadata extensions

we might want to constrain the cell metadata schema. currently, there are quite a few cell schemas that might be useful to extract into more composable representations later on. in this example, slide types are constrained:

```toml
description = "the cell metadata slide type schema"
[properties.cells.items.properties.metadata]
required = ["slide_type"]
[properties.cells.items.properties.metadata.properties.slide_type]
enum = ["slide", "subslide"]
```

display data `data` extension for a JSON schema

we might want to constrain our new display data types. this example requires any `application/schema+json` data to itself be a valid JSON Schema:

```toml
[properties.cells.items.properties.outputs.items.if]
required = ["output_type"]
properties = {output_type = {const = "display_data"}}
[properties.cells.items.properties.outputs.items.then.properties.data.properties."application/schema+json"]
"$ref" = "https://json-schema.org/draft/2020-12/schema"
```

display data metadata extension

a vendor might want to constrain their output metadata. below we constrain `my_extension.foo` to be a string:

```toml
[properties.cells.items.properties.outputs.items.if]
required = ["output_type"]
properties = {output_type = {const = "display_data"}}
[properties.cells.items.properties.outputs.items.then.properties.metadata.properties.my_extension.properties]
foo = {type = "string"}
```
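For readers less used to `if`/`then`/`else` and `contains`, the next snippet is a plain-Python sketch of what two of the constraints above would accept: non-empty cell sources, and the vendor `my_extension` output metadata. It is a hand-written approximation, not a JSON Schema validator, and the function names are hypothetical. It uses Python's `re.search` in place of the ECMA-262 regex dialect that JSON Schema specifies, and the two dialects' `\s` classes may differ slightly. It also assumes, as nbformat requires, that multi-line sources are lists of strings.

```python
import re
from typing import Any

# Same regex as the schema's "pattern"; JSON Schema patterns are unanchored,
# so re.search (not re.fullmatch) is the closer analogue.
NON_EMPTY = re.compile(r"^\s*\S")


def is_non_empty_string(value: Any) -> bool:
    """Mirror of the "#non-empty-string" subschema (minLength = 1 plus pattern).

    Simplification: a real validator would let non-string items pass this
    subschema vacuously; here lines are assumed to be strings, as nbformat requires.
    """
    return isinstance(value, str) and len(value) >= 1 and NON_EMPTY.search(value) is not None


def source_is_non_empty(source: Any) -> bool:
    """"if" type is string -> "then" branch; "else" -> array with a non-empty line."""
    if isinstance(source, str):
        return is_non_empty_string(source)
    return (
        isinstance(source, list)
        and len(source) >= 1                                    # minItems = 1
        and any(is_non_empty_string(line) for line in source)  # contains
    )


def output_metadata_ok(output: dict) -> bool:
    """If output_type is "display_data", metadata.my_extension.foo must be a string."""
    if output.get("output_type") != "display_data":
        return True  # "if" fails, so "then" does not apply
    metadata = output.get("metadata")
    ext = metadata.get("my_extension") if isinstance(metadata, dict) else None
    # "properties" only constrains keys that are present on objects.
    if not isinstance(ext, dict) or "foo" not in ext:
        return True
    return isinstance(ext["foo"], str)


def notebook_ok(nb: dict) -> bool:
    """Apply both constraints to every cell, and to every output of each cell."""
    for cell in nb.get("cells", []):
        if "source" in cell and not source_is_non_empty(cell["source"]):
            return False
        if not all(output_metadata_ok(o) for o in cell.get("outputs", [])):
            return False
    return True
```

Writing the check out this way shows why each `if` needs `properties`/`const` (and `required`). A bare `output_type = "display_data"` inside `if` is an unknown keyword, so the `if` always passes and the `then` would apply to every output.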
Background
During the Jupyter Notebook workshop, we established three JEP drafts that would prepare the notebook format for additional cell types and address the problem of un-typed metadata. On the latter issue, current notebook users have no way to indicate to the notebook consumer that metadata should conform to a particular schema. This prevents third parties from validating the metadata, and prevents frontends from displaying rich-editing interfaces for it.[1]
Proposal
A separate JEP will move to deprecate the `nbformat` and `nbformat_minor` top-level properties in favour of a direct `$schema` property, which must contain a URI to an nbformat schema.

This JEP will extend the previous schema to include an `extraSchemas` property. This optional property may contain an array of URIs that refer to additional schemas. These schemas must not conflict with one another. For a notebook to be considered valid, every schema in `extraSchemas` must validate the document alongside the root `$schema`. To begin with, any schema in `extraSchemas` must conform to a restrictive metaschema that permits adding properties only to the notebook and cell metadata. In future, this may be relaxed.

Examples
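The validation rule in the proposal amounts to requiring that the notebook satisfy the `allOf` of the root `$schema` and every `extraSchemas` entry. The sketch below builds that combined schema. It assumes that `extraSchemas` sits at the top level next to `$schema`, since the proposal does not yet fix its exact location, and it leaves resolving each `$ref` to a real validator. The function name `effective_schema` is hypothetical.

```python
from typing import Any


def effective_schema(notebook: dict[str, Any]) -> dict[str, Any]:
    """Combine the root $schema and any extraSchemas into one allOf schema.

    Assumes (hypothetically) that "extraSchemas" is a top-level array of URI
    strings next to "$schema"; $ref resolution is left to a real validator.
    """
    root = notebook.get("$schema")
    if not isinstance(root, str):
        raise ValueError("notebook must declare a root $schema URI")
    extras = notebook.get("extraSchemas", [])  # the property is optional
    if not isinstance(extras, list) or not all(isinstance(u, str) for u in extras):
        raise ValueError("extraSchemas must be an array of URI strings")
    # Every schema must validate the document, so compose them with allOf.
    # Repeated URIs are dropped (first occurrence wins); validating against
    # the same schema twice cannot change the outcome.
    uris = list(dict.fromkeys([root, *extras]))
    return {"allOf": [{"$ref": uri} for uri in uris]}
```

One consequence of composing with `allOf` is that two contradictory extra schemas, e.g. one requiring a metadata key to be a string and another requiring it to be a number, would make every notebook invalid. That is presumably why the proposal requires the schemas not to conflict.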
Example of a valid notebook under this proposal:

Example of a schema referenced in `extraSchemas` (`"my-extension-schema-uri"`):

Further Information
As this is a complex area of discussion (multi-stakeholder, significant long-term impact, niche tooling), we are holding regular, open discussions under the general topic of "extra cell types". Notes from the first of these meetings can be found here; those wishing to attend can find more information there.
Footnotes
[1] e.g. with tools like react-jsonschema-form