Add JEP for adding $schema to notebook format #97
```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
```
From our notebook format workshop meeting this morning: we'll need to bump to at least JSON Schema draft 2019-09 to have the `deprecated` keyword, and maybe we should bump this to the 2020-12 draft (i.e., the latest draft)?
We will also need to introduce support for this new keyword in the existing schemas, i.e. backport the addition to our existing schemas. This should be acceptable, as these schemas are not declared immutable, and it would be a permissive change.
We should also define how to ensure that the nbformat versions align with the document schema during the deprecation period. One solution is to ensure that they're constants in the schema. Thereafter, we could move to a single (major) version number for each schema revision, as their compatibility is enforced by `$schema` itself.
I don't know of a reason not to bump to 2020 draft, besides the risk of existing tooling not supporting newer drafts.
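To make the "constants in the schema" idea concrete, here is a minimal, stdlib-only sketch of a schema fragment that pins the legacy version fields with `const` and annotates them with `deprecated`. It assumes the bump to draft 2020-12; the helper names and the pinned 4.5 version are illustrative only. Note that `deprecated` is an annotation: a validator reports it but does not reject the instance, so the hand-rolled check below only looks at the `const` pins.

```python
import json

# Draft 2020-12 metaschema URI (draft 2019-09 is the first with `deprecated`).
DRAFT_2020_12 = "https://json-schema.org/draft/2020-12/schema"


def version_fields_fragment(major: int, minor: int) -> dict:
    """Hypothetical helper: a schema fragment that pins the legacy version
    fields to constants and annotates them as deprecated."""
    return {
        "$schema": DRAFT_2020_12,
        "type": "object",
        "properties": {
            "nbformat": {"type": "integer", "const": major, "deprecated": True},
            "nbformat_minor": {"type": "integer", "const": minor, "deprecated": True},
        },
    }


def legacy_fields_match(notebook: dict, fragment: dict) -> bool:
    """Hand-rolled check of the `const` pins; a real validator would do this.

    A missing field counts as a mismatch here, which is stricter than JSON
    Schema (absent properties are only rejected through `required`).
    """
    props = fragment["properties"]
    return all(
        notebook.get(name) == props[name]["const"]
        for name in ("nbformat", "nbformat_minor")
    )


fragment = version_fields_fragment(4, 5)
print(json.dumps(fragment["properties"]["nbformat"]))
print(legacy_fields_match({"nbformat": 4, "nbformat_minor": 5}, fragment))  # True
print(legacy_fields_match({"nbformat": 4, "nbformat_minor": 4}, fragment))  # False
```

With the fields pinned like this, a notebook whose legacy fields disagree with the schema it declares would fail validation, which is the alignment guarantee asked for above.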
I would say that for the newer version it is acceptable to bump the schema to the latest draft, after ensuring there is no backward incompatibility between the current version (based on draft 04) and the new draft; as far as I know, some breaking changes were introduced when upgrading from draft 04 to draft 06.
From a quick look it seems OK.
Additionally, if we are to bump the JSON Schema draft, I would switch every single-value `enum` to the new `const`, as used for the nbformat version numbers (see mainly `cell_type` and `output_type`).
Could we add the proposed schema changes that remove the deprecated attributes, to give an idea of what the resolution of the deprecated properties looks like?
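The `enum` to `const` switch suggested above is mechanical, since `{"enum": [x]}` and `{"const": x}` accept exactly the same instances. A minimal sketch of that migration over a schema loaded as a Python dict follows; `enum_to_const` is a hypothetical helper, and the toy schema in the usage lines is not the real nbformat schema.

```python
import copy

# Keywords whose values are instance data rather than subschemas.
# Sketch limitation: a property literally named after one of these keywords
# is copied verbatim instead of being recursed into.
DATA_KEYWORDS = {"const", "default", "examples", "enum"}


def enum_to_const(node):
    """Return a copy of a JSON Schema with single-value `enum`s rewritten
    as `const` (draft-06 and later). Both forms accept exactly the same
    instances, so validation behaviour is unchanged."""
    if isinstance(node, list):
        return [enum_to_const(item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if key == "enum" and isinstance(value, list) and len(value) == 1:
            out["const"] = copy.deepcopy(value[0])
        elif key in DATA_KEYWORDS:
            # Copy data verbatim; multi-value enums are left as they are.
            out[key] = copy.deepcopy(value)
        else:
            out[key] = enum_to_const(value)
    return out


# Toy schema for illustration only.
schema = {
    "properties": {
        "cell_type": {"enum": ["markdown"]},
        "name": {"enum": ["stdout", "stderr"]},
    }
}
converted = enum_to_const(schema)
print(converted["properties"]["cell_type"])  # {'const': 'markdown'}
print(converted["properties"]["name"])  # {'enum': ['stdout', 'stderr']}
```

Only single-value enums are touched; multi-value enums keep their original form, so the rewrite can be applied to the whole schema in one pass.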
Please consider YAML-LD (JSON-LD) when naming the attribute, given that Linked Data is ideal for science publishing and the internet, as explained and justified by https://5stardata.info/. Eventually, I (and I believe also @bollwyvl) would argue that nbformat should have a JSON-LD Context which would make
That's out of scope for this issue, but FWIW the YAML-LD Convenience Context does map a bunch of `$`-prefixed keys to JSON-LD `@` keywords:

```json
{
  "@context": {
    "$base": "@base",
    "$container": "@container",
    "$direction": "@direction",
    "$graph": "@graph",
    "$id": "@id",
    "$import": "@import",
    "$included": "@included",
    "$index": "@index",
    "$json": "@json",
    "$language": "@language",
    "$list": "@list",
    "$nest": "@nest",
    "$none": "@none",
    "$prefix": "@prefix",
    "$propagate": "@propagate",
    "$protected": "@protected",
    "$reverse": "@reverse",
    "$set": "@set",
    "$type": "@type",
    "$value": "@value",
    "$version": "@version",
    "$vocab": "@vocab"
  }
}
```
@westurner FWIW, as I understand it, JSON-LD and JSON Schema are orthogonal concepts. In this JEP, we're concerned with the validation side of things; down the road, the linked-document properties of LD will be useful.
wip: update from meeting
currently, the top-level notebook schema does not allow any `additionalProperties` in the container, so we can't have any LD
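A small stdlib sketch of why a closed root blocks extensions. It assumes the root of the nbformat v4 schema lists exactly `metadata`, `nbformat_minor`, `nbformat`, and `cells` with `"additionalProperties": false`; the helper name and the `https://example.org/...` URI are placeholders. The same check shows why adding `$schema` also requires updating the existing schemas: old validators reject it for the same reason they reject `@context`.

```python
# Top-level keys accepted by the nbformat v4 schema, which declares
# "additionalProperties": false at the notebook root.
V4_TOP_LEVEL = frozenset({"metadata", "nbformat_minor", "nbformat", "cells"})


def rejected_top_level_keys(notebook: dict, allowed=V4_TOP_LEVEL) -> list:
    """Return the top-level keys a strict validator would reject."""
    return sorted(set(notebook) - set(allowed))


notebook = {"metadata": {}, "nbformat": 4, "nbformat_minor": 5, "cells": []}
print(rejected_top_level_keys(notebook))  # []

# Adding JSON-LD or `$schema` keys fails against the unmodified v4 root schema.
extended = {**notebook, "@context": {}, "$schema": "https://example.org/nb.schema.json"}
print(rejected_top_level_keys(extended))  # ['$schema', '@context']
```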
change the schema to draft2020-12
@jupyter/software-steering-council we are working on a draft to present to y'all for the JEP. yesterday we were wondering what to expect with the process. is there any way someone can outline what the process will look like so we can plan our work accordingly and set some deadlines?
This is so much more sensible than the incrementing numbers and awkward compatibility between notebook formats. I'm wholly on board. Thank you all so much for pushing forward with this approach.
Thanks all for the great discussion.
From my reading, there are three open questions:
And I'm unclear about the following comment from @agoose77:
Which new keyword are we speaking about? To get validation (from the SSC), the easiest path would be to resolve all pending questions and then ping the SSC that this is ready for approval. If some questions are left open, I would recommend summarizing them in a comment along with the possible solutions, then pinging the SSC, which will have to figure out how to move forward.
At least 2019-09. I'd be curious to know whether there are downsides to just jumping straight to 2020-12. See filipsch#2 :)
Yes, I think so.
Yes, I think so.
Actually, this is something I wanted to follow up with @jasongrout on. Due to the fact that we have
My understanding of our deprecation process is that we will update
I was originally thinking that we would need to backport
Going forward, we will in principle be moving away from the need for major epochs of a schema; we can version the schema by calver (like JSON Schema drafts) if we want to (and without further context, I'd prefer that). To my mind, if we need to be able to upgrade/downgrade notebooks between schema versions, we can do this on a calver-like ordering, i.e. change the API of
After the deprecation period expires, a future JEP will remove these `nbformat` and `nbformat_minor` properties from the notebook schema. These properties are retained to permit legacy notebook consumers to read notebooks authored during this deprecation period.
The addition of the `$schema` property removes a level of indirection between the notebook and the schema against which it is validated. It also guarantees that the schema against which it is validated is invariant with respect to time; the schema URI should refer to an immutable document.
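During the deprecation period a reader has to pick one schema URI per notebook. A minimal sketch, assuming (as the meeting notes record) that `$schema` takes precedence when present, and that the legacy fields are otherwise mapped through the URI template floated in review, `https://jupyter.org/schema/notebook/notebook-{nbformat}.{nbformat_minor}.schema.json`. The template and the `schema_uri` helper are illustrative, not the adopted scheme.

```python
# Illustrative template from the review discussion; not the adopted URI scheme.
LEGACY_TEMPLATE = "https://jupyter.org/schema/notebook/notebook-{major}.{minor}.schema.json"


def schema_uri(notebook: dict) -> str:
    """Return the schema URI a notebook should be validated against."""
    explicit = notebook.get("$schema")
    if explicit is not None:
        # `$schema` takes precedence over the deprecated version fields.
        return explicit
    try:
        major, minor = notebook["nbformat"], notebook["nbformat_minor"]
    except KeyError as exc:
        raise ValueError(
            "notebook declares neither $schema nor nbformat/nbformat_minor"
        ) from exc
    return LEGACY_TEMPLATE.format(major=major, minor=minor)


print(schema_uri({"nbformat": 4, "nbformat_minor": 5}))
```

Because the explicit URI is returned untouched, the notebook names its schema directly instead of having it reconstructed from version numbers.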
> the schema URI should refer to an immutable document.

It would be good to define where the canonical version of the schema document is stored. This may already be defined somewhere else, in which case that information could be referenced here.
While it's not necessary, the schema could be hosted at the URL represented by the URI. That is, https://jupyter.org/schema/notebook/notebook-{nbformat}.{nbformat_minor}.schema.json would resolve to the actual schema document. Or, it could be tied to a GitHub repo, branch, and tag.
Agreed, this is something we've touched upon in the meetings. It's my feeling that we haven't wanted to define that in this JEP (to avoid taking on too many responsibilities). As of right now, the schemas used to validate notebooks are stored in the nbformat repository / wheels. I don't think they're hosted on a standalone URL, but I've not checked!
I wouldn't call this a strong need or requirement. But when I see a schema declaration, my instinct is to ask where I can find the actual document and the statement regarding an immutable document reinforces that. If this isn't the right time to define the process for hosting the schema, then I would suggest just documenting the current location as information for the reader.
> could be tied a GitHub repo, branch, and tag.

`jupyter.org` will, in all likelihood, remain under the control of Project Jupyter, while GitHub could pull a Docker, Inc. and make things much more challenging.
Further, and not explicitly stated (again to avoid any scope creep): it must not be an expectation that a validating tool will need to (or even be able to) fetch the schema in order to validate a notebook.
While I strongly feel that the whole family of current and future Jupyter schemas should all have "official", inspectable URLs together, with unified tooling for generating human-readable documentation of the schemas, generated, lightweight header/typing packages should obviate much of the need to "go grab something off the internet at runtime".
Today, `nbformat` publishes canonical packages on pypi.org and npmjs.com, which is a great start! But really, every language community that wants representation in Jupyter should be able to propose and maintain lightweight packages, and get up-to-date packages.
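One way to read "no fetching required" is that a tool resolves schema URIs only from documents shipped inside its own package. The sketch below is hypothetical throughout: the registry, the toy schema body, and `load_schema` are illustrations of the idea, not `nbformat`'s API (which does ship its schemas inside the package, but exposes them differently).

```python
import json

# Hypothetical registry of schema documents shipped inside a package,
# keyed by schema URI. The toy schema body is a stand-in, not nbformat's.
BUNDLED_SCHEMAS = {
    "https://jupyter.org/schema/notebook/notebook-4.5.schema.json": json.dumps(
        {"type": "object", "required": ["cells", "metadata"]}
    ),
}


class UnknownSchemaError(LookupError):
    """Raised instead of falling back to a network fetch."""


def load_schema(uri: str, registry: dict = BUNDLED_SCHEMAS) -> dict:
    """Resolve a schema URI purely from local, packaged data."""
    try:
        text = registry[uri]
    except KeyError:
        raise UnknownSchemaError(f"no bundled schema for {uri!r}; not fetching it") from None
    return json.loads(text)


schema = load_schema("https://jupyter.org/schema/notebook/notebook-4.5.schema.json")
print(schema["required"])  # ['cells', 'metadata']
```

The URI then acts purely as an identifier; whether anything is hosted at it does not affect validation.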
Here's a quick strawman sketch of what something like that might look like.
FYI: #107
When [W3C SHACL] validation for Linked Data notebooks becomes the norm (because Linked Data Notebook outputs are most practically validated as Linked Data with Shapes and Constraints), then the (URI-namespaced) property for the version of the SHACL validation document would need to supersede
Is JSONschema with
When the versioned URI and contents of the
nbformat is older than jsonschema, and may outlast jsonschema draft n, so a separate version string that doesn't change between implementations would be great for backward compatibility.
notebook documents, the serialized version of someone's notebook, are fundamentally incompatible with JSON-LD. we can add
this is a good consideration, as we deprecate
wow, you're right! we've been spending a lot of time discussing backwards compatibility, and how to handle that best. ongoing work...
a SHACL context for the notebook schema will undoubtedly show up in the future. the `{"@vocab": "https://github.com/jupyter/nbformat/blob/main/nbformat/v4/nbformat.v4.5.schema.json#", "@base": "http://www.w3.org/ns/shacl#"}`
@tonyfast I was thinking about this after the meeting, and it seems to me that we should literally define these as constants in the schema. My take is that if you author a notebook with
change schema draft to draft2020-12
moving agenda minutes over from the team compass.

March 7th, 2023

Agenda: first meeting of the notebook cells schema group outside of the nbformat workshop.
to do
Name | Affiliation | GitHub |
---|---|---|
tonyfast | | @tonyfast |
Steve Purves | Curvenote | @stevejpurves |
Jason Grout | Databricks | @jasongrout |
Angus Hollands | Princeton University | @agoose77 |
Nick Bollweg | GTech | @bollwyvl |
Agenda
- discuss open JEPs
  - Add JEP for adding $schema to notebook format #97
    - top-level `$schema` is not contentious
  - pre-proposal: add `extraSchemas` to notebook format #96
    - extra schemas might be contentious
    - need to resolve the root notebook schema, and extra schemas can fail
    - we need to be able to turn things on and off in case of failure
- deprecation notes
  - bump the metaschema to draft 2020-12; the current draft 4 doesn't support deprecation, which wasn't introduced until draft 2019-09.
  - at least a year for the deprecation. find a good reference for the deprecation cycle as precedent
  - old validators will fail when `additionalProperties: false`, which will require updating the existing nbformat schemas
    - there is precedent in nbformat for such changes
  - `$schema` takes precedence over `nbformat` and `nbformat_minor`
  - require that `$schema` validates against a URI template that captures the major and minor version
    - encode this in the schema with `const`
    - can also do this in the metaschema, though it's less important.
- what is the JEP process
  - ask the SSC what this process will look like
  - the software steering council is still being formed. JEPs will be a priority
- discuss work in progress
  - Text based Format - https://hackmd.io/CmAhY_3tRK6ge4tqANflTg
  - Cell's Markdown Format - https://docs.google.com/document/d/1B8mhaHud7DMY55q1mg5sSDhZ96FGC6cbJpypYO1BocA
  - Persist user expressions - https://docs.google.com/document/d/
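The note about a URI template that "captures major, minor version" can be sketched with a regular expression plus a consistency check for the deprecation period. The pattern assumes the `notebook-{major}.{minor}.schema.json` naming floated in review; the URI was later aligned with JEP #108, so treat the regex and both helper names as placeholders.

```python
import re

# Placeholder URI template: the adopted scheme may differ.
SCHEMA_URI = re.compile(
    r"https://jupyter\.org/schema/notebook/"
    r"notebook-(?P<major>\d+)\.(?P<minor>\d+)\.schema\.json"
)


def parse_schema_uri(uri: str) -> tuple:
    """Extract (major, minor) from a `$schema` URI, or raise ValueError."""
    match = SCHEMA_URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"$schema does not match the expected template: {uri!r}")
    return int(match["major"]), int(match["minor"])


def check_deprecated_fields(notebook: dict) -> None:
    """While nbformat/nbformat_minor are still written, they must agree
    with the version encoded in `$schema`, which takes precedence."""
    major, minor = parse_schema_uri(notebook["$schema"])
    for field, expected in (("nbformat", major), ("nbformat_minor", minor)):
        if field in notebook and notebook[field] != expected:
            raise ValueError(
                f"{field}={notebook[field]!r} disagrees with $schema ({expected})"
            )


uri = "https://jupyter.org/schema/notebook/notebook-4.5.schema.json"
print(parse_schema_uri(uri))  # (4, 5)
check_deprecated_fields({"$schema": uri, "nbformat": 4, "nbformat_minor": 5})
```

In a real schema the same constraint would be expressed declaratively (a `pattern` on `$schema` plus `const` on the legacy fields); the Python version just makes the rule explicit.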
March 21
no meeting
here are the notes from last week. see y'all tomorrow. please add anything you might like to talk about to the agenda.

March 28th, 2023
Agenda
attaching notes from last week's meeting. see folks tomorrow.

April 4th, 2023
Agenda
hey folks. i likely will miss the meeting today. hopefully someone else can drive the ship. the hackmd is all set up: https://hackmd.io/@tonyfast/H1Xnx1B12
April 25th, 2023
Agenda
@/all (but especially @jupyter/software-steering-council): in 0871ad1 I updated the schema URI to align with JEP #108, i.e. from

For this particular URI I did not use a subproject (as allowed by the JEP). Let me know if it needs further changes.
The vote is now closed with the following results:

In favor: 8

In light of those results, this JEP is accepted.
This JEP proposes to add a new top-level field, `$schema`, to the notebook JSON, thereby updating the notebook JSON schema. This new field deprecates `nbformat` and `nbformat_minor`.

I skipped the step of creating a GitHub issue and deciding it's a JEP in this repository after discussing with @fcollonval. There was broad consensus about this change, and about the fact that it's a JEP, during the notebook format workshop held in Paris (Feb 28 - Mar 2), so I thought it okay to file a PR straight away. I will be the shepherd.
Voting from @jupyter/software-steering-council