Simple packaging for HTML elements #14

wbamberg · 2019-05-08T03:58:31Z

This is rough and I'm not sure how far it's intended to be just illustrative versus some rough version of real code we could use somehow. But I hope at least it is illustrative :).

I've talked a bit about the idea of a build step which takes the Stumptown content and emits a structured JSON form that's easy for machines to consume, so they don't have to care about the concessions we make to human writers (e.g. having prose in a single Markdown file).

This PR adds a new package.json command:

npm run package elementName

where "elementName" is the name of an HTML element, like "video" (actually, it's only really tested for "video").

This finds all the bits we have on the given element, assembles them into a blob of JSON, and writes the JSON to a file under /packaging/elementName.json.

The JSON has a structure like this:

{
  "html": {
    "elements": {
      "video": {
        "title": "<video>: The Video Embed element",
        "interactive_example_url": "https://interactive-examples.mdn.mozilla.net/pages/tabbed/video.html",
        "browser_compatibility": {
          ...                       // BCD JSON
        },
        "examples": [
          {
            "description": {        // optional description
              "title":              // example title
              "width":              // iframe width
              "height":             // iframe height
              "content":            // description prose (HTML)
            },
            "sources": {
              "html":                // html source for example
            }
          },
          // potentially more examples
        ],
        "attributes": [
          {
            "name":                  // attribute name (text)
            "description":           // attribute description (HTML)
            "type":                  // attribute type (text)
            "values": [              // optional array of values
              {
                "value":             // value (text)
                "description":       // value description (HTML)
              },
              // potentially more values
            ]
          },
          // potentially more attributes
        ],
        "prose": {
          "additional-sections": [], // optional additional prose sections (HTML)
          "short-description":       // short-description (HTML)
          "overview":                // optional overview (HTML)
          "usage-notes":             // optional usage notes (HTML)
          "accessibility-concerns":  // optional accessibility concerns (HTML)
          "see-also":                // optional see also (HTML)
        },
        "contributors":              // contributor list (HTML)
      }
    }
  }
}

I'm not sure what the best place to integrate something like this would be: I mean, exactly what its inputs and outputs ought to be, and exactly who calls it and when. In this version I'm just writing the JSON to a file, so it's easy for people to look at its output.
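To make the "write a file versus return an object" question concrete, here is a minimal sketch that separates the two. The function names and the output directory argument are hypothetical and are not taken from this PR; the nested shape follows the structure shown above.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: wrap the already-collected parts of an element in
// the nested "html.elements.<name>" structure described above.
function buildElementJson(elementName, parts) {
  return { html: { elements: { [elementName]: parts } } };
}

// Thin wrapper that persists the result. A caller that wants the object
// directly can call buildElementJson() and skip this step.
function writeElementJson(outputDir, elementName, parts) {
  const json = buildElementJson(elementName, parts);
  fs.mkdirSync(outputDir, { recursive: true });
  const outputPath = path.join(outputDir, `${elementName}.json`);
  fs.writeFileSync(outputPath, JSON.stringify(json, null, 2));
  return outputPath;
}
```

Keeping the builder free of file I/O would let a site builder import it directly, while the CLI keeps writing files for people to inspect.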

It also doesn't validate the structure at all, it just assumes that it's correct. This of course is a thing we would need to add.

One thing I decided, doing this, is that having starting and ending tags for sections of prose.md seems pointlessly complicated, so I changed that for video's prose.md.

* upstream/master: Add license Add contributors

wbamberg · 2019-05-08T04:02:43Z

@peterbe , @ddbeck , @escattone , I wrote some code that JSONifies stumptown objects.

ddbeck · 2019-05-08T14:12:08Z

@wbamberg there's a lot here and I can imagine going through this top to bottom (and also I want to try running this locally), but before I do that, is there a particular focus you're looking for in review here?

ddbeck · 2019-05-08T14:16:52Z

scripts/package/package-bcd.js

@@ -0,0 +1,12 @@
+const bcd = require('mdn-browser-compat-data');
+
+function package(query) {


One first impression: it is long past time for BCD to provide this API. It's wild how many times I've seen this implemented 😆

mdn/browser-compat-data#4106
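For readers who have not seen one of these reimplementations, the lookup usually amounts to walking a dotted path through the BCD object. The sketch below is an assumption about what such a helper looks like, not the code in package-bcd.js; `sampleBcd` is a tiny illustrative stand-in for the real data set and `queryBcd` is a hypothetical name.

```javascript
// Stand-in for the object exported by mdn-browser-compat-data.
// The real data set is far larger; this value is illustrative only.
const sampleBcd = {
  html: {
    elements: {
      video: { __compat: { support: { firefox: { version_added: '3.5' } } } },
    },
  },
};

// Hypothetical helper: walk a dotted query such as "html.elements.video".
// Returns undefined as soon as any segment is missing.
function queryBcd(data, query) {
  return query.split('.').reduce(
    (node, key) => (node !== null && typeof node === 'object' ? node[key] : undefined),
    data
  );
}
```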

peterbe · 2019-05-08T19:58:06Z

It also doesn't validate the structure at all, it just assumes that it's correct. This of course is a thing we would need to add.

I disagree. There should be a separate set of tools that protects the integrity of all the content, i.e. "linters" that are executed in CI and protect against typos in YAML files etc.

wbamberg · 2019-05-08T23:09:02Z

is there a particular focus you're looking for in review here?

That's indeed a good question. Starting with the most general stuff, and getting more specific, I'm interested in all of:

partly I wrote this to illustrate the idea that transforming Stumptown repo content -> MDN web pages could be a two stage process where the first stage builds this JSON form, which can be consumed by the thing that makes MDN pages (and makes decisions about, for example, how compat tables or lists of attributes should be rendered). Does this seem like a good idea still?
as said, I don't have a good idea of how something like this fits into an architecture: what kind of code calls it, and when, and how should it present its output? In this code its output is just JSON written to a file, but perhaps it ought to return a JSON object directly to the thing that calls it. I guess we don't have to worry about these questions yet but it would be good to think about them.
now that we can see the kind of hoops code has to go through to transform stumptown->JSON, are we still happy with the stumptown format? For example I changed the way we use HTML comments to demarcate prose.md sections, because when trying to parse it the existing rule seemed pointlessly complicated.
relatedly, I'd be very happy to hear about better ways to do some of this Markdown-munging work, especially the slicing up of attributes and prose.md files.
I'm very happy to hear specific suggestions about how the code could be improved, but I also see this as prototype code at this stage.

Does that help?

wbamberg · 2019-05-08T23:11:52Z

It also doesn't validate the structure at all, it just assumes that it's correct. This of course is a thing we would need to add.

I disagree. There should be a separate set of tools that protects the integrity of all the content, i.e. "linters" that are executed in CI and protect against typos in YAML files etc.

Yes, I think I kind of had that in my mind too. I like the idea of separating these two functions, and definitely of linting as early as we can, so the builder can just assume that the content is good.
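As a rough illustration of that separation, a lint step could be a small function that returns a list of problems and is run in CI before the builder ever sees the content. The required keys below are assumptions made for the example, not an agreed schema.

```javascript
// Assumed required keys, for illustration only. A real linter would take
// its rules from the recipe rather than hard-coding them.
const REQUIRED_KEYS = ['title', 'recipe'];

// Returns an array of human-readable problems; an empty array means the
// metadata passed this (very small) check.
function lintMeta(meta) {
  const errors = [];
  for (const key of REQUIRED_KEYS) {
    if (typeof meta[key] !== 'string' || meta[key].trim() === '') {
      errors.push(`missing or empty "${key}"`);
    }
  }
  return errors;
}
```

A CI job would fail when the returned array is non-empty, which is what lets the builder assume its input is good.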

ddbeck · 2019-05-09T09:40:47Z

Does that help?

Definitely. That really helps me know what to focus on. I'll give this a proper review today.

ddbeck

@wbamberg after spending a lot of time looking at this today, I really like what I see. This feels promising. Detailed stuff below and in line. Thank you for getting this going!

Likes 👍

Seeing the JSON, I'm still comfortable with the overall shape of things (I can imagine some other would-be structures, like selectively accessing bits of the structure, maybe with GraphQL as in a Gatsby site, but they could reasonably be layers on top of this one). A couple of questions come to mind:

Is our default posture to provide HTML where possible? For example, I can imagine providing contributor names in a more structured way instead of as an HTML string.

Should prose sections have an order? This structure implies that our prose files are a bucket of independent sections that can be shown in any order (and without necessarily accompanying other sections). Is that a promise we want to make? It might complicate authoring.

We don't need answers to these questions now — these could reasonably be converted to issues — but I thought these were significant questions raised by seeing this PR.
On architecture: I can totally imagine slotting in some of this content into a web page with a templating engine (e.g., with Jinja or Django templates). It seems pretty obvious how I might experiment with different layouts, run tests on pages that omit or add new blocks of content, and so on. I can even see the start of some additional automation (e.g., defining relationships to other documents and using that to generate sidebars in an MDN site builder). I'm still excited at the prospect of having this.
On Stumptown formats and structure: it's good enough. There are things I think we will ultimately find aggravating (e.g., hundreds of prose.md files is going to be a drag, more depth to the file structure than I'd like) but I think we'll just need some minor tweaks, not a radical rethinking.

Dislikes 👎

I dislike the packaging terminology. Partly because it seems likely to be confused with other tooling and partly because it covers so much. Some possible terminology that came to mind while reviewing: resolving references (e.g., to BCD or to examples or whatever), rendering Markdown, and composing a document. Specific is better than general, I think.
See inline for mostly minor stuff on comments in Markdown, querying BCD, handling contributor data, accepting additional sections, and Markdowns (plural). OK, that last one is not minor.

Again, to reiterate: I'm really happy with the progress you've made. 👌

ddbeck · 2019-05-09T15:17:01Z

scripts/package/package.js

+function package(elementName) {
+    const elementPath = path.join(process.cwd(), htmlElements, elementName);
+
+    if (!fs.exists(elementPath)) {


I got an ERR_INVALID_CALLBACK error trying to run this locally. Using the synchronous call fixed it:

Suggested change

if (!fs.exists(elementPath)) {

if (!fs.existsSync(elementPath)) {

Or, better, leave it to be async exists and use promises. E.g.

async function package(elementName) {
    const elementPath = path.join(process.cwd(), htmlElements, elementName);
    const exists = await fs.exists(elementPath);
    if (!exists) {
        ...

It would be a shame if this package becomes synchronous. With all the disk IO, my gut's telling me it would be faster if it stayed async.
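One caveat on the promise-based suggestion: in Node, `fs.exists()` is callback-based and does not return a promise, so it cannot simply be awaited. A minimal sketch of an awaitable check, using `fs.promises.access()` instead (the `pathExists` name is made up for this example):

```javascript
const fs = require('fs');

// fs.promises.access() resolves when the path is accessible and rejects
// otherwise, so the rejection is translated into a boolean here.
async function pathExists(targetPath) {
  try {
    await fs.promises.access(targetPath);
    return true;
  } catch (error) {
    return false;
  }
}
```

Inside an async version of the packaging function this would read `if (!(await pathExists(elementPath))) { ... }`.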

I don't think this blocks mdn/sprints#1499, but agree we should do it, so I have filed #20

ddbeck · 2019-05-09T16:25:30Z

content/html/elements/video/prose.md

@@ -1,9 +1,7 @@
-<!-- <short-description> -->
+<!-- short-description -->


More and more I think comments or custom tags are not going to work for this. For example, it completely foils GitHub's rich Markdown diffs:

I think sooner or later we'll want to do plain headings (## Short description) and aggressively enforce accepted text and hierarchy (as part of the recipe).

Yes... I'm not sure now why we opted for this plan rather than just using headings. Also I'm not happy that prose.md uses comments while attributes use headings.

So unless someone can remember why we decided to do this, I think you are right.

Again these are things we can tweak as we go along though.

I'm going to keep this for now but filed #19.

This comment doesn't signify that I've done any kind of thorough review 😃, but I was skimming this PR while waiting for some tests to finish and noticed this thread. I can't remember the details either, but I'm also not happy with the fact that prose.md uses hidden comments to enforce structure while the attributes use headings. It's probably best to choose some specific headings for signifying/enforcing structure, and get rid of the hidden comments. So, for what it's worth, just wanted to say that I agree with you both in where you're heading here! 😄
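If prose.md moved to plain headings, slicing it up could look roughly like the sketch below. It assumes level-2 headings (`## Short description`) mark the sections, which is only the proposal from this thread and not current behaviour, and it deliberately ignores edge cases such as `##` appearing inside a code block.

```javascript
// Split Markdown into sections keyed by their level-2 heading text.
// Content before the first heading is dropped in this sketch, and deeper
// headings (###) stay inside the section they belong to.
function splitByHeadings(markdown) {
  const sections = {};
  let current = null;
  for (const line of markdown.split(/\r?\n/)) {
    const heading = /^##\s+(.+?)\s*$/.exec(line);
    if (heading) {
      current = heading[1];
      sections[current] = [];
    } else if (current !== null) {
      sections[current].push(line);
    }
  }
  for (const name of Object.keys(sections)) {
    sections[name] = sections[name].join('\n').trim();
  }
  return sections;
}
```

Enforcing the accepted heading text and hierarchy would then be a matter of comparing the keys of the result against the recipe.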

ddbeck · 2019-05-09T16:27:20Z

package.json

+        "gray-matter": "4.0.2",
+        "jsdom": "^12.2.0",
+        "js-yaml": "3.13.1",
+        "marked": "0.6.2",


Not even a fully functioning prototype yet and we've got two Markdown parsers. Some day (but not today) I'm going to have a full on meltdown[1] about this. Choosing a Markdown and an idea of how to not-so-dangerously extend it is something I'd like to see happen sooner rather than later (maybe "propose possible Markdowns" would be a good task for me for a future sprint).

[1] This would be a great name for a Markdown implementation

Definitely, we need to do this, and before you have a Markdown meltdown. I have filed mdn/sprints#1505 for this, but feel free to edit it as you see fit. I have made it an epic because it looks like it might be bigger than a single user story. But maybe not.

I hope, again, that getting experience of migrating content will help us decide which features we need.

ddbeck · 2019-05-09T16:53:46Z

scripts/package/package-contributors.js

@@ -0,0 +1,13 @@
+const fs = require('fs');


I'd like to avoid baking contributor data into the repo, but we'll necessarily have some legacy data to retain. It'd be nice to have a graveyard for legacy contributions (e.g., some separate thing that maps our files to lists of contributors) and derive post-wiki contributions from Git itself.

I could be wrong, but I'm working on the assumption that including Wiki contributors is more or less a licensing requirement. Definitely I agree that in the future we should use GitHub for this.

Really, including these .md files was just the quickest way I could think of to ensure that contributor information was present. I agree with your comment above that giving it more structure would be good, but also we should have a more general think about what our obligations are here and how we can best fulfill them.

Yes, if we're deriving content from the existing wiki, then we must attribute the authors. But you're right that this is easy and you shouldn't consider this a blocker to merging.

My thinking here was just that we'll want to—at some point before actually inviting the community to edit Stumptown content—take the authorship data out of the general content files. Once we break from the wiki, the contribution data is historical and Git commits are the canonical source of authorship information. We wouldn't want to update authorship data except via Git from then on, but leaving authorship as part of content files might be confusing to contributors.

For wiki authors who used their GitHub account on MDN, we could conceivably add a commit which makes them contributors to this repo, whether they actually contributed via GitHub or not. This would further shrink the need for attribution data in the repo.

we'll want to...take the authorship data out of the general content files

If we build this content to HTML and start serving it from developer.mozilla.org, I'm not sure what our licensing requirements are. Currently MDN displays all contributors in the page. Is it sufficient to have a link back to GitHub and let people see the contribution history there (plus something for legacy contributors)? I don't know, but will start a conversation about this.

For wiki authors who used their GitHub account on MDN, we could conceivably add a commit which makes them contributors to this repo, whether they actually contributed via GitHub or not. This would further shrink the need for attribution data in the repo.

I seem to remember John had some scheme like this.

ddbeck · 2019-05-09T16:58:10Z

scripts/package/package-prose.js

+    if (namedSections.includes(sectionName)) {
+        sections[sectionName] = sectionContent;
+    } else {
+        const additionalSection = {


I'd like the prototype to break noisily (or at least complain loudly) on sections that aren't in our recipes. Knowing how the structure bends and breaks is really important to minimizing the risk of structuring content. We need to know early if some content is going to be irregular in ways we can anticipate or if we have lots of genuinely exceptional content.

"Additional sections" are explicitly allowed in the recipe but we could definitely log them when we encounter them. Is your thinking that if, say, 70% of pages want a section called "Security considerations" then we should put it in the recipe?

Yeah, exactly. It's my hope that we'll know what sections may exist, even if they're not necessarily used in every instance of that recipe.

We'll definitely have required sections, like Short description. I expect we'll have many recurring but optional sections (Security considerations). We might even have specialized recipes that make some optional sections required or forbidden (e.g., maybe every CSS property that deals with color must have an Accessibility section; maybe it's pointless for Web Crypto APIs to have a Security considerations section). But we'll probably want to reject one-off sections (Security concerns) or at least stuff them someplace that will elicit an antagonistic review (Notes).

But I figure we won't know until we start logging. 🌲

Sure thing. I don't think we need this for mdn/sprints#1499, and am not sure if this should happen at JSON-build time or at linting time, but have filed #21 for it anyway.
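A sketch of what that logging could look like, whichever stage it ends up in. The section names come from the JSON structure in the PR description; the `{ name, content }` shape for an additional section and the `sortSections` name are assumptions for illustration, since the diff above is truncated.

```javascript
const namedSections = [
  'short-description',
  'overview',
  'usage-notes',
  'accessibility-concerns',
  'see-also',
];

// Sort found sections into named and additional ones, warning about each
// section that the recipe does not name. The warn function is injectable
// so that a linter could collect the messages instead of printing them.
function sortSections(found, warn = console.warn) {
  const result = { 'additional-sections': [] };
  for (const [name, content] of Object.entries(found)) {
    if (namedSections.includes(name)) {
      result[name] = content;
    } else {
      warn(`Additional section not in recipe: "${name}"`);
      // Hypothetical shape; the real object is not shown in the diff.
      result['additional-sections'].push({ name, content });
    }
  }
  return result;
}
```

Counting these warnings across migrated pages would show which one-off sections recur often enough to be added to a recipe.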

peterbe · 2019-05-09T20:16:49Z

not sure how far it's intended to be just illustrative versus some rough version of real code we could use somehow.

Perhaps I'm overly eager but it feels like we should go for it. What I'd like to see is...

Not calling it package but build-html or something.
Ability to use the cli to, by default, render ALL elements.
Ability to render select few. E.g. node scripts/package/package.js video blockquote abbr to only build the HTML for those 3.
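The "all by default, or just the named few" behaviour could be sketched like this. The `selectElements` name is hypothetical, and how the list of all elements is obtained (for example by listing the elements directory) is left out.

```javascript
// Decide which elements to build: the names given on the command line,
// or every known element when none are given.
function selectElements(args, allElements) {
  if (args.length === 0) {
    return [...allElements];
  }
  const unknown = args.filter((name) => !allElements.includes(name));
  if (unknown.length > 0) {
    throw new Error(`Unknown element(s): ${unknown.join(', ')}`);
  }
  return args;
}

// In a CLI script this would be called roughly as:
//   selectElements(process.argv.slice(2), allElementNames)
```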

wbamberg · 2019-05-09T21:15:08Z

Lots of great comments, thank you Daniel.

Is our default posture to provide HTML where possible? For example, I can imagine providing contributor names in a more structured way instead of as an HTML string.

No. I think only strictly prose-y things should be delivered as HTML. You're definitely right to call out contributors as a thing that should be better structured. I have more on that in response to your inline comment on this item though.

Should prose sections have an order? This structure implies that our prose files are a bucket of independent sections that can be shown in any order (and without necessarily accompanying other sections). Is that a promise we want to make? It might complicate authoring.

This is a big question, and one that will need more thinking. In stumptown we've tried to make things as natural as possible for writers, and a single "prose.md" is part of that: documenting a thing should feel like writing a holistic document, not like filling in a form.

On the other hand, this is a bit of a lie, because stumptown also treats "prose.md" as a collection of independent pieces of content, and tends to think you can reorder or omit pieces.

I can think of a few different options:

1. treat prose.md as a single piece of writing, that's always inserted into the document as a single piece.
2. promise to keep all the sections, and keep the order, but allow the consumer to insert extra pieces. For example, this is needed to let us insert the interactive example between the short description and any additional prose.
3. not make promises about keeping the ordering.
4. not make promises about including all the pieces: for example, allow the consumer to extract individual pieces and render them independently. For example, an "accessibility inspector" in the devtools could show just the "accessibility concerns" section of a document. Or an "MDN in the sidebar" could just show overview + BCD.

I think (1) and (2) are fairly similar in terms of authoring constraints, and not allowing the web-page-builder to insert other content seems quite restrictive. So it feels like (2) at least is a thing we would want, and that commits us to being able to slice a document reliably into known sections.
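A sketch of what the second option asks of a consumer: keep every section in authored order and splice extra pieces in after specific sections. It assumes the sections arrive as an ordered array of `{ name, html }` objects, which differs from the unordered "prose" object in the current JSON; all names here are hypothetical.

```javascript
// sections: array of { name, html } in authored order.
// extras: map from a section name to HTML that should follow that section.
function composeDocument(sections, extras = {}) {
  const parts = [];
  for (const section of sections) {
    parts.push(section.html);
    if (Object.prototype.hasOwnProperty.call(extras, section.name)) {
      parts.push(extras[section.name]);
    }
  }
  return parts.join('\n');
}
```

Passing `{ 'short-description': interactiveExampleHtml }` as `extras` would place the interactive example between the short description and the rest of the prose.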

As for (3) and (4), as you say they do impose constraints on authors, and we'd need to decide whether that's worth the use cases that they unlock. I hope we could answer this question in a couple of ways:

as we migrate content, if we find a lot of places where pages will become incoherent, that's a bad sign. I've tried migrating about 50 HTML element pages, and not found it to be a big problem yet.
as people try editing content, we can see if this looks like an annoying constraint.

But either way, we should be explicit about this and make sure it's understood.

There are things I think we will ultimately find aggravating (e.g., hundreds of prose.md files is going to be a drag,

Yeah, would it be better to call them "video.md", "margin-left.md" etc? Luckily these kinds of changes should not affect people on the other side of the JSON.

I dislike the packaging terminology.

Yeah, that's fair. I have talked about "resolving" sometimes, which makes sense with respect to things like replacing a BCD query with the actual BCD. But "resolve" is quite nonspecific as well. "rendering Markdown" seems to be only part of what it's doing (and it's worth emphasising that in many cases (e.g. BCD) rendering is exactly what it's not doing).

wbamberg · 2019-05-09T21:54:49Z

not sure how far it's intended to be just illustrative versus some rough version of real code we could use somehow.

Perhaps I'm overly eager but it feels like we should go for it. What I'd like to see is...

I'm overly eager, too!

Not calling it package but build-html or something.

Can I call it build-json at least for now? Just because it's not all HTML.

Ability to use the cli to, by default, render ALL elements.

Ability to render select few. E.g. node scripts/package/package.js video blockquote abbr to only build the HTML for those 3.

Yes, these are great suggestions. I'll push some updates for this and some of @ddbeck 's suggestions and ask for a proper r+.

ddbeck · 2019-05-10T11:01:31Z

Thanks for your replies, Will. A few more of my own:

For example, an "accessibility inspector" in the devtools could show just the "accessibility concerns" section of a document. Or an "MDN in the sidebar" could just show overview + BCD.

This sounds like a compelling enough use case to try treating sections in prose.md as independent. Let's roll with it and not add an order to the prose sections unless we can show it's not tenable for authors. And now that I think of it, it's not likely to be an either-or situation. If your experience with HTML elements shows that it works fine, then maybe it'll be fine for all references. Tutorials and other more prose-heavy content were always going to present a different set of problems anyway.

Yeah, would it be better to call them "video.md", "margin-left.md" etc? Luckily these kinds of changes should not affect people on the other side of the JSON.

Yep. In retrospect, this was a nitpick and I should've left it out. An issue to open, not a blocker.

Finally, on terminology: really, I was thinking more about the names of the scripts themselves. The package or build process can be called "packaging" or "build" or whatever. But I was thinking package-bcd.js does something rather more specific than package.js does, and I'd like the scripts to be named to reflect that, where possible. Though the BCD thing specifically will be moot eventually.

wbamberg · 2019-05-10T22:29:53Z

OK, to summarise. I've addressed only comments which were either super easy or seemed to be needed to achieve mdn/sprints#1499. However, everything else I've filed an issue for (except one thing that I want to discuss internally first).

Names of scripts: Simple packaging for HTML elements #14 (review) and Simple packaging for HTML elements #14 (comment). I have had a go at some renaming. The main script is called "build-json" now and the others have more creative names.
fs.exists() bug: Simple packaging for HTML elements #14 (comment). I've pushed the quick fix (use existsSync() instead). I've filed Make packaging JSON asynchronous #20, to make the whole thing async, as a follow-up.
HTML comments versus headings as prose section demarcation: Simple packaging for HTML elements #14 (comment). I've filed Use headings rather than HTML comments for prose.md sectioning? #19 to see if anyone can remember why we did this.
Markdown choices: Simple packaging for HTML elements #14 (comment). I've filed Choose a Markdown format for GitHub-hosted content sprints#1505 for this.
contributor data: Simple packaging for HTML elements #14 (comment). I'd like us to talk about this some more and bring others into it.
logging for additional sections: Simple packaging for HTML elements #14 (comment). I've filed Log "additional sections" #21 to add this in a follow-up.
enabling us to build more than one element at a time: Simple packaging for HTML elements #14 (comment). I've filed Make JSON builder able to build more than one item #22.

So @peterbe and @ddbeck, if there's anything else you think needs to be added before we can use it in mdn/sprints#1499, let me know, otherwise I'll merge it.

ddbeck

Thank you, Will!

wbamberg · 2019-05-13T21:46:02Z

Thanks for all the reviews. I just pushed a tiny change so we create stuff in /packaged/html/elements/video.

I'll merge this now so I can say I did something today.

It would definitely be nice to fix this up so it works like:

start at /content
recurse through looking for directories that contain meta.yaml
when we find one, look for the recipe property
use that to build the appropriate JSON
publish it under "packaged" in a parallel hierarchy, like html/elements/video, css/properties/margin etc.
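The traversal part of those steps can be sketched as below. Reading the recipe property out of meta.yaml and building the JSON are left out, and the function names are made up for this example.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively collect every directory under root that contains a
// meta.yaml file.
function findItemDirs(root) {
  const found = [];
  const entries = fs.readdirSync(root, { withFileTypes: true });
  if (entries.some((entry) => entry.isFile() && entry.name === 'meta.yaml')) {
    found.push(root);
  }
  for (const entry of entries) {
    if (entry.isDirectory()) {
      found.push(...findItemDirs(path.join(root, entry.name)));
    }
  }
  return found;
}

// Map a content directory to its place in the parallel output hierarchy,
// e.g. content/html/elements/video -> packaged/html/elements/video.
function outputPathFor(contentRoot, itemDir, outputRoot) {
  return path.join(outputRoot, path.relative(contentRoot, itemDir));
}
```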

I suspect a lot of the existing content will break when I do that though, so I'll need to update it as well.

Will Bamberg added 3 commits May 6, 2019 15:13

Add jsdom and BCD dependencies, upgrade npm

6752cfe

Merge remote-tracking branch 'upstream/master' into packaging

e5aea4f

* upstream/master: Add license Add contributors

Add simple packager

a1710dd

ddbeck reviewed May 8, 2019

Will Bamberg added 2 commits May 8, 2019 15:42

Remove logging

8f86228

Add a check that the given element exists

8e7bc0b

ddbeck reviewed May 9, 2019

peterbe mentioned this pull request May 9, 2019

Remove the Python code? #16

Closed

This was referenced May 10, 2019

Use headings rather than HTML comments for prose.md sectioning? #19

Closed

Make packaging JSON asynchronous #20

Closed

Use existsSync rather than exists

a4802d6

wbamberg mentioned this pull request May 10, 2019

Log "additional sections" #21

Open

Rename scripts

418f3ef

wbamberg mentioned this pull request May 10, 2019

Make JSON builder able to build more than one item #22

Closed

ddbeck approved these changes May 13, 2019

wbamberg mentioned this pull request May 13, 2019

Build JSON out of structured content mdn/sprints#1501

Closed

Create output in nested directory

46ad957

wbamberg merged commit 9d67a91 into mdn:master May 13, 2019

ddbeck mentioned this pull request May 22, 2019

Expose recipe #27

Closed
