This repository has been archived by the owner on Sep 7, 2021. It is now read-only.

Simple packaging for HTML elements #14

Merged: 8 commits, May 13, 2019

@wbamberg commented May 8, 2019

This is rough and I'm not sure how far it's intended to be just illustrative versus some rough version of real code we could use somehow. But I hope at least it is illustrative :).

I've talked a bit about the idea of a build step which takes the Stumptown content and emits a structured JSON form that's easy for machines to consume, so they don't have to care about the concessions we make to human writers (e.g. having prose in a single Markdown file).

This PR adds a new npm script to package.json:

npm run package elementName

where "elementName" is the name of an HTML element, like "video" (actually, it's only really tested for "video").

This finds all the bits we have on the given element, assembles them into a blob of JSON, and writes the JSON to a file under /packaging/elementName.json.

The JSON has a structure like this:

{
  "html": {
    "elements": {
      "video": {
        "title": "<video>: The Video Embed element",
        "interactive_example_url": "https://interactive-examples.mdn.mozilla.net/pages/tabbed/video.html",
        "browser_compatibility": {
          ...                       // BCD JSON
        },
        "examples": [
          {
            "description": {        // optional description
              "title":              // example title
              "width":              // iframe width
              "height":             // iframe height
              "content":            // description prose (HTML)
            },
            "sources": {
              "html":                // html source for example
            }
          },
          // potentially more examples
        ],
        "attributes": [
          {
            "name":                  // attribute name (text)
            "description":           // attribute description (HTML)
            "type":                  // attribute type (text)
            "values": [              // optional array of values
              {
                "value":             // value (text)
                "description":       // value description (HTML)
              },
              // potentially more values
            ]
          },
          // potentially more attributes
        ],
        "prose": {
          "additional-sections": [], // optional additional prose sections (HTML)
          "short-description":       // short-description (HTML)
          "overview":                // optional overview (HTML)
          "usage-notes":             // optional usage notes (HTML)
          "accessibility-concerns":  // optional accessibility concerns (HTML)
          "see-also":                // optional see also (HTML)
        },
        "contributors":              // contributor list (HTML)
      }
    }
  }
}

I'm not sure what the best place to integrate something like this would be: I mean, exactly what its inputs and outputs ought to be, and exactly who calls it and when. In this version I'm just writing the JSON to a file, so it's easy for people to look at its output.
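A minimal sketch of that write step, assuming the assembled element object is already in hand. The helper name `writeElementJson` is hypothetical, not the function in this PR; it only shows how the `html.elements.<name>` nesting from the structure above maps onto a file under `packaging/`.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not the PR's code): wrap one element's data in the
// html.elements.<name> nesting and write it to <outDir>/<name>.json.
function writeElementJson(elementName, elementData, outDir = path.join(process.cwd(), 'packaging')) {
  const doc = { html: { elements: { [elementName]: elementData } } };
  fs.mkdirSync(outDir, { recursive: true }); // create packaging/ if it doesn't exist
  const outFile = path.join(outDir, `${elementName}.json`);
  fs.writeFileSync(outFile, JSON.stringify(doc, null, 2));
  return outFile;
}
```

Returning the path (or the `doc` object) rather than only writing a file keeps the option open of handing the JSON straight to a caller later.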

It also doesn't validate the structure at all, it just assumes that it's correct. This of course is a thing we would need to add.
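One possible shape for that check, as a sketch: a hypothetical `validateElement` that tests only the fields the structure above doesn't mark as optional (title, short description, and each attribute's name, description and type). It returns a list of problems rather than throwing, so it could run either inside the builder or as a separate lint step.

```javascript
// Hypothetical structural check for one element's JSON; collects problems
// instead of throwing so callers can decide how strict to be.
function validateElement(el) {
  const problems = [];
  if (typeof el.title !== 'string') problems.push('missing title');
  if (!el.prose || typeof el.prose['short-description'] !== 'string') {
    problems.push('missing prose.short-description');
  }
  for (const [i, attr] of (el.attributes || []).entries()) {
    for (const key of ['name', 'description', 'type']) {
      if (typeof attr[key] !== 'string') problems.push(`attributes[${i}] missing ${key}`);
    }
    // "values" is optional, but each entry needs a value and a description
    for (const [j, v] of (attr.values || []).entries()) {
      for (const key of ['value', 'description']) {
        if (typeof v[key] !== 'string') problems.push(`attributes[${i}].values[${j}] missing ${key}`);
      }
    }
  }
  return problems;
}
```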

One thing I decided while doing this is that having starting and ending tags for sections of prose.md is pointlessly complicated, so I changed that in video's prose.md.
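For illustration, here is roughly how the single-marker format can be sliced, assuming each section opens with a comment like `<!-- short-description -->` and runs until the next marker. `splitProse` is a hypothetical name, and it returns raw Markdown per section; the real build would still render each one to HTML (for example with marked).

```javascript
// Sketch: split prose.md on single marker comments such as
// <!-- short-description -->. Each section runs until the next marker;
// text before the first marker is ignored.
function splitProse(markdown) {
  const marker = /^<!--\s*([\w-]+)\s*-->[ \t]*$/gm;
  const sections = {};
  let current = null;
  let start = 0;
  let match;
  while ((match = marker.exec(markdown)) !== null) {
    if (current) sections[current] = markdown.slice(start, match.index).trim();
    current = match[1];
    start = marker.lastIndex;
  }
  if (current) sections[current] = markdown.slice(start).trim();
  return sections;
}
```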

@wbamberg (Author) commented May 8, 2019

@peterbe, @ddbeck, @escattone: I wrote some code that JSONifies stumptown objects.

@ddbeck (Contributor) commented May 8, 2019

@wbamberg there's a lot here and I can imagine going through this top to bottom (and also I want to try running this locally), but before I do that, is there a particular focus you're looking for in review here?

@@ -0,0 +1,12 @@
const bcd = require('mdn-browser-compat-data');

function package(query) {
Contributor

One first impression: it is long past time for BCD to provide this API. It's wild how many times I've seen this implemented 😆
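For reference, the lookup itself is small. This sketch assumes the query is a dot-separated path like `html.elements.video` and that no BCD key contains a dot; it walks the nested object that `mdn-browser-compat-data` exports, where compat data for a feature sits under its `__compat` key. `queryBcd` is a hypothetical name.

```javascript
// Sketch: resolve a dotted query such as "html.elements.video" against a
// nested BCD-style object. Returns undefined if any segment is missing.
function queryBcd(data, query) {
  return query.split('.').reduce(
    (node, key) =>
      node && Object.prototype.hasOwnProperty.call(node, key) ? node[key] : undefined,
    data
  );
}
```

Usage would look something like `queryBcd(require('mdn-browser-compat-data'), 'html.elements.video')`.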

@peterbe (Contributor) commented May 8, 2019

> It also doesn't validate the structure at all, it just assumes that it's correct. This of course is a thing we would need to add.

I disagree. There should be a separate set of tools that protect the integrity of all the content, i.e. "linters" that are executed in CI and protect against typos in YAML files, etc.

@wbamberg (Author) commented May 8, 2019

> is there a particular focus you're looking for in review here?

That's indeed a good question. Starting with the most general stuff, and getting more specific, I'm interested in all of:

  • partly I wrote this to illustrate the idea that transforming Stumptown repo content -> MDN web pages could be a two-stage process, where the first stage builds this JSON form, which can be consumed by the thing that makes MDN pages (and makes decisions about, for example, how compat tables or lists of attributes should be rendered). Does this still seem like a good idea?

  • as I said, I don't have a good idea of how something like this fits into an architecture: what kind of code calls it, and when, and how should it present its output? In this code its output is just JSON written to a file, but perhaps it ought to return a JSON object directly to the thing that calls it. I guess we don't have to worry about these questions yet, but it would be good to think about them.

  • now that we can see the kind of hoops code has to go through to transform stumptown->JSON, are we still happy with the stumptown format? For example I changed the way we use HTML comments to demarcate prose.md sections, because when trying to parse it the existing rule seemed pointlessly complicated.

  • relatedly, I'd be very happy to hear about better ways to do some of this Markdown-munging work, especially the slicing up of attributes and prose.md files.

  • I'm very happy to hear specific suggestions about how the code could be improved, but I also see this as prototype code at this stage.
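To make the two-stage idea concrete, here is a sketch of a second-stage consumer. It takes the element JSON as a plain object (so it doesn't care whether stage one wrote a file or returned the object directly) and makes its own rendering decision for attributes. `renderAttributes` and `escapeText` are hypothetical helpers; per the structure above, attribute names are text and get escaped, while descriptions are already HTML and are inserted as-is.

```javascript
// Escape a plain-text value for safe inclusion in HTML.
function escapeText(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Sketch of a stage-two rendering choice: attributes as a definition list.
function renderAttributes(attributes) {
  const items = attributes.map(
    (attr) => `<dt><code>${escapeText(attr.name)}</code></dt>\n<dd>${attr.description}</dd>`
  );
  return `<dl>\n${items.join('\n')}\n</dl>`;
}
```

A different page builder could render the same JSON as a table instead, without touching stage one.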

Does that help?

@wbamberg (Author) commented May 8, 2019

>> It also doesn't validate the structure at all, it just assumes that it's correct. This of course is a thing we would need to add.

> I disagree. There should be a separate set of tools that protect the integrity of all the content, i.e. "linters" that are executed in CI and protect against typos in YAML files, etc.

Yes, I think I kind of had that in my mind too. I like the idea of separating these two functions, and definitely of linting as early as we can, so the builder can just assume that the content is good.

@ddbeck (Contributor) commented May 9, 2019

> Does that help?

Definitely. That really helps me know what to focus on. I'll give this a proper review today.

@ddbeck (Contributor) left a comment

@wbamberg after spending a lot of time looking at this today, I really like what I see. This feels promising. Detailed stuff below and in line. Thank you for getting this going!

Likes 👍

  • Seeing the JSON, I'm still comfortable with the overall shape of things. (I can imagine some other would-be structures, like selectively accessing bits of the structure, maybe with GraphQL as in a Gatsby site, but they could reasonably be layers on top of this one.) A couple of questions come to mind:

    Is our default posture to provide HTML where possible? For example, I can imagine providing contributor names in a more structured way instead of as an HTML string.

    Should prose sections have an order? This structure implies that our prose files are a bucket of independent sections that can be shown in any order (and without necessarily accompanying other sections). Is that a promise we want to make? It might complicate authoring.

    We don't need answers to these questions now — these could reasonably be converted to issues — but I thought these were significant questions raised by seeing this PR.

  • On architecture: I can totally imagine slotting some of this content into a web page with a templating engine (e.g., with Jinja or Django templates). It seems pretty obvious how I might experiment with different layouts, run tests on pages that omit or add new blocks of content, and so on. I can even see the start of some additional automation (e.g., defining relationships to other documents and using that to generate sidebars in an MDN site builder). I'm still excited at the prospect of having this.

  • On Stumptown formats and structure: it's good enough. There are things I think we will ultimately find aggravating (e.g., hundreds of prose.md files is going to be a drag, more depth to the file structure than I'd like) but I think we'll just need some minor tweaks, not a radical rethinking.

Dislikes 👎

  • I dislike the packaging terminology. Partly because it seems likely to be confused with other tooling and partly because it covers so much. Some possible terminology that came to mind while reviewing: resolving references (e.g., to BCD or to examples or whatever), rendering Markdown, and composing a document. Specific is better than general, I think.

  • See inline for mostly minor stuff on comments in Markdown, querying BCD, handling contributor data, accepting additional sections, and Markdowns (plural). OK, that last one is not minor.

Again, to reiterate: I'm really happy with the progress you've made. 👌

function package(elementName) {
const elementPath = path.join(process.cwd(), htmlElements, elementName);

if (!fs.exists(elementPath)) {
Contributor

I got an ERR_INVALID_CALLBACK error trying to run this locally. Using the synchronous call fixed it:

Suggested change
- if (!fs.exists(elementPath)) {
+ if (!fs.existsSync(elementPath)) {

Contributor

Or, better, keep it asynchronous and use promises. For example:

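const fsp = require('fs').promises;
const path = require('path');

async function package(elementName) {
  const elementPath = path.join(process.cwd(), htmlElements, elementName);
  // fs.exists() has no promise form; fs.promises.access() rejects if the path is missing
  const exists = await fsp.access(elementPath).then(() => true, () => false);
  if (!exists) {
    ...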

It would be a shame if this package became synchronous. With all the disk IO, my gut tells me the asynchronous version would be faster.
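A sketch of what the asynchronous version buys: several of an element's files can be read concurrently with `Promise.all`. The helper name and file list are hypothetical, and whether this is measurably faster for this workload is an assumption worth benchmarking.

```javascript
const fsp = require('fs').promises;
const path = require('path');

// Hypothetical helper: read several files from one element directory
// concurrently. A missing file resolves to null instead of failing the batch.
async function readElementFiles(elementPath, fileNames) {
  const contents = await Promise.all(
    fileNames.map((name) =>
      fsp.readFile(path.join(elementPath, name), 'utf8').catch(() => null)
    )
  );
  return Object.fromEntries(fileNames.map((name, i) => [name, contents[i]]));
}
```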

Author

I don't think this blocks mdn/sprints#1499, but agree we should do it, so I have filed #20

@@ -1,9 +1,7 @@
-<!-- <short-description> -->
+<!-- short-description -->
Contributor

More and more I think comments or custom tags are not going to work for this. For example, it completely foils GitHub's rich Markdown diffs:

[Screenshot, 2019-05-09: GitHub's rich Markdown diff]

I think sooner or later we'll want to use plain headings (## Short description) and aggressively enforce accepted text and hierarchy (as part of the recipe).
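The heading-based alternative could be sliced like this. It's a sketch that assumes sections are level-2 headings (`## Short description`) and derives keys by lowercasing and hyphenating the heading text, which is an assumed convention; it also doesn't special-case `##` lines inside fenced code blocks, which a real parser would need to handle.

```javascript
// Sketch: split Markdown on level-2 headings; deeper headings (###) stay
// inside the current section. Keys are slugs of the heading text.
function splitByHeadings(markdown) {
  const sections = {};
  let current = null;
  let lines = [];
  const flush = () => {
    if (current) sections[current] = lines.join('\n').trim();
  };
  for (const line of markdown.split('\n')) {
    const m = /^##\s+(.+?)\s*$/.exec(line);
    if (m) {
      flush();
      current = m[1].toLowerCase().replace(/\s+/g, '-');
      lines = [];
    } else {
      lines.push(line);
    }
  }
  flush();
  return sections;
}
```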

Author

Yes... I'm not sure now why we opted for this plan rather than just using headings. Also I'm not happy that prose.md uses comments while attributes use headings.

So unless someone can remember why we decided to do this, I think you are right.

Again these are things we can tweak as we go along though.

Author

I'm going to keep this for now but filed #19.

Contributor

This comment doesn't signify that I've done any kind of thorough review 😃, but I was skimming this PR while waiting for some tests to finish and noticed this thread. I can't remember the details either, but I'm also not happy with the fact that prose.md uses hidden comments to enforce structure while the attributes use headings. It's probably best to choose some specific headings for signifying/enforcing structure, and get rid of the hidden comments. So, for what it's worth, just wanted to say that I agree with you both in where you're heading here! 😄

"gray-matter": "4.0.2",
"jsdom": "^12.2.0",
"js-yaml": "3.13.1",
"marked": "0.6.2",
Contributor

Not even a fully functioning prototype yet and we've got two Markdown parsers. Some day (but not today) I'm going to have a full-on meltdown[1] about this. Choosing a Markdown and an idea of how to not-so-dangerously extend it is something I'd like to see happen sooner rather than later (maybe "propose possible Markdowns" would be a good task for me for a future sprint).

[1] This would be a great name for a Markdown implementation

Author

Definitely, we need to do this, and before you have a Markdown meltdown. I have filed mdn/sprints#1505 for this, but feel free to edit it as you see fit. I have made it an epic because it looks like it might be bigger than a single user story. But maybe not.

I hope, again, that getting experience of migrating content will help us decide which features we need.

@@ -0,0 +1,13 @@
const fs = require('fs');
Contributor

I'd like to avoid baking contributor data into the repo, but we'll necessarily have some legacy data to retain. It'd be nice to have a graveyard for legacy contributions (e.g., some separate thing that maps our files to lists of contributors) and derive post-wiki contributions from Git itself.

Author

I could be wrong, but I'm working on the assumption that including Wiki contributors is more or less a licensing requirement. Definitely I agree that in the future we should use GitHub for this.

Really, including these .md files was just the quickest way I could think of to ensure that contributor information was present. I agree with your comment above that giving it more structure would be good, but also we should have a more general think about what our obligations are here and how we can best fulfill them.

Contributor

Yes, if we're deriving content from the existing wiki, then we must attribute the authors. But you're right that this is easy and you shouldn't consider this a blocker to merging.

My thinking here was just that we'll want to—at some point before actually inviting the community to edit Stumptown content—take the authorship data out of the general content files. Once we break from the wiki, the contribution data is historical and Git commits are the canonical source of authorship information. We wouldn't want to update authorship data except via Git from then on, but leaving authorship as part of content files might be confusing to contributors.

For wiki authors who used their GitHub account on MDN, we could conceivably add a commit which makes them contributors to this repo, whether they actually contributed via GitHub or not. This would further shrink the need for attribution data in the repo.

Author

> we'll want to...take the authorship data out of the general content files

If we build this content to HTML and start serving it from developer.mozilla.org, I'm not sure what our licensing requirements are. Currently MDN displays all contributors in the page. Is it sufficient to have a link back to GitHub and let people see the contribution history there (plus something for legacy contributors)? I don't know, but will start a conversation about this.

> For wiki authors who used their GitHub account on MDN, we could conceivably add a commit which makes them contributors to this repo, whether they actually contributed via GitHub or not. This would further shrink the need for attribution data in the repo.

I seem to remember John had some scheme like this.

if (namedSections.includes(sectionName)) {
sections[sectionName] = sectionContent;
} else {
const additionalSection = {
@ddbeck (Contributor), May 9, 2019

I'd like the prototype to break noisily (or at least complain loudly) on sections that aren't in our recipes. Knowing how the structure bends and breaks is really important to minimizing the risk of structuring content. We need to know early whether some content is going to be irregular in ways we can anticipate, or whether we have lots of genuinely exceptional content.

Author

"Additional sections" are explicitly allowed in the recipe but we could definitely log them when we encounter them. Is your thinking that if, say, 70% of pages want a section called "Security considerations" then we should put it in the recipe?

Contributor

Yeah, exactly. It's my hope that we'll know what sections may exist, even if they're not necessarily used in every instance of that recipe.

We'll definitely have required sections, like Short description. I expect we'll have many recurring but optional sections (Security considerations). We might even have specialized recipes that make some optional sections required or forbidden (e.g., maybe every CSS property that deals with color must have an Accessibility section; maybe it's pointless for Web Crypto APIs to have a Security considerations section). But we'll probably want to reject one-off sections (Security concerns) or at least stuff them someplace that will elicit an antagonistic review (Notes).

But I figure we won't know until we start logging. 🌲
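A sketch of that logging, assuming a hypothetical recipe shape with `required` and `optional` section lists (the real recipe format may differ). Unknown sections are reported rather than rejected, so the logs show how often content strays from the recipe.

```javascript
// Sketch: compare a document's section names against a recipe and log
// missing required sections and sections the recipe doesn't mention.
function checkSections(sectionNames, recipe, log = console.warn) {
  const known = new Set([...recipe.required, ...recipe.optional]);
  const missing = recipe.required.filter((name) => !sectionNames.includes(name));
  const unknown = sectionNames.filter((name) => !known.has(name));
  for (const name of missing) log(`missing required section: ${name}`);
  for (const name of unknown) log(`section not in recipe: ${name}`);
  return { missing, unknown };
}
```

Counting the `unknown` names across many pages would show whether something like "Security considerations" recurs often enough to belong in the recipe.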

Author

Sure thing. I don't think we need this for mdn/sprints#1499, and am not sure if this should happen at JSON-build time or at linting time, but have filed #21 for it anyway.

@peterbe (Contributor) commented May 9, 2019

> not sure how far it's intended to be just illustrative versus some rough version of real code we could use somehow.

Perhaps I'm overly eager but it feels like we should go for it. What I'd like to see is...

  • Not calling it package but build-html or something.
  • Ability to use the cli to, by default, render ALL elements.
  • Ability to render select few. E.g. node scripts/package/package.js video blockquote abbr to only build the HTML for those 3.
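A sketch of that CLI behaviour, assuming the element sources live in one directory with a subdirectory per element: names given on the command line are used as-is, otherwise every subdirectory is built. `elementsToBuild` is a hypothetical helper.

```javascript
const fs = require('fs');
const path = require('path');

// Sketch: pick elements from argv (e.g. "video blockquote abbr"),
// defaulting to every subdirectory of elementsDir.
function elementsToBuild(argv, elementsDir) {
  const requested = argv.slice(2); // skip "node" and the script path
  if (requested.length > 0) return requested;
  return fs
    .readdirSync(elementsDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => entry.name)
    .sort();
}
```

A script would call it as `elementsToBuild(process.argv, elementsDir)` and then build each returned name.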

@peterbe mentioned this pull request May 9, 2019

@wbamberg (Author) commented May 9, 2019

Lots of great comments, thank you Daniel.

> Is our default posture to provide HTML where possible? For example, I can imagine providing contributor names in a more structured way instead of as an HTML string.

No. I think only strictly prose-y things should be delivered as HTML. You're definitely right to call out contributors as a thing that should be better structured. I have more on that in response to your inline comment on this item though.

> Should prose sections have an order? This structure implies that our prose files are a bucket of independent sections that can be shown in any order (and without necessarily accompanying other sections). Is that a promise we want to make? It might complicate authoring.

This is a big question, and one that will need more thinking. In stumptown we've tried to make things as natural as possible for writers, and a single "prose.md" is part of that: documenting a thing should feel like writing a holistic document, not like filling in a form.

On the other hand, this is a bit of a lie, because stumptown also treats "prose.md" as a collection of independent pieces of content, and tends to think you can reorder or omit pieces.

I can think of a few different options:

  1. treat prose.md as a single piece of writing, that's always inserted into the document as a single piece.

  2. promise to keep all the sections, and keep the order, but allow the consumer to insert extra pieces. For example, this is needed to let us insert the interactive example between the short description and any additional prose.

  3. not make promises about keeping the ordering.

  4. not make promises about including all the pieces: for example, allow the consumer to extract individual pieces and render them independently. For example, an "accessibility inspector" in the devtools could show just the "accessibility concerns" section of a document. Or an "MDN in the sidebar" could just show overview + BCD.

I think (1) and (2) are fairly similar in terms of authoring constraints, and not allowing the web-page-builder to insert other content seems quite restrictive. So it feels like (2) at least is a thing we would want, and that commits us to being able to slice a document reliably into known sections.
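Option (2) could look roughly like this on the page-builder side. The sketch assumes prose arrives as an ordered array of `{ name, html }` entries (the structure earlier in this PR uses an object instead), and `composeProse` with its `insertions` map is hypothetical.

```javascript
// Sketch of option (2): keep every section in its authored order, but let
// the page builder splice extra content in after a named section.
function composeProse(orderedSections, insertions = {}) {
  const parts = [];
  for (const { name, html } of orderedSections) {
    parts.push(html);
    if (insertions[name]) parts.push(insertions[name]);
  }
  return parts.join('\n');
}
```

For example, passing `{ 'short-description': interactiveExampleHtml }` places the interactive example directly after the short description.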

As for (3) and (4), as you say they do impose constraints on authors, and we'd need to decide whether that's worth the use cases that they unlock. I hope we could answer this question in a couple of ways:

  • as we migrate content, if we find a lot of places where pages will become incoherent, that's a bad sign. I've tried migrating about 50 HTML element pages, and not found it to be a big problem yet.

  • as people try editing content, we can see if this looks like an annoying constraint.

But either way, we should be explicit about this and make sure it's understood.

> There are things I think we will ultimately find aggravating (e.g., hundreds of prose.md files is going to be a drag,

Yeah, would it be better to call them "video.md", "margin-left.md" etc? Luckily these kinds of changes should not affect people on the other side of the JSON.

> I dislike the packaging terminology.

Yeah, that's fair. I have talked about "resolving" sometimes, which makes sense with respect to things like replacing a BCD query with the actual BCD. But "resolve" is quite nonspecific as well. "rendering Markdown" seems to be only part of what it's doing (and it's worth emphasising that in many cases (e.g. BCD) rendering is exactly what it's not doing).

@wbamberg (Author) commented May 9, 2019

>> not sure how far it's intended to be just illustrative versus some rough version of real code we could use somehow.

> Perhaps I'm overly eager but it feels like we should go for it. What I'd like to see is...

I'm overly eager, too!

>   • Not calling it package but build-html or something.

Can I call it build-json at least for now? Just because it's not all HTML.

>   • Ability to use the cli to, by default, render ALL elements.

>   • Ability to render select few. E.g. node scripts/package/package.js video blockquote abbr to only build the HTML for those 3.

Yes, these are great suggestions. I'll push some updates for this and some of @ddbeck's suggestions and ask for a proper r+.

@ddbeck (Contributor) commented May 10, 2019

Thanks for your replies, Will. A few more of my own:

> For example, an "accessibility inspector" in the devtools could show just the "accessibility concerns" section of a document. Or an "MDN in the sidebar" could just show overview + BCD.

This sounds like a compelling enough use case to try treating sections in prose.md as independent. Let's roll with it and not add an order to the prose sections unless we can show it's not tenable for authors. And now that I think of it, it's not likely to be an either-or situation. If your experience with HTML elements shows that it works fine, then maybe it'll be fine for all references. Tutorials and other more prose-heavy content was always going to present a different set of problems anyway.

> Yeah, would it be better to call them "video.md", "margin-left.md" etc? Luckily these kinds of changes should not affect people on the other side of the JSON.

Yep. In retrospect, this was a nitpick and I should've left it out. An issue to open, not a blocker.

Finally, on terminology: really, I was thinking more about the names of the scripts themselves. The package or build process can be called "packaging" or "build" or whatever. But I was thinking packaging-bcd.js does something rather more specific than package.js does, and I'd like the scripts to be named to reflect that, where possible. Though the BCD thing specifically will be moot eventually.

@wbamberg (Author)

OK, to summarise: I've addressed only the comments that were either super easy or seemed to be needed to achieve mdn/sprints#1499. For everything else I've filed an issue (except one thing that I want to discuss internally first).

So @peterbe and @ddbeck, if there's anything else you think needs to be added before we can use it in mdn/sprints#1499, let me know, otherwise I'll merge it.

@ddbeck (Contributor) left a comment

Thank you, Will!

@wbamberg (Author)

Thanks for all the reviews. I just pushed a tiny change so we create stuff in /packaged/html/elements/video.

I'll merge this now so I can say I did something today.

It would definitely be nice to fix this up so it works like:

  • start at /content
  • recurse through looking for directories that contain meta.yaml
  • when we find one, look for the recipe property
  • use that to build the appropriate JSON
  • publish it under "packaged" in a parallel hierarchy, like html/elements/video, css/properties/margin etc.
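The traversal half of that plan might look like this. It's a sketch: `findDocuments` is hypothetical, it keeps recursing into document directories in case they nest, and reading the `recipe` property from meta.yaml (for example with js-yaml) and building the JSON are left to the caller.

```javascript
const fs = require('fs');
const path = require('path');

// Sketch: find every directory under contentRoot that contains meta.yaml
// and map it to a parallel output path under outRoot.
function findDocuments(contentRoot, outRoot) {
  const docs = [];
  const walk = (dir) => {
    const entries = fs.readdirSync(dir, { withFileTypes: true });
    if (entries.some((e) => e.isFile() && e.name === 'meta.yaml')) {
      const rel = path.relative(contentRoot, dir); // e.g. html/elements/video
      docs.push({ source: dir, output: path.join(outRoot, rel) });
    }
    for (const e of entries) {
      if (e.isDirectory()) walk(path.join(dir, e.name));
    }
  };
  walk(contentRoot);
  return docs;
}
```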

I suspect a lot of the existing content will break when I do that though, so I'll need to update it as well.

@wbamberg merged commit 9d67a91 into mdn:master May 13, 2019
@ddbeck mentioned this pull request May 22, 2019
4 participants