-
Notifications
You must be signed in to change notification settings - Fork 17
Conversation
* upstream/master: Add license Add contributors
@peterbe , @ddbeck , @escattone , I wrote some code that JSONifies stumptown objects. |
@wbamberg there's a lot here and I can imagine going through this top to bottom (and also I want to try running this locally), but before I do that, is there a particular focus you're looking for in review here? |
scripts/package/package-bcd.js
Outdated
@@ -0,0 +1,12 @@ | |||
const bcd = require('mdn-browser-compat-data'); | |||
|
|||
function package(query) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One first impression: it is long past time for BCD to provide this API. It's wild how many times I've seen this implemented 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree. There should be a separate set of tools that protects the integrity of all the content. I.e. "linters" that are executed in CI and protects from typos in yaml files etc. |
That's indeed a good question. Starting with the most general stuff, and getting more specific, I'm interested in all of:
Does that help? |
Yes, I think I kind of had that in my mind too. I like the idea of separating these two functions, and definitely of linting as early as we can, so the builder can just assume that the content is good. |
Definitely. That'll really helps me know what to focus on. I'll give this a proper review today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wbamberg after spending a lot of time looking at this today, I really like what I see. This feels promising. Detailed stuff below and in line. Thank you for getting this going!
Likes 👍
-
Seeing the JSON, I'm still comfortable with the overall shape of things (I can image some other would-be structures, like selectively accessing bits of the structure—maybe with Graph QL, as in a Gatsby site—but they could reasonably be layers on top of this one). A couple of questions come to mind:
Is our default posture to provide HTML where possible? For example, I can imagine providing contributor names in a more structured way instead of as an HTML string.
Should prose sections have an order? This structure implies that our prose files are a bucket of independent sections that can be shown in any order (and without necessarily accompanying other sections). Is that a promise we want to make? It might complicate authoring.
We don't need answers to these questions now — these could reasonably be converted to issues — but I thought these were significant questions raised by seeing this PR.
-
On architecture: I can totally imagine slotting in some of this content into a web page with a templating engine (e.g., with Jinja or Django templates). It seems pretty obvious how I might experiment with different layouts, run tests on pages that omit or add new blocks of content, and so on. I can even see the start of some additional automation (e.g., defining relationships to other documents and using that to generate sidebars in an MDN site builder). I'm still excited at the prospect of having this.
-
On Stumptown formats and structure: it's good enough. There are things I think we will ultimately find aggravating (e.g., hundreds of
prose.md
files is going to be a drag, more depth to the file structure than I'd like) but I think we'll just need some minor tweaks, not a radical rethinking.
Dislikes 👎
-
I dislike the packaging terminology. Partly because it seems likely to be confused with other tooling and partly because it covers so much. Some possible terminology that came to mind while reviewing: resolving references (e.g., to BCD or to examples or whatever), rendering Markdown, and composing a document. Specific is better than general, I think.
-
See inline for mostly minor stuff on comments in Markdown, querying BCD, handling contributor data, accepting additional sections, and Markdowns (plural). OK, that last one is not minor.
Again, to reiterate: I'm really happy with the progress you've made. 👌
scripts/package/package.js
Outdated
function package(elementName) { | ||
const elementPath = path.join(process.cwd(), htmlElements, elementName); | ||
|
||
if (!fs.exists(elementPath)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got an ERR_INVALID_CALLBACK
error trying to run this locally. Using the synchronous call fixed it:
if (!fs.exists(elementPath)) { | |
if (!fs.existsSync(elementPath)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or, better, leave it to be async exists and use promises. E.g.
async function package(elementName) {
const elementPath = path.join(process.cwd(), htmlElements, elementName);
const exists = await fs.exists(elementPath);
if (!exists) {
...
It would be a shame if this package becomes synchronous. With all the disk IO my gut's telling me it would be faster that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this blocks mdn/sprints#1499, but agree we should do it, so I have filed #20
@@ -1,9 +1,7 @@ | |||
<!-- <short-description> --> | |||
<!-- short-description --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes... I'm not sure now why we opted for this plan rather than just using headings. Also I'm not happy that prose.md uses comments while attributes use headings.
So unless someone can remember why we decided to do this, I think you are right.
Again these are things we can tweak as we go along though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to keep this for now but filed #19.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment doesn't signify that I've done any kind of thorough review 😃, but I was skimming this PR while waiting for some tests to finish and noticed this thread. I can't remember the details either, but I'm also not happy with the fact that prose.md
uses hidden comments to enforce structure while the attributes use headings. It's probably best to choose some specific headings for signifying/enforcing structure, and get rid of the hidden comments. So, for what it's worth, just wanted to say that I agree with you both in where you're heading here! 😄
"gray-matter": "4.0.2", | ||
"jsdom": "^12.2.0", | ||
"js-yaml": "3.13.1", | ||
"marked": "0.6.2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not even a fully functioning prototype yet and we've got two Markdown parsers. Some day (but not today) I'm going to have a full on meltdown[1] about this. Choosing a Markdown and an idea of how to not-so-dangerously extend it is something I'd like to see happen sooner rather than later (maybe "propose possible Markdowns" would be a good task for me for a future sprint).
[1] This would be a great name for a Markdown implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely, we need to do this, and before you have a Markdown meltdown. I have filed mdn/sprints#1505 for this, but feel free to edit it as you see fit. I have made it an epic because it looks like it might be bigger than a single user story. But maybe not.
I hope, again, that getting experience of migrating content will help us decide which features we need.
@@ -0,0 +1,13 @@ | |||
const fs = require('fs'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to avoid baking contributor data into the repo, but we'll necessarily have some legacy data to retain. It'd be nice to have a graveyard for legacy contributions (e.g., some separate thing that maps our files to lists of contributors) and derive post-wiki contributions from Git itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could be wrong, but I'm working on the assumption that including Wiki contributors is more or less a licensing requirement. Definitely I agree that in the future we should use GitHub for this.
Really, including these .md files was just the quickest way I could think of to ensure that contributor information was present. I agree with your comment above that giving it more structure would be good, but also we should have a more general think about what our obligations are here and how we can best fulfill them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if we're deriving content from the existing wiki, then we must attribute the authors. But you're right that this is easy and you shouldn't consider this a blocker to merging.
My thinking here was just that we'll want to—at some point before actually inviting the community to edit Stumptown content—take the authorship data out of the general content files. Once we break from the wiki, the contribution data is historical and Git commits are the canonical source of authorship information. We wouldn't want to update authorship data except via Git from then on, but leaving authorship as part of content files might be confusing to contributors.
For wiki authors who used their GitHub account on MDN, we could conceivably add a commit which makes them contributors to this repo, whether they actually contributed via GitHub or not. This would further shrink the need for attribution data in the repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'll want to...take the authorship data out of the general content files
If we build this content to HTML and start serving it from developer.mozilla.org, I'm not sure what our licensing requirements are. Currently MDN displays all contributors in the page. Is it sufficient to have a link back to GitHub and let people see the contribution history there (plus something for legacy contributors)? I don't know, but will start a conversation about this.
For wiki authors who used their GitHub account on MDN, we could conceivably add a commit which makes them contributors to this repo, whether they actually contributed via GitHub or not. This would further shrink the need for attribution data in the repo.
I seem to remember John had some scheme like this.
scripts/package/package-prose.js
Outdated
if (namedSections.includes(sectionName)) { | ||
sections[sectionName] = sectionContent; | ||
} else { | ||
const additionalSection = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like the prototype to break nosily (or at least complain loudly) on sections that aren't in our recipes. Knowing how the structure bends and breaks is really important to minimizing the risk of structuring content. We need to know early if some content is going to be irregular in ways we can anticipate or if we have lots of genuinely exceptional content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Additional sections" are explicitly allowed in the recipe but we could definitely log them when we encounter them. Is your thinking that if, say, 70% of pages want a section called "Security considerations" then we should put it in the recipe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, exactly. It's my hope that we'll know what sections may exist, even if they're not necessarily used in every instance of that recipe.
We'll definitely have required sections, like Short description. I expect we'll have many recurring but optional sections (Security considerations). We might even have specialized recipes that make some optional sections required or forbidden (e.g., maybe every CSS property that deals with color must have an Accessibility section; maybe it's pointless for Web Crypto APIs to have a Security considerations section). But we'll probably want to reject one-off sections (Security concerns) or at least stuff them someplace that will elicit an antagonistic review (Notes).
But I figure we won't know until we start logging. 🌲
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing. I don't think we need this for mdn/sprints#1499, and am not sure if this should happen at JSON-build time or at linting time, but have filed #21 for it anyway.
Perhaps I'm overly eager but it feels like we should go for it. What I'd like to see is...
|
Lots of great comments, thank you Daniel.
No. I think only strictly prose-y things should be delivered as HTML. You're definitely right to call out contributors as a thing that should be better structured. I have more on that in response to your inline comment on this item though.
This is a big question, and one that will need more thinking. In stumptown we've tried to make things as natural as possible for writers, and a single "prose.md" is part of that: documenting a thing should feel like writing a holistic document, not like filling in a form. On the other hand, this is a bit of a lie, because stumptown also treats "prose.md" as a collection of independent pieces of content, and tends to think you can reorder or omit pieces. I can think of a few different options:
I think (1) and (2) are fairly similar in terms of authoring constraints, and not allowing the web-page-builder to insert other content seems quite restrictive. So it feels like (2) at least is a thing we would want, and that commits us to being able to slice a document reliably into known sections. As for (3) and (4), as you say they do impose constraints on authors, and we'd need to decide whether that's worth the use cases that they unlock. I hope we could answer this question in a couple of ways:
But either way, we should be explicit about this and make sure it's understood.
Yeah, would it be better to call them "video.md", "margin-left.md" etc? Luckily these kinds of changes should not affect people on the other side of the JSON.
Yeah, that's fair. I have talked about "resolving" sometimes, which makes sense with respect to things like replacing a BCD query with the actual BCD. But "resolve" is quite nonspecific as well. "rendering Markdown" seems to be only part of what it's doing (and it's worth emphasising that in many cases (e.g. BCD) rendering is exactly what it's not doing). |
I'm overly eager, too!
Can I call it
Yes, these are great suggestions. I'll push some updates for this and some of @ddbeck 's suggestions and ask for a proper r+. |
Thanks for your replies, Will. A few more of my own:
This sounds like a compelling enough use case to try treating sections in
Yep. In retrospect, this was a nitpick and I should've left it out. An issue to open, not a blocker. Finally, on terminology: really, I was thinking more about the names of the scripts themselves. The package or build process can be called "packaging" or "build" or whatever. But I was thinking |
OK, to summarise. I've addressed only comments which were either super easy or seemed to be needed to achieve mdn/sprints#1499. However, everything else I've filed an issue for (except one thing that I want to discuss internally first).
So @peterbe and @ddbeck, if there's anything else you think needs to be added before we can use it in mdn/sprints#1499, let me know, otherwise I'll merge it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, Will!
Thanks for all the reviews. I just pushed a tiny change so we create stuff in /packaged/html/elements/video. I'll merge this now so I can say I did something today. It would definitely be nice to fix this up so it works like:
I suspect a lot of the existing content will break when I do that though, so I'll need to update it as well. |
This is rough and I'm not sure how far it's intended to be just illustrative versus some rough version of real code we could use somehow. But I hope at least it is illustrative :).
I've talked a bit about the idea of a build step which takes the Stumptown content and emits a structured JSON form that's easy for machines to consume, so they don't have to care about the concessions we make to human writers (e.g. having prose in a single Markdown file).
This PR adds a new package.json command:
where "elementName" is the name of an HTML element, like "video" (actually, it's only really tested for "video").
This finds all the bits we have on the given element, assembles them into a blob of JSON, and writes the JSON to a file under /packaging/elementName.json.
The JSON has a structure like this:
I'm not sure what the best place to integrate something like this would be: I mean, exactly what its inputs and outputs ought to be, and exactly who calls it and when. In this version I'm just writing the JSON to a file, so it's easy for people to look at its output.
It also doesn't validate the structure at all, it just assumes that it's correct. This of course is a thing we would need to add.
One thing I decided, doing this, is that having starting and ending tags for sections of prose.md seems pointlessly complicated, so I changed that for video's prose.md.