-
Notifications
You must be signed in to change notification settings - Fork 17
Conversation
This change will halt the processing of a file if it doesn't have one and only one valid recipe assigned to it. We can't do recipe-based linting for such files, so we should we stop processing at that point.
info boxes (at least in JS property pages) are supposed to be things like the table immediately after the interactive example on https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_VALUE. info boxes are not specific to JS properties though, we currently have something like them for HTML elements and CSS properties and probably some other things too. There's some background on the concept here: https://docs.google.com/document/d/1fwbTs6uoeAfGjjZef3cOOzG1Bx-S8sbLbCBjzx9gBPY/edit#heading=h.98r3vvtunbo and here: #106 (comment). In general, they're a collection of name/value facts about a type of thing, suitable for a tabular representation, where each recipe could define the set of facts (so JS properties and one, CSS properties have a different set...). So in this case full testing would be:
But we could also ask:
I think perhaps we should just slurp them for now, and remove them from the recipes... |
), | ||
"prose.see_also": requireTopLevelHeading("prose.see_also", "See_also"), | ||
"prose.short_description": (tree, file) => { | ||
if (select("body > p", tree) === null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is too liberal: a page like https://wiki.developer.mozilla.org/en-US/docs/User:wbamberg/test-linter2 passes. I think testing this is tricky: you more or less want the first element of the raw document body to be <p>
, except that sidebars will usually precede this, and pages very occasionally (and wrongly) include the interactive example macro in <p>
instead of <div>
(luckily this is very apparent to humans because it means the example wraps at text-width, instead of getting the full page width).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushing something that hopefully does a little better at this
@ddbeck, thanks so much for this, it seems like a big step forward for us! I haven't looked at the code much yet, but had some comments on the functional behaviour and output. I will probably have some more comments and wish I'd had more time to think more about the comments I have so far, but I thought it might be helpful to have comments sooner even if they are half-baked. The output when there are errors is sometimes a bit non-obvious. For example:
This page had an error "Page doesn't match recipe javascript-method". But what was the actual problem with the recipe? Because I know, I know that it's the next two lines "Missing ingredient:". But those lines are listed as warnings, not errors. It might be better to report missing ingredient (or "Missing mandatory ingredient") as an error. Even better perhaps to somehow report that "Page doesn't match recipe javascript-method" isn't a separate error from "Missing mandatory ingredient" - it's essentially the same error. Sometimes the linter could provide more precise error information. For example, it took me a little time to understand why https://wiki.developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Locale/caseFirst was giving me:
...because it has a "Specifications" H2 with a valid-looking table. Actually it's because the section is missing "specname", which the linter knows but doesn't report. As well as
But these aren't content problems, they're pieces of the linter that aren't implemented yet. The linter could actually know this as soon as it looks at the recipe, without looking at the page content at all. So it could report: "I can't check page XYZ, because it reports itself to be javascript-class" and I don't know how to lint with that recipe yet". It seems cruel to warn for including |
Overall, I really like this PR and the mechanics it introduces! Great work! I've used this already to lint some trees and I'm amazed by the output. I think what Will has to say about the error reporting is true:
This was confusing to me at first as well. To resolve "Page doesn't match recipe javascript-method", I need to resolve the subsequent warnings which can be "missing handler", or "missing ingredient" or something else and only then the error "Page doesn't match recipe javascript-method" disappears. I've toyed with adding some trivial handlers for the javascript-error ingredients, and it works well. I think we need to deep dive into the individual handlers and make them good, but I think that can happen as follow-ups where we allow ourselves more time and thinking for each ingredient and its handler. Also, this will also come up when fixing the pages anyway, because either the pages need to be fixed for real or the handlers will need to become smarter about things. So, I think, let's not block on that for the initial landing here. |
if (ingredient in ingredientHandlers) { | ||
ok = ingredientHandlers[ingredient](tree, file) && ok; | ||
} else { | ||
file.message(`No handler for ingredient ${ingredient}`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has no message.ruleId
so it displays null
.
Should we add something or are we fine with null
in the report?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I'm fixing this specifically, but this brought to my attention more issues with ruleId
s so I'm going to mention this in a much longer comment, forthcoming.
It doesn't actual warn any more (it's an error).
Thank you for your helpful reviews, @wbamberg and @Elchi3. I've pushed a bunch of changes and I'm re-requesting a review. Some details on the changes I made, and some specific notes and responses after that. Latest changesLooking at the commit history may help, but here's what I've done by importance:
Notes
|
// Drop any part of the subtree that's after the first H2 | ||
const sectionStart = 0; | ||
const sectionEnd = subTree.children.indexOf(select("h2", subTree)); | ||
subTree.children = subTree.children.slice(sectionStart, sectionEnd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is buggy because if there are no H2 elements, sectionEnd
is -1 so slice()
removes the last element ( and maybe the same at https://github.com/mdn/stumptown-content/pull/320/files#diff-ea5f15cea1cdb44e2b23f4437776a67eR95 ?).
Finding short_description
is tricky and will probably take some tuning. For JS pages, it's probably something like: "there is a <p>
element before the interactive example or an H2". We could also exclude <p class="note"
and <p class="warning"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried tuning the short_description
handler some more and explaining my assumptions in a comment. There might be more ways in which it could be adjusted (is text content in a <div>
OK? right now it's not), but it's pretty complicated as it is. In the interest of preventing this PR from growing larger, if you have any non-blocking feedback on the new short_description
handler, please mention it, but I'll address them in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this quick update @ddbeck , I think this is really great.
I think there are a couple of occurrences of the same bug in the doc-slicing code, and that and some repetition make me think it might be worth factoring out some basic doc-slicing functions.
- You'll notice that the structure of this part of the linter has changed. I moved ingredient handlers to a separate file. Since the plugin machinery is now rather distinct from the way handlers work, I thought it was easier to follow this way.
Definitely. I was going to mention that. I expect the handlers are going to multiply and potentially get more complex, so might end up wanting to live in their own files too.
It's much easier to read like this.
Notes
SpecName
andCompat
errorsThere are now specific errors for pages that have an expected section but lack a macro call (or expected string) for that section. This might be like my former recipe mismatch error: there's an error for a missing ingredient and there's an error for the specific way in which it doesn't have the ingredient. I'm not sure if this is a good idea and I'd like you to try it and give me some feedback for it. 😃
I like this. I think the main things I didn't like before were warnings that were errors in disguise, and no linkage indicated between linked errors, and these are both addressed here:
https://wiki.developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Proxy/handler/enumerate
1:1 error Missing from javascript-method: data.specifications javascript-method/data.specifications html-require-ingredient
1:6 warning Macro: JSRef html-warn-on-macros:jsref
1:16 warning Macro: obsolete_header html-warn-on-macros:obsolete_header
3:87 warning Macro: jsxref html-warn-on-macros:jsxref
28:77 warning Macro: jsxref html-warn-on-macros:jsxref
36:6 warning Macro: jsxref html-warn-on-macros:jsxref
41:69 warning Macro: jsxref html-warn-on-macros:jsxref
49:29 warning Macro: jsxref html-warn-on-macros:jsxref
77:1-77:44 error Expected SpecName macro for javascript-method: data.specifications javascript-method/data.specifications/expected-macro html-require-ingredient
90:6 warning Macro: jsxref html-warn-on-macros:jsxref
91:6 warning Macro: jsxref html-warn-on-macros:jsxref
92:6 warning Macro: jsxref html-warn-on-macros:jsxref
93:6 warning Macro: jsxref html-warn-on-macros:jsxref
Unhandled ingredients
A few ingredients, including
info_box
, are not yet linted. I think we should deal with the implementation in separate PRs.data.info_box
,data.class_constructor
, and landing pages have content considerations that I think go beyond the scope of this PR.
Yeah, I agree. As usual, it's only once the PR is done that I see the scope we should have spelled out in the user story :). There is obviously more work to do here like:
- missing mandatory ingredients
- checking ingredients at a deeper level
- checking optional ingredients
- checking the order of ingredients
But it definitely makes sense to land this and do those as follow-ups.
For example,
data.info_box
is quite different depending on whether you're talking about CSS or JS pages. It's not clear to me that these are, in fact, the same kind of ingredient.In the mean time, you get a warning when an ingredient can't be linted. I experimented with failing upon encountering such items, based on Will's suggestion:
"I can't check page XYZ, because it reports itself to be javascript-class" and I don't know how to lint with that recipe yet"
I didn't like this very much. It meant that other lintable ingredients weren't linted. I could add some logic to skip over unlintable ingredients and then fail, but that seemed complicated when we we'll need to figure out how to deal with those ingredients anyway.
I'm fine with what's here. The main thing is that it's not reported as the content's fault.
"Replace" category macros
Will said:
Now we have an idea which macros we need to "replace", perhaps we should exclude them all from the linter. Out of scope for this PR though I suppose.
This is probably right, but I'd like to have some sort of code to handle such macros before we allow them. The strictest way to do this would be to not allow a macro call until we can scrape them and turn them into Markdown.
OK. I think this makes the output a bit misleading because it's not a content problem to have Compat
or EmbedInteractiveExample
or JSRef
even today. But it doesn't matter either way.
Follow ups needed
Along similar lines, we really need a JSON output. Linting large trees is tough to do in a console. It'd be really great to generate a JSON report and use that to slice the output. We could even generate a web dashboard to share our progress.
Yes, JSON output would be really nice. Also a follow-up of course :).
file.message(`Linting ${ingredient} ingredient is unimplemented`, origin); | ||
}, | ||
"data.browser_compatibility": (tree, file, context) => { | ||
const id = "Browser_compatibility"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
35-59 here is basically identical to 76-96 - the only difference is because of a bug in the second version, as mentioned at https://github.com/mdn/stumptown-content/pull/320/files#r385966999.
This seems like a common thing to want to do: "get the piece of the doc between this H2 and the next one". Factoring it out would help ensure that it's really solid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good call on this. I've pushed a change that factors out slicing sections (and another change that further generalizes the idea, to find the subtree between a node and the first node that meets some condition).
I've pushed a couple more changes, mostly around slicing up sections.
Yes, I've done this. It encouraged me to do something a little more general helps for more than slicing up sections (take a peek at the
I'm feeling this already. I didn't actually do it in this PR because I thought the reorganization and renames would be hard to follow. I'll do this in a (hopefully very easy to review) follow up PR. I've got a lot of follow ups to do. 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested this again and I'm happy with the changes made. I think a lot of things can be follow-ups. If you file them as issues, we can schedule them and also assign me or Will and have you review it, so that we learn the code some more in-depth as well. Really excellent work, thank you! 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks great. Thanks @ddbeck !
Thanks again, @wbamberg and @Elchi3 with your help on this PR. I opened issues for each of the things I mentioned as possible follow ups. Some of them are pretty straight forward (e.g., reorganizing handlers into modules), but a few are kinda gnarly (writing new handlers or linting ingredient order). Feel free to edit the descriptions to fill them out a bit more.
|
Thank you for this Daniel! My vote as the highest priority thing here is JSON output. |
This PR lints scraped pages against its recognized recipe, if it has one.
This ought to satisfy the user story in mdn/sprints#2784. Running this gets you a nice report for each page. You'll learn whether its missing any expected sections or cannot be linted (due to a lack of identifiable recipe). The summary for a tree of pages gives you an overview of how many pages need to be tagged for recipes and how many pages fail to meet the standards set by their recipes. Once this is merged, I can quickly create a spreadsheet for the work needed to complete mdn/sprints#2785.
At least for now, ingredient matching is pretty minimal. For most ingredients, it's a check for a specific
<h2>
. For a couple of ingredients (specification and compat data), this PR demonstrates a more in-depth check: a specific heading paired with a specific macro call. These checks also demonstrate using thehast-util-select
package to use a CSS-like selector to find content.There was at least one kind of ingredient I didn't handle that appeared in JS pages: info boxes. I couldn't find a good example to test against. I'd appreciate pointers about this.
To make this work sensibly, there were a few seemingly off-topic changes I ended up making:
We can now handle
throw
ing VMessages. This way I can signal that a file can be processed no further. In this PR, I use this to handle the case of a file not having a recipe to lint against: there's no meaningful linting that can be done from that point, so we bail out by throwing.(Throwing a VMessage is what
VFile.fail()
does ordinarily, though its API doesn't support exactly how I use VMessages right now. Something I can deal with in a follow-up PR.)I removed
html-require-macros.js
. Now that we can require, for example,compat
macros in context, we don't need the generic rule.There are a few changes I would like to make based on what I learned writing this, but I refrained from putting them into this PR. Some possible follow-up tasks include:
isMacro()
) and I suspect easier to transform into Markdown (when we get there).At this point, I'm interested in feedback on pretty much anything. This took longer than I expected and I think I'm too close to it now to see what I might've done wrong.