-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor baseUrl into variablePreprocessor #1239
Refactor baseUrl into variablePreprocessor #1239
Conversation
1242ab1
to
5da86e1
Compare
768c958
to
63d9f82
Compare
@@ -6,4 +6,4 @@ Also, the boilerplate file name (e.g. `inside.md`) and the file that it is suppo | |||
|
|||
This file should behaves as if it is in the `requirements` folder: | |||
|
|||
<panel header="Tested with the folllowing include" src="NonFunctionalRequirements.md" > | |||
<panel header="Tested with the folllowing include" src="NonFunctionalRequirements.md" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there was no corresponding closing tag; this results in one of the changes in test_site/index.html
@@ -251,6 +251,7 @@ <h1 id="panel-without-heading-with-keyword">Panel without heading with keyword<a | |||
<h1 id="keyword-should-be-tagged-to-this-heading-not-the-panel-heading">Keyword should be tagged to this heading, not the panel heading<a class="fa fa-anchor" href="#keyword-should-be-tagged-to-this-heading-not-the-panel-heading" onclick="event.stopPropagation()"></a></h1> | |||
<p><span class="keyword">panel keyword</span></p> | |||
</panel> | |||
<p></p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This occurs due to html not needing a closing <p>
tag.
Previously, this is the intermediate output after markdown-it renders the html.
It renders invalid html because it is subtly invalid markdown - panel
is not a whitelisted html block tag and cannot terminate the paragraph rule (also resulting in #958):
<p><span class="keyword">panel keyword</span>
</panel>
</p>
when rerendering the html after having passed it through cheerio, the bottom lone </p>
is invalid in xmlMode
, and gets erased. when rendering without xmlMode
however, it gets expanded into an empty p
tag by htmlparser2's forgiving nature (there is no spec for this afaik). (line 226 of htmlparser2's Parser.js)
Ultimately, the extra <p></p>
is due to miswritten markdown;
<span class="keyword">
panel keyword</span>
would be the correct way of not inserting any additional <p>
's at all (note the newline). (I've left this in for now as to explain the many similar cases in the 2103 site)
<p></p> | ||
</panel> | ||
<p> | ||
<panel src="/test_site/requirements/NonFunctionalRequirements._include_.html"><template slot="_header"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
due to the added /
in the source file
@@ -716,6 +715,7 @@ <h6 class="always-index" id="level-6-header-outside-headingsearchindex-with-alwa | |||
|
|||
|
|||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
due to the intermediary endraw
removal, likewise for the above <p></p>
's
@@ -84,6 +76,7 @@ <h2 id="so-far-as-to-comply-with-the-commonmark-spec">So far as to comply with t | |||
*/ | |||
</p> | |||
</abc> | |||
<p></p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the two <p></p>
additions in this file occur due to miswritten markdown as well.
In the first case, the /* ...
indicates the start of a paragraph ending after the </abc>
In the second case, the if ...
does the same
* This allows variables and other nunjuck syntax inside {% raw %} {% endraw %} tags | ||
* to be ignored by nunjucks until the final renderString call. | ||
*/ | ||
function preEscapeRawTags(pageData) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the real solution to #762 was to make nunjucks a single pass, removing the need for preescaping raw/endraw
. A little unfortunate that the work done here is removed almost as soon as it was introduced #1220 #1049, but we could always revert it in the future if there's ever a need for multiple nunjucks passes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@crphang could I get your opinion on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ccbe682
to
fb1994e
Compare
fb1994e
to
bd4ee73
Compare
rebased on latest, no changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* This allows variables and other nunjuck syntax inside {% raw %} {% endraw %} tags | ||
* to be ignored by nunjucks until the final renderString call. | ||
*/ | ||
function preEscapeRawTags(pageData) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
node.children = cheerio.parseHTML(md.render(cheerio.html(node.children)), true); | ||
cheerio.prototype.options.xmlMode = true; | ||
break; | ||
case 'panel': { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it correct to say that because we can resolve the base url in the initial pass, we no longer need to do the intermediate preprocessing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is moved to componentPreprocessor
L149 - L155.
This chunk does the rewriting of src="topic1.html"
into src="topic1._include_.html"
, so it's still necessary
The main difference is the removal of the fileExists
check, which already wasn't necessary before (there's a similar check done in componentPreprocessor
)
The second difference would be path.dirname(path.join('{{hostBaseUrl}}', path.relative(config.rootPath, src)))
using config.baseUrl
directly instead of {{ hostBaseUrl }}
, since we no longer have / render {{ hostBaseUrl }}
config.baseUrlMap); | ||
filePath = path.resolve(parentSitePath, includePath.replace(baseUrlRegex, '')); | ||
if (path.posix.isAbsolute(includePath)) { | ||
// {{ baseUrl }} (or simply '/...' if baseUrl === '') was used in this src attribute, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presume /...
is /includePath
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, will reword the comment a little
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the multiple nunjucks passes, I looked through the history a little; It seems resolveBaseUrl
was introduced right at the start before any of @acjh and @yamgent's commits unfortunately =C
Realised I focused too much on annotating the strange diffs in the test as well, and none for the code =C. Added some
@@ -1,4 +1,4 @@ | |||
const cheerio = require('cheerio'); | |||
const cheerio = require('cheerio'); require('./lib/markbind/src/patches/htmlparser2'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added this in all files to ensure unit tests load the patch, which is more than ever important with the extra 2 "patches" for recognizeSelfClosing
and lowerCaseAttributeNames
*/ | ||
this.variablePreprocessor.addUserDefinedVariable(base, 'baseUrl', '{{baseUrl}}'); | ||
const siteRelativePathFromRoot = utils.ensurePosix(path.relative(this.rootPath, base)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the main functional change, the rest are just necessary changes due to the tight coupling of xmlMode
/rendering order/using it to unescape escaped html from removing resolveBaseUrl
We simply add the correct baseUrl for each (sub)site as a variable directly into variablePreprocessor
(previously userDefinedVariablesMap
)
this.variablePreprocessor.addUserDefinedVariable(base, 'MarkBind', MARKBIND_LINK_HTML); | ||
|
||
const $ = cheerio.load(content); | ||
const $ = cheerio.load(content, { decodeEntities: false }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ decodeEntities: false }
here as this file dosen't have the cheerio.prototype.options.decodeEntities = false;
line like the others, and is the only cheerio.load
call in the file
@@ -1,6 +1,5 @@ | |||
module.exports = { | |||
// src/lib/markbind/src/parser.js | |||
ATTRIB_INCLUDE_PATH: 'include-path', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previously used for _rebaseReference
const resolvedResourcePath = path.posix.relative(utils.ensurePosix(rootPath), fullResourcePath); | ||
return path.posix.join('{{hostBaseUrl}}', resolvedResourcePath); | ||
const resolvedResourcePath = path.posix.relative(utils.ensurePosix(config.rootPath), fullResourcePath); | ||
return path.posix.join(config.baseUrl || '/', resolvedResourcePath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's already rendered at this point, so we need to change it correspondingly to the actual baseUrl
config.baseUrlMap); | ||
filePath = path.resolve(parentSitePath, includePath.replace(baseUrlRegex, '')); | ||
if (path.posix.isAbsolute(includePath)) { | ||
// {{ baseUrl }} (or simply '/...' if baseUrl === '') was used in this src attribute, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, will reword the comment a little
@@ -123,39 +120,42 @@ function _getSrcFlagsAndFilePaths(element, context, config) { | |||
* and adds the appropriate include. | |||
*/ | |||
function _preProcessPanel(node, context, config, parser) { | |||
const element = node; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
irrelevant style change =C, let me know if you want to move it to another PR!
element.attribs.src = filePath; | ||
const { fragment } = node.attribs; | ||
const resultPath = utils.setExtension(path.relative(config.rootPath, filePath), '._include_.html'); | ||
const baseUrlPrependedPath = path.posix.join(`${config.baseUrl}/`, utils.ensurePosix(resultPath)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as mentioned in the intermediate processing comment, the fileExists
check is done above in _getFileExistsNode
, and {{ hostBaseUrl }}
is replaced with config.baseUrl
}, | ||
renderString(pageData, variableMap = {}, options = {}, env = commonEnv) { | ||
return env.renderString(pageData, variableMap, options); | ||
renderRaw(pageData, variableMap = {}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a little strange to call it renderRaw
now that we don't preescape it, but I'll be touching on this shortly in the next PR that fixes #931
@@ -121,7 +121,7 @@ test('includeFile replaces <include src="doesNotExist.md" optional> with empty < | |||
|
|||
const expected = [ | |||
'# Index', | |||
'<div/>', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consequence of using xmlMode
previously; <div></div>
is rendered to <div/>
which is invalid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. I took another look and it looks fine to me! Let's improve some of the surrounding comments and I think we should be good to merge.
} | ||
|
||
element.attribs.src = filePath; | ||
const { fragment } = node.attribs; | ||
const resultPath = utils.setExtension(path.relative(config.rootPath, filePath), '._include_.html'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to rename resultPath
-> relativePath
and baseUrlPrependedPath
-> absolutePath
? I feel like that might better reflect what they are.
BaseUrl processing is done in a separate stage involving repeated and recursive parsing / rendering of the content. This decreases cohesiveness of variable processing, and also performance due to the repeated parsing and rendering. It also necessitates edge-case solutions such as that in MarkBind#1088 when we need to resolve the baseUrl before the resolveBaseUrl stage has been reached. With a framework for variable processing now, let's move baseUrl processing into it, solving the above said problems. Furthermore, rendering of other variables containing html is dependent on the extra htmlparser call in resolveBaseUrl. Let's formally remove the need for this by using only the unescaped nunjucks environment to render variables.
Markbind relies on xmlMode parsing of htmlparser2 to parse 'custom self closing tags' such as '<panel />', and avoid the auto lower case conversion of attribute names. This introduces very tight coupling between the order and variations of this option across html rendering and parsing calls. In addition, xmlMode is not compliant with various html specs, such as opening tags being able to act as closing tags in certain cases, or more common cases like erroneously expanding <br> into <br>...</br>. Let's use htmlparser2's recognizeSelfClosing tag and lowerCaseAttributeName options to do the same instead.
bd4ee73
to
ca8b330
Compare
Thanks for the review again @marvinchin! |
What is the purpose of this pull request? (put "X" next to an item, remove the rest)
• [x] Other, please explain: More refactoring work
Resolves #1236 Resolves #907 as well
What is the rationale for this request?
Less code, moderately (10%) better performance, more cohesion
essentially a smaller follow up to the variable refactoring work, although I ended up needing to resolve the
xmlMode
issues as well (not much changes on the code side though) 😞What changes did you make? (Give an overview)
Moved
baseUrl
rendering intovariablePreprocessor
, removing the need for an entirely separate stage of baseUrl processing which requires edge case baseUrl resolving like in #1088.Is there anything you'd like reviewers to focus on?
the changes in the test output.
there are a few blank
<p></p>
addition / removals; These are expected, and will be annotated.the 2103 site has a lot of similar expected
<p></p>
additions for the same reason, and I've manually verified each of those are non-breaking.Testing instructions:
Proposed commit message: (wrap lines at 72 characters)
Refactor baseUrl into variablePreprocessor
BaseUrl processing is done in a separate stage involving repeated and
recursive parsing / rendering of the content.
This decreases cohesiveness of variable processing, and also
performance due to the repeated parsing and rendering.
With a framework for variable processing now, let's move baseUrl
processing into it, solving the above said problems.
To do this, MarkBind's reliance on xmlMode needs to be removed as well,
as it isvery tightly coupled to the htmlparser call in resolveBaseUrl.
xmlMode is also not compliant with many html specs, such as opening
tags being able to be closed by most other opening tags.
Hence, let's instead patch htmlparser2 to enable it's self closing tag
recognition and disable lower case attribute conversion to achieve
this.
Lastly, rendering of other variables containing html is also dependent
on the extra htmlparser call in resolveBaseUrl which has the
decodeEntities option turned on.
Let's formally remove the need for this by using only the unescaped
nunjucks environment to render variables.