-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement an api to ignore text in certain tags #1047
Conversation
@crphang |
Let me branch off this and test for inline puml |
👍 , the content in the inline puml should be exposed as a text node to cheerio I wouldn't mind changing the |
8679713
to
a043e1c
Compare
Ready for review, |
a043e1c
to
d1e0a66
Compare
Sorry for the delay in reviewing, will take a deeper look into the PR shortly. For now, I've branched off this PR https://github.com/crphang/markbind/tree/inline-puml-plugin and tested inline puml, works really well 👍 |
No worries; Look forward to the finished version 😄 |
0705280
to
d124b7f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
Not a complete review, but can i suggest a change to the specialcharacter parsing logic?
diff --git a/src/lib/markbind/src/patches/htmlparser2.js b/src/lib/markbind/src/patches/htmlparser2.js
index 66e308f..8229ee8 100644
--- a/src/lib/markbind/src/patches/htmlparser2.js
+++ b/src/lib/markbind/src/patches/htmlparser2.js
@@ -150,17 +150,8 @@ Tokenizer.prototype.specialTagNames = [
* and initialises the _matchingSpecialTags array with objects containing information about the matches.
*/
Tokenizer.prototype.isFirstCharacterSpecialTagCharacter = function(c) {
- this._matchingSpecialTags = this.specialTagNames
- .map((str, index) => ({
- index,
- nextTestIndex: 0,
- }))
- .filter(indexObj => c.toLowerCase() === this.specialTagNames[indexObj.index][indexObj.nextTestIndex])
- .map(indexObj => ({
- index: indexObj.index,
- nextTestIndex: indexObj.nextTestIndex + 1,
- hasFinishedMatching: this.specialTagNames[indexObj.index][indexObj.nextTestIndex + 1] === undefined,
- }));
+ this._potentialSpecialTag = c;
+ this._matchingSpecialTags = this.specialTagNames.filter(tag => tag[0] === c);
return this._matchingSpecialTags.length > 0;
};
@@ -171,26 +162,18 @@ Tokenizer.prototype.isFirstCharacterSpecialTagCharacter = function(c) {
* If one of the previous matches has finished matching, the corresponding matchIndex is set and returned.
*/
Tokenizer.prototype.processSpecialFunctions = function(c) {
- let matchIndex;
-
+ let matchedTag;
+ this._potentialSpecialTag += c;
this._matchingSpecialTags = this._matchingSpecialTags
- .filter((indexObj) => {
- if (indexObj.hasFinishedMatching) {
- matchIndex = indexObj.index;
-
- return c === "/" || c === ">" || whitespace(c);
+ .filter((tag) => {
+ if (this._potentialSpecialTag === tag) {
+ matchedTag = tag;
+ return true;
}
-
- return c.toLowerCase() === this.specialTagNames[indexObj.index][indexObj.nextTestIndex];
- })
- .map(indexObj => ({
- index: indexObj.index,
- nextTestIndex: indexObj.nextTestIndex + 1,
- hasFinishedMatching: this.specialTagNames[indexObj.index][indexObj.nextTestIndex + 1] === undefined,
- }));
-
+ return tag.startsWith(this._potentialSpecialTag);
+ });
return {
- matchIndex,
+ matchIndex: matchedTag && this.specialTagNames.indexOf(matchedTag),
hasMatching: this._matchingSpecialTags.length > 0,
};
};
Hmm, I went with the same char-by-char approach as htmlparser2 mainly for performance, which is essential for something used throughout markbind. I definitely see some areas in the two functions which can be made simpler after looking at it again though. |
d124b7f
to
d4aa2cf
Compare
Yup this looks easier to understand |
d4aa2cf
to
0839f2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments. Looks okay in general
* Changes the Tokenizer state to BEFORE_SPECIAL_END if the token matches one of | ||
* the first character of the currently matched special tag. | ||
*/ | ||
Tokenizer.prototype._stateBeforeCloseingTagName = function(c) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: closingTagName
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a patched method from htmlparser2
* the first character of the currently matched special tag. | ||
*/ | ||
Tokenizer.prototype._stateBeforeCloseingTagName = function(c) { | ||
if (whitespace(c)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we do if it is whitespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is from html parser2 as well; Its to allow something like </ tagname>
to work
*/ | ||
Tokenizer.prototype.processSpecialFunctions = function(c) { | ||
const matchingSpecialTags = []; | ||
const numMatchingTags = this._matchingSpecialTags.length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Could _matchingSpecialTags
be renamed to _matchingSpecialTagsIndex
? It makes the operation below a little less confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
*/ | ||
|
||
// processSpecialFunctions, processSpecialClosingTagCharacter | ||
HAS_MATCHING = -1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it is possible to follow the previous enumerations for these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed negative numbers here as the positive ones are 'taken' by the indexes
0839f2f
to
1ad18aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
1ad18aa
to
0532e54
Compare
Added 'injection' to js-beautify as well ( line 762-764 of site.js ), so the input to special tags appears as is in the browser with no formatting changes ( indentations in particular ); |
@marvinchin could you help with reviewing this? 🙏. Have a few PRs I'm hoping to rebase on this. 🙇 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow this is some pretty complex stuff! The logic mostly looks good to me, but there are some parts that I was a little confused about, so I've left a couple of questions behind. I'm not too familiar with this part of MarkBind, so it'd be great if you could help fill in the gaps in my understanding so I can give this a more through review.
I think @acjh might also have more context on this, especially with regards to the htmlparser2 patch, so getting his opinion on this would be very valuable.
docs/userGuide/usingPlugins.md
Outdated
You can implement the `getSpecialTags` method to blacklist the content in these special tags from parsing, | ||
removing such potential conflicts. | ||
|
||
- `getSpecialTags(pluginContext)`: Called to get link elements to be added to the head of the page. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Called to get link elements to be added to the head of the page.
Is this description correct? :P
src/Site.js
Outdated
const tagsToIgnore = new Set(); | ||
|
||
Object.values(this.plugins).forEach((plugin) => { | ||
if (!plugin.getSpecialTags) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we prefer to wrap the return in curly braces even though it's only a single line:
if (!plugin.getSpecialTags) {
return;
}
src/lib/markbind/src/lib/markdown-it-shared/markdown-it-escape-special-tags.js
Show resolved
Hide resolved
src/lib/markbind/src/lib/markdown-it-shared/markdown-it-escape-special-tags.js
Show resolved
Hide resolved
src/lib/markbind/src/lib/markdown-it-shared/markdown-it-escape-special-tags.js
Show resolved
Hide resolved
src/lib/markbind/src/lib/markdown-it-shared/markdown-it-escape-special-tags.js
Show resolved
Hide resolved
src/lib/markbind/src/lib/markdown-it-shared/markdown-it-escape-special-tags.js
Show resolved
Hide resolved
HAS_MATCHED = -3; | ||
|
||
function whitespace(c) { | ||
return c === " " || c === "\n" || c === "\t" || c === "\f" || c === "\r"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use a regex whitespace check instead? I'm just a little concerned about doing an explicit check because there are so many different ways to get whitespace :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a method from htmlparser2
|
||
if (this._matchingSpecialTagIndexes.length > 0) { | ||
this._nextSpecialTagMatchIndex = 1; | ||
return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing ;
here
} | ||
|
||
if (this._matchingSpecialTagIndexes.length > 0) { | ||
this._nextSpecialTagMatchIndex = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering - why is this set to 1
? What exactly does the nextSpecialTagMatchIndex
refer to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the next character index to match for the special tags
Here it is 1
since it matched the 0
th character
0532e54
to
9db8d9a
Compare
|
||
/* | ||
Custom patch for the api to escape content in certain special tags | ||
Adapted from the default markdown-it html_block rule and replaces it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies, should have really highlighted this line more.
The specific file is \node_modules\markdown-it\lib\rules_block\html_block
75% of this file ( all of the comments ) is essentially the same as it.
I left all the lines from the original file with the exact formatting so its easier to see what’s patched.
The changes here are essentially just using the collected special tags to form a new regex to replace the first one in the original rule.
The htmlparser2 patch has a lot of changes though, 'thankfully' 😅
7acd5d5
to
d74178f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining it - this looks okay to the best of my understanding :P
d74178f
to
9dc8557
Compare
Thanks for looking through this as well! I've made just a few more method name changes in the htmlparser2 patch: Will update back if there are more changes besides rebasing on master |
9dc8557
to
b3091ce
Compare
The parsers MarkBind uses parses content in html tags as html or markdown respectively. This makes it difficult to add custom components that utilize conflicting syntax. Let’s add an interface to ignore content in certain special tags. This interface is also exposed to plugins as the getSpecialTags option.
b3091ce
to
c0fed92
Compare
…nvert-to-code-block * 'master' of https://github.com/MarkBind/markbind: Allow changing parameter properties (MarkBind#1075) Custom timezone for built-in timestamp (MarkBind#1073) Fix reload inconsistency when updating frontmatter (MarkBind#1068) Implement an api to ignore content in certain tags (MarkBind#1047) Enable AppVeyor CI (MarkBind#1040) Add heading and line highlighting to code blocks (MarkBind#1034) Add dividers and fix bug in siteNav (MarkBind#1063) Fixed navbar no longer covers modals (MarkBind#1070) Add copy code-block plugin (MarkBind#1043) Render plugins on dynamic resources (MarkBind#1051) Documentation for Implement no-* attributes for <box> (MarkBind#1042) Migrate to bootstrap-vue popovers (MarkBind#1033) Refactor preprocess and url processing functions (MarkBind#1026) Add pageNav to Using Plugins Page (MarkBind#1062) # Conflicts: # docs/userGuide/syntax/siteNavigationMenus.mbdf
* 'master' of https://github.com/MarkBind/markbind: 2.12.0 Update outdated test files Update vue-strap version to v2.0.1-markbind.37 Fix refactor to processDynamicResources (MarkBind#1092) Implement lazy page building for markbind serve (MarkBind#1038) Add warnings for conflicting/deprecated component attribs (MarkBind#1057) Allow changing parameter properties (MarkBind#1075) Custom timezone for built-in timestamp (MarkBind#1073) Fix reload inconsistency when updating frontmatter (MarkBind#1068) Implement an api to ignore content in certain tags (MarkBind#1047) Enable AppVeyor CI (MarkBind#1040) Add heading and line highlighting to code blocks (MarkBind#1034) Add dividers and fix bug in siteNav (MarkBind#1063) Fixed navbar no longer covers modals (MarkBind#1070) Add copy code-block plugin (MarkBind#1043) Render plugins on dynamic resources (MarkBind#1051) Documentation for Implement no-* attributes for <box> (MarkBind#1042) Migrate to bootstrap-vue popovers (MarkBind#1033) Refactor preprocess and url processing functions (MarkBind#1026) Add pageNav to Using Plugins Page (MarkBind#1062)
The parsers MarkBind uses parses content in html tags as html or markdown respectively. This makes it difficult to add custom components that utilize conflicting syntax. Let’s add an interface to ignore content in certain special tags. This interface is also exposed to plugins as the getSpecialTags option.
What is the purpose of this pull request? (put "X" next to an item, remove the rest)
• [x] New feature
Groundwork for #984
What is the rationale for this request?
To provide a convenient interface, for plugins / core markbind code to blacklist certain tags to ignore in parsers.
This should open the doors to custom components that can have all sorts of strange syntaxes inside them, without using markdown code blocks. e.g.
puml
#968What changes did you make? (Give an overview)
<script/pre/style>
tags are treated in markdown-it currently.Provide some example code that this change will affect:
Is there anything you'd like reviewers to focus on?
#984 (comment) - degree of commonmark compliance we should follow
Testing instructions:
npm run test
should workProposed commit message: (wrap lines at 72 characters)
Implement an api to ignore content in special tags
The parsers MarkBind uses parses content in html tags as html
or markdown respectively.
This makes it difficult to add custom components that utilize
conflicting syntax.
Let’s add an interface to ignore content in certain special tags.
This interface is also exposed to plugins as the getSpecialTags option.