Implement an api to ignore text in certain tags #1047

ang-zeyu · 2020-02-17T08:41:18Z

What is the purpose of this pull request? (put "X" next to an item, remove the rest)

• [x] New feature

Groundwork for #984

What is the rationale for this request?
To provide a convenient interface for plugins and core MarkBind code to blacklist certain tags whose content the parsers should ignore.

This should open the door to custom components that contain all sorts of unusual syntax inside them without needing Markdown code blocks, e.g. puml (#968).

What changes did you make? (Give an overview)

Patched an existing rule in markdown-it, exposing an interface that allows additional tags to ignore to be injected before the patch is registered. The patch makes markdown-it treat these tags the same way it currently treats <script>, <pre> and <style> tags.
Patched many of htmlparser2's tokenizer state handlers and added some state variables to the tokenizer, exposing a similar interface.
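For the markdown-it side, the idea can be sketched as below. This is a minimal, hypothetical sketch rather than the patch itself: it assumes the first entry of markdown-it's `HTML_SEQUENCES` array (in `lib/rules_block/html_block.js`) is an `[opening regex, closing regex, can-terminate-paragraph]` triple for `script|pre|style`, and the helpers `buildSpecialTagSequence` and `escapeRegExp` are illustrative names only.

```js
// Hypothetical sketch: rebuilds the first html_block sequence (the
// <script>/<pre>/<style> entry) so that it also covers injected tag names.
const DEFAULT_SPECIAL_TAGS = ['script', 'pre', 'style'];

// Escapes regex metacharacters so tag names are matched literally.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Returns [openingRegex, closingRegex, canTerminateParagraph], assumed to be
// the same shape as an entry of markdown-it's HTML_SEQUENCES array.
function buildSpecialTagSequence(extraTags) {
  const tagNames = [...new Set([...DEFAULT_SPECIAL_TAGS, ...extraTags])];
  const alternation = tagNames.map(escapeRegExp).join('|');
  return [
    // The tag name must be followed by whitespace, '>' or end of line.
    new RegExp(`^<(${alternation})(?=(\\s|>|$))`, 'i'),
    new RegExp(`<\\/(${alternation})>`, 'i'),
    true,
  ];
}

const [openPuml, closePuml] = buildSpecialTagSequence(['puml']);
console.log(openPuml.test('<puml>'));          // true
console.log(openPuml.test('<pumlx>'));         // false
console.log(closePuml.test('A -> B </puml>')); // true
```

Swapping a rebuilt entry in for the original first sequence is what lets the injected tags behave like `<pre>`: everything up to the closing tag is kept as a raw HTML block.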

Provide some example code that this change will affect:

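```js
// Tokenizer is htmlparser2's Tokenizer class; specialTagNames is added to its
// prototype by MarkBind's htmlparser2 patch (it is not part of htmlparser2 itself).
function injectIgnoreTags(tagsToIgnore) {
	Tokenizer.prototype.specialTagNames = Tokenizer.prototype.specialTagNames.concat(...tagsToIgnore);
}
```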

Is there anything you'd like reviewers to focus on?
#984 (comment): the degree of CommonMark compliance we should follow

Testing instructions:

npm run test should work
rebuilding the 2103 site should show no diffs, save for timestamps and log output

Proposed commit message: (wrap lines at 72 characters)
Implement an api to ignore content in special tags

The parsers MarkBind uses parse content in html tags as html
or markdown respectively.
This makes it difficult to add custom components that utilize
conflicting syntax.

Let’s add an interface to ignore content in certain special tags.
This interface is also exposed to plugins as the getSpecialTags option.

ang-zeyu · 2020-02-17T08:47:05Z

@crphang
The interface is exposed through the two inject methods, called under collectPluginSpecialTags once during Site generation.
Plugins implement the getSpecialTags: () => [ 'tagnames' , '...', ...] method to provide these tag names.
Let me know if this works for what you have in mind, thanks!
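A minimal sketch of the plugin side, assuming plugins are plain objects keyed by name. `pumlPlugin`, `pluginWithoutSpecialTags` and this `collectPluginSpecialTags` signature are hypothetical; only the `getSpecialTags(pluginContext)` hook name comes from this PR.

```js
// Hypothetical plugin modules; only the getSpecialTags hook matters here.
const pumlPlugin = {
  getSpecialTags: pluginContext => ['puml'],
};
// A plugin that does not implement the hook is simply skipped.
const pluginWithoutSpecialTags = {};

// Sketch of collecting the tag names from every loaded plugin, in the spirit
// of the loop in Site.js. Lower-casing the names is an assumption of this sketch.
function collectPluginSpecialTags(plugins, pluginContexts = {}) {
  const tagsToIgnore = new Set();
  Object.entries(plugins).forEach(([name, plugin]) => {
    if (!plugin.getSpecialTags) { return; }
    plugin.getSpecialTags(pluginContexts[name] || {})
      .forEach(tagName => tagsToIgnore.add(tagName.toLowerCase()));
  });
  return tagsToIgnore;
}

const collected = collectPluginSpecialTags({ pumlPlugin, pluginWithoutSpecialTags });
console.log([...collected]); // [ 'puml' ]
```

The resulting set would then be passed once to the two inject methods (markdown-it and htmlparser2), so plugins never touch the parsers directly.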

crphang · 2020-02-17T08:49:13Z

Let me branch off this and test for inline puml

ang-zeyu · 2020-02-17T08:51:19Z

> Let me branch off this and test for inline puml.

👍 The content in the inline puml should be exposed as a text node to cheerio.

I wouldn't mind changing the getSpecialTags implementation if your work on widgets needs it, but the two inject methods (markdown-it and htmlparser2) should essentially be independent of widgets and plugins. Going forward, I'll keep the format of these two methods stable while fixing more bugs or tweaking the implementation as needed.

ang-zeyu · 2020-02-21T12:03:34Z

> Let me branch off this and test for inline puml

Ready for review,
Not much changed from the first commit apart from adding a functional test and cleaning some things up.

crphang · 2020-02-23T10:06:43Z

Sorry for the delay in reviewing, will take a deeper look into the PR shortly.

For now, I've branched off this PR https://github.com/crphang/markbind/tree/inline-puml-plugin and tested inline puml, works really well 👍

ang-zeyu · 2020-02-23T14:35:51Z

> Sorry for the delay in reviewing, will take a deeper look into the PR shortly.
>
> For now, I've branched off this PR https://github.com/crphang/markbind/tree/inline-puml-plugin and tested inline puml, works really well 👍

No worries; looking forward to the finished version 😄

openorclose

Nice work!

Not a complete review, but can I suggest a change to the special-character parsing logic?

https://github.com/openorclose/markbind/blob/special-tags-api/src/lib/markbind/src/patches/htmlparser2.js

diff --git a/src/lib/markbind/src/patches/htmlparser2.js b/src/lib/markbind/src/patches/htmlparser2.js
index 66e308f..8229ee8 100644
--- a/src/lib/markbind/src/patches/htmlparser2.js
+++ b/src/lib/markbind/src/patches/htmlparser2.js
@@ -150,17 +150,8 @@ Tokenizer.prototype.specialTagNames = [
  * and initialises the _matchingSpecialTags array with objects containing information about the matches.
  */
 Tokenizer.prototype.isFirstCharacterSpecialTagCharacter = function(c) {
-	this._matchingSpecialTags = this.specialTagNames
-		.map((str, index) => ({
-			index,
-			nextTestIndex: 0,
-		}))
-		.filter(indexObj => c.toLowerCase() === this.specialTagNames[indexObj.index][indexObj.nextTestIndex])
-		.map(indexObj => ({
-			index: indexObj.index,
-			nextTestIndex: indexObj.nextTestIndex + 1,
-			hasFinishedMatching: this.specialTagNames[indexObj.index][indexObj.nextTestIndex + 1] === undefined,
-		}));
+	this._potentialSpecialTag = c;
+	this._matchingSpecialTags = this.specialTagNames.filter(tag => tag[0] === c);
 
 	return this._matchingSpecialTags.length > 0;
 };
@@ -171,26 +162,18 @@ Tokenizer.prototype.isFirstCharacterSpecialTagCharacter = function(c) {
  * If one of the previous matches has finished matching, the corresponding matchIndex is set and returned.
  */
 Tokenizer.prototype.processSpecialFunctions = function(c) {
-	let matchIndex;
-
+	let matchedTag;
+	this._potentialSpecialTag += c;
 	this._matchingSpecialTags = this._matchingSpecialTags
-		.filter((indexObj) => {
-			if (indexObj.hasFinishedMatching) {
-				matchIndex = indexObj.index;
-
-				return c === "/" || c === ">" || whitespace(c);
+		.filter((tag) => {
+			if (this._potentialSpecialTag === tag) {
+				matchedTag = tag;
+				return true;
 			}
-
-			return c.toLowerCase() === this.specialTagNames[indexObj.index][indexObj.nextTestIndex];
-		})
-		.map(indexObj => ({
-				index: indexObj.index,
-				nextTestIndex: indexObj.nextTestIndex + 1,
-				hasFinishedMatching: this.specialTagNames[indexObj.index][indexObj.nextTestIndex + 1] === undefined,
-			}));
-
+			return tag.startsWith(this._potentialSpecialTag);
+		});
 	return {
-		matchIndex,
+		matchIndex: matchedTag && this.specialTagNames.indexOf(matchedTag),
 		hasMatching: this._matchingSpecialTags.length > 0,
 	};
 };

ang-zeyu · 2020-02-25T06:34:22Z

> Not a complete review, but can I suggest a change to the special-character parsing logic?

Hmm, I went with the same char-by-char approach as htmlparser2 mainly for performance, which is essential for something used throughout MarkBind.

I definitely see some areas in the two functions that could be made simpler after looking at them again, though.
~~Will update shortly~~ Updated, let me know if it's better 😁
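To illustrate the char-by-char approach (as opposed to building up a string and calling `startsWith`), here is a simplified, hypothetical model. It is not the patch's code: `SpecialTagMatcher`, its methods and `findSpecialTag` are made-up names. It only tracks which tag names are still candidates plus the index of the next character to compare, which is why that index is 1 right after the first character has matched.

```js
// Same five characters as htmlparser2's whitespace() helper.
function isWhitespace(c) {
  return c === ' ' || c === '\n' || c === '\t' || c === '\f' || c === '\r';
}

// Hypothetical sketch: candidate tag indexes plus the index of the next
// character to compare, updated one character at a time.
class SpecialTagMatcher {
  constructor(specialTagNames) {
    this.specialTagNames = specialTagNames.map(name => name.toLowerCase());
    this.candidateIndexes = [];
    this.nextMatchIndex = 0;
  }

  // Called with the first character after '<'.
  matchFirstCharacter(c) {
    const lower = c.toLowerCase();
    this.candidateIndexes = [];
    this.specialTagNames.forEach((name, index) => {
      if (name[0] === lower) { this.candidateIndexes.push(index); }
    });
    // Character 0 has been matched, so the next character to compare is 1.
    this.nextMatchIndex = 1;
    return this.candidateIndexes.length > 0;
  }

  // Called with each following character. matchIndex is set once a tag name
  // has been fully matched and is followed by '/', '>' or whitespace.
  matchNextCharacter(c) {
    if (c === '/' || c === '>' || isWhitespace(c)) {
      const matchIndex = this.candidateIndexes.find(
        index => this.specialTagNames[index].length === this.nextMatchIndex);
      return { matchIndex, hasMatching: false };
    }
    const lower = c.toLowerCase();
    this.candidateIndexes = this.candidateIndexes.filter(
      index => this.specialTagNames[index][this.nextMatchIndex] === lower);
    this.nextMatchIndex += 1;
    return { matchIndex: undefined, hasMatching: this.candidateIndexes.length > 0 };
  }
}

// Feeds the text after '<' through the matcher; returns a tag index or undefined.
function findSpecialTag(matcher, textAfterOpeningBracket) {
  if (textAfterOpeningBracket.length === 0
    || !matcher.matchFirstCharacter(textAfterOpeningBracket[0])) { return undefined; }
  for (const c of textAfterOpeningBracket.slice(1)) {
    const { matchIndex, hasMatching } = matcher.matchNextCharacter(c);
    if (matchIndex !== undefined || !hasMatching) { return matchIndex; }
  }
  return undefined;
}

const names = ['script', 'pre', 'style', 'puml'];
console.log(findSpecialTag(new SpecialTagMatcher(names), 'puml>'));    // 3
console.log(findSpecialTag(new SpecialTagMatcher(names), 'pumpkin>')); // undefined
```

Here `'pumpkin>'` drops out at its fourth character, while `'puml>'` only reports a match once `>` confirms that the name has ended.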

openorclose · 2020-02-26T04:03:41Z

> > Not a complete review, but can I suggest a change to the special-character parsing logic?
>
> Hmm, I went with the same char-by-char approach as htmlparser2 mainly for performance, which is essential for something used throughout MarkBind.
>
> I definitely see some areas in the two functions that could be made simpler after looking at them again, though.
> ~~Will update shortly~~ Updated, let me know if it's better 😁

Yup, this looks easier to understand.

crphang

Some minor comments. Looks okay in general.

crphang · 2020-02-27T04:16:57Z

src/lib/markbind/src/patches/htmlparser2.js

+ * Changes the Tokenizer state to BEFORE_SPECIAL_END if the token matches one of
+ * the first character of the currently matched special tag.
+ */
+Tokenizer.prototype._stateBeforeCloseingTagName = function(c) {


Minor: closingTagName

It's a patched method from htmlparser2

crphang · 2020-02-27T04:17:59Z

src/lib/markbind/src/patches/htmlparser2.js

+ * the first character of the currently matched special tag.
+ */
+Tokenizer.prototype._stateBeforeCloseingTagName = function(c) {
+	if (whitespace(c));


What do we do if it is whitespace?

This line is from htmlparser2 as well; it's there to allow something like </ tagname> to work.
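htmlparser2 handles this inside its state machine, one character per call: while in the "before closing tag name" state, a whitespace character simply leaves the state unchanged. A compressed, hypothetical helper that applies the same rule to a whole string might look like this; `matchesClosingTag` is not part of htmlparser2 or the patch, and requiring `>` or whitespace after the name is an assumption of the sketch.

```js
// Same five characters as htmlparser2's whitespace() helper.
function isWhitespace(c) {
  return c === ' ' || c === '\n' || c === '\t' || c === '\f' || c === '\r';
}

// Hypothetical helper: does `source` contain a closing tag for `tagName`
// starting at index `start` (which should point at '<')?
function matchesClosingTag(source, start, tagName) {
  if (source[start] !== '<' || source[start + 1] !== '/') { return false; }
  let i = start + 2;
  // Mirrors `if (whitespace(c));`: whitespace before the name is skipped.
  while (i < source.length && isWhitespace(source[i])) { i += 1; }
  const name = source.slice(i, i + tagName.length);
  if (name.toLowerCase() !== tagName.toLowerCase()) { return false; }
  // The name must end here, so '</pumlx>' does not close a <puml> tag.
  const next = source[i + tagName.length];
  return next === '>' || isWhitespace(next);
}

console.log(matchesClosingTag('A -> B </ puml>', 7, 'puml')); // true
```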

crphang · 2020-02-27T04:37:38Z

src/lib/markbind/src/patches/htmlparser2.js

+ */
+Tokenizer.prototype.processSpecialFunctions = function(c) {
+	const matchingSpecialTags = [];
+	const numMatchingTags = this._matchingSpecialTags.length;


Minor: Could _matchingSpecialTags be renamed to _matchingSpecialTagsIndex? It makes the operation below a little less confusing.

crphang · 2020-02-27T04:40:52Z

src/lib/markbind/src/patches/htmlparser2.js

+	 */
+
+	// processSpecialFunctions, processSpecialClosingTagCharacter
+	HAS_MATCHING              = -1,


Do you think it is possible to follow the previous enumerations for these?

I needed negative numbers here, as the non-negative values are 'taken' by the tag indexes.
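The reason for negative values can be shown with a small sketch: when one number has to carry either a tag index or a state, indexes occupy 0 and up (and `script` might well sit at index 0), so the states have to live below zero. Only `HAS_MATCHING = -1` is taken from the patch; `NO_MATCH = -2`, `matchResult` and `describeResult` are hypothetical, and for brevity the sketch compares whole prefixes rather than single characters.

```js
// HAS_MATCHING = -1 is taken from the patch; NO_MATCH = -2 is a hypothetical
// name and value used only in this sketch.
const HAS_MATCHING = -1;
const NO_MATCH = -2;

// Returns the index (>= 0) of the tag name equal to `typed`, HAS_MATCHING if
// `typed` is still a prefix of some tag name, or NO_MATCH otherwise.
function matchResult(specialTagNames, typed) {
  const exactIndex = specialTagNames.indexOf(typed);
  if (exactIndex !== -1) { return exactIndex; }
  return specialTagNames.some(name => name.startsWith(typed)) ? HAS_MATCHING : NO_MATCH;
}

// A single comparison tells the two kinds of result apart; a truthiness
// check would wrongly treat a match at index 0 as "no match".
function describeResult(result, specialTagNames) {
  if (result >= 0) { return `matched <${specialTagNames[result]}>`; }
  return result === HAS_MATCHING ? 'still matching' : 'no match';
}

const tagNames = ['script', 'pre', 'style', 'puml'];
console.log(describeResult(matchResult(tagNames, 'script'), tagNames)); // matched <script>
console.log(describeResult(matchResult(tagNames, 'pu'), tagNames));     // still matching
console.log(describeResult(matchResult(tagNames, 'div'), tagNames));    // no match
```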

crphang

LGTM

ang-zeyu · 2020-02-29T06:33:23Z

Added 'injection' to js-beautify as well (lines 762-764 of Site.js), so the content of special tags appears as-is in the browser with no formatting changes (indentation in particular).
js-beautify already has a 'blacklist' option for this case, so the changes are only a few lines long.
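The exact js-beautify option used in Site.js is not shown here. Assuming it is the HTML beautifier's `content_unformatted` list (tags whose inner content is left untouched, believed to default to `pre` and `textarea`), the injection could be sketched as follows; `buildBeautifyOptions` is a hypothetical helper.

```js
// Assumed js-beautify default for content_unformatted.
const DEFAULT_CONTENT_UNFORMATTED = ['pre', 'textarea'];

// Hypothetical helper: adds the collected special tags to the
// content_unformatted list so their inner text is emitted unchanged.
function buildBeautifyOptions(specialTags, baseOptions = {}) {
  const existing = baseOptions.content_unformatted || DEFAULT_CONTENT_UNFORMATTED;
  return {
    ...baseOptions,
    content_unformatted: [...new Set([...existing, ...specialTags])],
  };
}

const options = buildBeautifyOptions(['puml'], { indent_size: 2 });
console.log(options.content_unformatted); // [ 'pre', 'textarea', 'puml' ]
// With js-beautify installed, the options would then be passed as:
//   require('js-beautify').html(renderedHtml, options);
```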

crphang · 2020-03-01T13:04:40Z

@marvinchin could you help with reviewing this? 🙏. Have a few PRs I'm hoping to rebase on this. 🙇

marvinchin

Wow this is some pretty complex stuff! The logic mostly looks good to me, but there are some parts that I was a little confused about, so I've left a couple of questions behind. I'm not too familiar with this part of MarkBind, so it'd be great if you could help fill in the gaps in my understanding so I can give this a more thorough review.

I think @acjh might also have more context on this, especially with regards to the htmlparser2 patch, so getting his opinion on this would be very valuable.

marvinchin · 2020-03-01T15:30:09Z

docs/userGuide/usingPlugins.md

+You can implement the `getSpecialTags` method to blacklist the content in these special tags from parsing,
+removing such potential conflicts.
+
+- `getSpecialTags(pluginContext)`: Called to get link elements to be added to the head of the page.


> Called to get link elements to be added to the head of the page.

Is this description correct? :P

marvinchin · 2020-03-01T15:33:41Z

src/Site.js

+    const tagsToIgnore = new Set();
+
+    Object.values(this.plugins).forEach((plugin) => {
+      if (!plugin.getSpecialTags) return;


I think we prefer to wrap the return in curly braces even though it's only a single line:

if (!plugin.getSpecialTags) { return; }

src/lib/markbind/src/lib/markdown-it-shared/markdown-it-escape-special-tags.js

marvinchin · 2020-03-01T16:04:31Z

src/lib/markbind/src/patches/htmlparser2.js

+	HAS_MATCHED               = -3;
+
+function whitespace(c) {
+	return c === " " || c === "\n" || c === "\t" || c === "\f" || c === "\r";


Should we use a regex whitespace check instead? I'm just a little concerned about doing an explicit check because there are so many different ways to get whitespace :P

It's a method from htmlparser2
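One consideration for the regex suggestion: the explicit check matches exactly the five characters the HTML standard calls ASCII whitespace (tab, LF, FF, CR and space), while JavaScript's `\s` also matches characters such as the vertical tab, the no-break space and other Unicode spaces. The comparison below, with `regexWhitespace` as a hypothetical stand-in for the suggested change, shows where the two disagree, so switching would change how input like `</` + no-break space + `tagname>` is tokenized.

```js
// htmlparser2's helper, as quoted in the review: exactly five characters.
function whitespace(c) {
  return c === ' ' || c === '\n' || c === '\t' || c === '\f' || c === '\r';
}

// Hypothetical regex-based alternative using JavaScript's \s class.
const regexWhitespace = c => /\s/.test(c);

// Space and tab agree; vertical tab, no-break space and em space do not.
const samples = [' ', '\t', '\v', '\u00a0', '\u2003'];
samples.forEach((c) => {
  console.log(JSON.stringify(c), whitespace(c), regexWhitespace(c));
});
```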

marvinchin · 2020-03-01T16:09:13Z

src/lib/markbind/src/patches/htmlparser2.js

+
+	if (this._matchingSpecialTagIndexes.length > 0) {
+		this._nextSpecialTagMatchIndex = 1;
+		return true


Missing ; here

marvinchin · 2020-03-01T16:32:57Z

src/lib/markbind/src/patches/htmlparser2.js

+	}
+
+	if (this._matchingSpecialTagIndexes.length > 0) {
+		this._nextSpecialTagMatchIndex = 1;


Just wondering - why is this set to 1? What exactly does the nextSpecialTagMatchIndex refer to?

It's the index of the next character to match in the special tag names.
Here it is 1, since the 0th character has already been matched.

ang-zeyu · 2020-03-01T19:55:38Z

src/lib/markbind/src/lib/markdown-it-shared/markdown-it-escape-special-tags.js

+
+/*
+ Custom patch for the api to escape content in certain special tags
+ Adapted from the default markdown-it html_block rule and replaces it.


Apologies, I should have highlighted this line more.
The specific file is \node_modules\markdown-it\lib\rules_block\html_block

75% of this file (including all of the comments) is essentially the same as the original.
I left all the lines from the original file with the exact formatting so it's easier to see what's patched.
The changes here essentially just use the collected special tags to form a new regex that replaces the first one in the original rule.

The htmlparser2 patch has a lot of changes though, 'thankfully' 😅

marvinchin

Thanks for explaining it - this looks okay to the best of my understanding :P

ang-zeyu · 2020-03-02T16:51:58Z

> Thanks for explaining it - this looks okay to the best of my understanding :P

Thanks for looking through this as well!

I've made just a few more method name changes in the htmlparser2 patch:
isFirstCharacterSpecialTagCharacter -> _matchSpecialTagsFirstCharacters
processSpecialFunctions -> _matchSpecialTagsNextCharacters
processSpecialTagClosingCharacter -> _matchNextSpecialTagClosingCharacter

Will update again if there are any more changes besides rebasing on master.

The parsers MarkBind uses parse content in html tags as html or markdown, respectively. This makes it difficult to add custom components that utilize conflicting syntax. Let’s add an interface to ignore content in certain special tags. This interface is also exposed to plugins as the getSpecialTags option.

…nvert-to-code-block * 'master' of https://github.com/MarkBind/markbind: Allow changing parameter properties (MarkBind#1075) Custom timezone for built-in timestamp (MarkBind#1073) Fix reload inconsistency when updating frontmatter (MarkBind#1068) Implement an api to ignore content in certain tags (MarkBind#1047) Enable AppVeyor CI (MarkBind#1040) Add heading and line highlighting to code blocks (MarkBind#1034) Add dividers and fix bug in siteNav (MarkBind#1063) Fixed navbar no longer covers modals (MarkBind#1070) Add copy code-block plugin (MarkBind#1043) Render plugins on dynamic resources (MarkBind#1051) Documentation for Implement no-* attributes for <box> (MarkBind#1042) Migrate to bootstrap-vue popovers (MarkBind#1033) Refactor preprocess and url processing functions (MarkBind#1026) Add pageNav to Using Plugins Page (MarkBind#1062) # Conflicts: # docs/userGuide/syntax/siteNavigationMenus.mbdf

* 'master' of https://github.com/MarkBind/markbind: 2.12.0 Update outdated test files Update vue-strap version to v2.0.1-markbind.37 Fix refactor to processDynamicResources (MarkBind#1092) Implement lazy page building for markbind serve (MarkBind#1038) Add warnings for conflicting/deprecated component attribs (MarkBind#1057) Allow changing parameter properties (MarkBind#1075) Custom timezone for built-in timestamp (MarkBind#1073) Fix reload inconsistency when updating frontmatter (MarkBind#1068) Implement an api to ignore content in certain tags (MarkBind#1047) Enable AppVeyor CI (MarkBind#1040) Add heading and line highlighting to code blocks (MarkBind#1034) Add dividers and fix bug in siteNav (MarkBind#1063) Fixed navbar no longer covers modals (MarkBind#1070) Add copy code-block plugin (MarkBind#1043) Render plugins on dynamic resources (MarkBind#1051) Documentation for Implement no-* attributes for <box> (MarkBind#1042) Migrate to bootstrap-vue popovers (MarkBind#1033) Refactor preprocess and url processing functions (MarkBind#1026) Add pageNav to Using Plugins Page (MarkBind#1062)

ang-zeyu mentioned this pull request Feb 17, 2020

Strong support for diagrams #984

Open

ang-zeyu force-pushed the special-tags-api branch from 8679713 to a043e1c February 21, 2020 12:02

ang-zeyu changed the title ~~[WIP] Implement an api to ignore text in certain tags~~ Implement an api to ignore text in certain tags Feb 21, 2020

ang-zeyu force-pushed the special-tags-api branch from a043e1c to d1e0a66 February 22, 2020 07:49

crphang mentioned this pull request Feb 22, 2020

Escape Nunjucks for special tags #1049

Merged

ang-zeyu force-pushed the special-tags-api branch 2 times, most recently from 0705280 to d124b7f February 24, 2020 12:53

openorclose reviewed Feb 24, 2020

ang-zeyu force-pushed the special-tags-api branch from d124b7f to d4aa2cf February 25, 2020 07:15

ang-zeyu force-pushed the special-tags-api branch from d4aa2cf to 0839f2f February 26, 2020 10:24

crphang reviewed Feb 27, 2020

ang-zeyu force-pushed the special-tags-api branch from 0839f2f to 1ad18aa February 27, 2020 08:28

crphang approved these changes Feb 27, 2020

ang-zeyu force-pushed the special-tags-api branch from 1ad18aa to 0532e54 February 29, 2020 06:28

marvinchin reviewed Mar 1, 2020

ang-zeyu force-pushed the special-tags-api branch from 0532e54 to 9db8d9a March 1, 2020 19:47

ang-zeyu commented Mar 1, 2020

ang-zeyu force-pushed the special-tags-api branch 2 times, most recently from 7acd5d5 to d74178f March 1, 2020 20:03

marvinchin approved these changes Mar 2, 2020

marvinchin added this to the v2.11.1 milestone Mar 2, 2020

ang-zeyu force-pushed the special-tags-api branch from d74178f to 9dc8557 March 2, 2020 16:51

crphang mentioned this pull request Mar 2, 2020

WIP: Adding mermaid integration for support of more diagrams #1079

Closed

ang-zeyu force-pushed the special-tags-api branch from 9dc8557 to b3091ce March 3, 2020 16:20

marvinchin requested a review from yamgent March 4, 2020 04:26

ang-zeyu force-pushed the special-tags-api branch from b3091ce to c0fed92 March 4, 2020 13:19

yamgent approved these changes Mar 7, 2020

yamgent added the pr.NewFeature 🆕 Enable users (authors/readers) to do something new label Mar 7, 2020

yamgent merged commit c2b68e8 into MarkBind:master Mar 7, 2020

ang-zeyu mentioned this pull request Mar 7, 2020

Unify markdown-it parser variants #1056

Merged

ang-zeyu mentioned this pull request Apr 11, 2020

Allow variable tags to contain malformed xml #1193

Merged

crphang mentioned this pull request Apr 18, 2020

Allow authors to use render syntax with curly braces {{ content }} #1206

Closed
