Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement an api to ignore text in certain tags #1047

Merged
merged 1 commit into from
Mar 7, 2020

Conversation

ang-zeyu
Copy link
Contributor

@ang-zeyu ang-zeyu commented Feb 17, 2020

What is the purpose of this pull request? (put "X" next to an item, remove the rest)

• [x] New feature

Groundwork for #984

What is the rationale for this request?
To provide a convenient interface, for plugins / core markbind code to blacklist certain tags to ignore in parsers.

This should open the doors to custom components that can have all sorts of strange syntaxes inside them, without using markdown code blocks. e.g. puml #968

What changes did you make? (Give an overview)

  • Patch an existing rule in markdown-it, exposing an interface that allows injecting tags to ignore before registering the patch. The patch allows these additional tags to be treated the same way <script/pre/style> tags are treated in markdown-it currently.
  • Patch many of htmlparser2 state handlers and added some state variables to the tokenizer, exposing a similar interface.

Provide some example code that this change will affect:

function injectIgnoreTags(tagsToIgnore) {
	Tokenizer.prototype.specialTagNames = Tokenizer.prototype.specialTagNames.concat(...tagsToIgnore)
}

Is there anything you'd like reviewers to focus on?
#984 (comment) - degree of commonmark compliance we should follow

Testing instructions:

  • npm run test should work
  • rebuilding the 2103 site should show no diffs, save for timestamps and log output

Proposed commit message: (wrap lines at 72 characters)
Implement an api to ignore content in special tags

The parsers MarkBind uses parses content in html tags as html
or markdown respectively.
This makes it difficult to add custom components that utilize
conflicting syntax.

Let’s add an interface to ignore content in certain special tags.
This interface is also exposed to plugins as the getSpecialTags option.

@ang-zeyu
Copy link
Contributor Author

@crphang
The interface is exposed through the two inject methods, called under collectPluginSpecialTags once during Site generation.
Plugins implement the getSpecialTags: () => [ 'tagnames' , '...', ...] method to provide these tag names.
Let me know if this works for what you have in mind, thanks!

@crphang
Copy link
Contributor

crphang commented Feb 17, 2020

Let me branch off this and test for inline puml

@ang-zeyu
Copy link
Contributor Author

ang-zeyu commented Feb 17, 2020

Let me branch off this and test for inline puml.

👍 , the content in the inline puml should be exposed as a text node to cheerio

I wouldn't mind changing the getSpecialTags implementation if needed for your work on widgets as well, but the two inject methods ( markdown-it and htmlparser2 ) should essentially be widget-plugin independent. Going forward I'll maintain the format of these two methods while fixing more bugs / tweaking the implementation if needed.

@ang-zeyu
Copy link
Contributor Author

Let me branch off this and test for inline puml

Ready for review,
Not much changed from the first commit apart from adding a functional test and cleaning some things up.

@ang-zeyu ang-zeyu changed the title [WIP] Implement an api to ignore text in certain tags Implement an api to ignore text in certain tags Feb 21, 2020
@crphang
Copy link
Contributor

crphang commented Feb 23, 2020

Sorry for the delay in reviewing, will take a deeper look into the PR shortly.

For now, I've branched off this PR https://github.com/crphang/markbind/tree/inline-puml-plugin and tested inline puml, works really well 👍

@ang-zeyu
Copy link
Contributor Author

Sorry for the delay in reviewing, will take a deeper look into the PR shortly.

For now, I've branched off this PR https://github.com/crphang/markbind/tree/inline-puml-plugin and tested inline puml, works really well 👍

No worries; Look forward to the finished version 😄

@ang-zeyu ang-zeyu force-pushed the special-tags-api branch 2 times, most recently from 0705280 to d124b7f Compare February 24, 2020 12:53
Copy link
Contributor

@openorclose openorclose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Not a complete review, but can i suggest a change to the specialcharacter parsing logic?

https://github.com/openorclose/markbind/blob/special-tags-api/src/lib/markbind/src/patches/htmlparser2.js

diff --git a/src/lib/markbind/src/patches/htmlparser2.js b/src/lib/markbind/src/patches/htmlparser2.js
index 66e308f..8229ee8 100644
--- a/src/lib/markbind/src/patches/htmlparser2.js
+++ b/src/lib/markbind/src/patches/htmlparser2.js
@@ -150,17 +150,8 @@ Tokenizer.prototype.specialTagNames = [
  * and initialises the _matchingSpecialTags array with objects containing information about the matches.
  */
 Tokenizer.prototype.isFirstCharacterSpecialTagCharacter = function(c) {
-	this._matchingSpecialTags = this.specialTagNames
-		.map((str, index) => ({
-			index,
-			nextTestIndex: 0,
-		}))
-		.filter(indexObj => c.toLowerCase() === this.specialTagNames[indexObj.index][indexObj.nextTestIndex])
-		.map(indexObj => ({
-			index: indexObj.index,
-			nextTestIndex: indexObj.nextTestIndex + 1,
-			hasFinishedMatching: this.specialTagNames[indexObj.index][indexObj.nextTestIndex + 1] === undefined,
-		}));
+	this._potentialSpecialTag = c;
+	this._matchingSpecialTags = this.specialTagNames.filter(tag => tag[0] === c);
 
 	return this._matchingSpecialTags.length > 0;
 };
@@ -171,26 +162,18 @@ Tokenizer.prototype.isFirstCharacterSpecialTagCharacter = function(c) {
  * If one of the previous matches has finished matching, the corresponding matchIndex is set and returned.
  */
 Tokenizer.prototype.processSpecialFunctions = function(c) {
-	let matchIndex;
-
+	let matchedTag;
+	this._potentialSpecialTag += c;
 	this._matchingSpecialTags = this._matchingSpecialTags
-		.filter((indexObj) => {
-			if (indexObj.hasFinishedMatching) {
-				matchIndex = indexObj.index;
-
-				return c === "/" || c === ">" || whitespace(c);
+		.filter((tag) => {
+			if (this._potentialSpecialTag === tag) {
+				matchedTag = tag;
+				return true;
 			}
-
-			return c.toLowerCase() === this.specialTagNames[indexObj.index][indexObj.nextTestIndex];
-		})
-		.map(indexObj => ({
-				index: indexObj.index,
-				nextTestIndex: indexObj.nextTestIndex + 1,
-				hasFinishedMatching: this.specialTagNames[indexObj.index][indexObj.nextTestIndex + 1] === undefined,
-			}));
-
+			return tag.startsWith(this._potentialSpecialTag);
+		});
 	return {
-		matchIndex,
+		matchIndex: matchedTag && this.specialTagNames.indexOf(matchedTag),
 		hasMatching: this._matchingSpecialTags.length > 0,
 	};
 };

@ang-zeyu
Copy link
Contributor Author

ang-zeyu commented Feb 25, 2020

Not a complete review, but can i suggest a change to the specialcharacter parsing logic?

Hmm, I went with the same char-by-char approach as htmlparser2 mainly for performance, which is essential for something used throughout markbind.

I definitely see some areas in the two functions which can be made simpler after looking at it again though.
Will update shortly Updated, let me know if its better 😁

@openorclose
Copy link
Contributor

Not a complete review, but can i suggest a change to the specialcharacter parsing logic?

Hmm, I went with the same char-by-char approach as htmlparser2 mainly for performance, which is essential for something used throughout markbind.

I definitely see some areas in the two functions which can be made simpler after looking at it again though.
Will update shortly Updated, let me know if its better grin

Yup this looks easier to understand

Copy link
Contributor

@crphang crphang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments. Looks okay in general

* Changes the Tokenizer state to BEFORE_SPECIAL_END if the token matches one of
* the first character of the currently matched special tag.
*/
Tokenizer.prototype._stateBeforeCloseingTagName = function(c) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: closingTagName

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a patched method from htmlparser2

* the first character of the currently matched special tag.
*/
Tokenizer.prototype._stateBeforeCloseingTagName = function(c) {
if (whitespace(c));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we do if it is whitespace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is from html parser2 as well; Its to allow something like </ tagname> to work

*/
Tokenizer.prototype.processSpecialFunctions = function(c) {
const matchingSpecialTags = [];
const numMatchingTags = this._matchingSpecialTags.length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Could _matchingSpecialTags be renamed to _matchingSpecialTagsIndex? It makes the operation below a little less confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

*/

// processSpecialFunctions, processSpecialClosingTagCharacter
HAS_MATCHING = -1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it is possible to follow the previous enumerations for these

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed negative numbers here as the positive ones are 'taken' by the indexes

Copy link
Contributor

@crphang crphang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ang-zeyu
Copy link
Contributor Author

ang-zeyu commented Feb 29, 2020

Added 'injection' to js-beautify as well ( line 762-764 of site.js ), so the input to special tags appears as is in the browser with no formatting changes ( indentations in particular );
Js-beautify already has a 'blacklist' option in this case, so the changes are only a few lines long

@crphang
Copy link
Contributor

crphang commented Mar 1, 2020

@marvinchin could you help with reviewing this? 🙏. Have a few PRs I'm hoping to rebase on this. 🙇

Copy link
Contributor

@marvinchin marvinchin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow this is some pretty complex stuff! The logic mostly looks good to me, but there are some parts that I was a little confused about, so I've left a couple of questions behind. I'm not too familiar with this part of MarkBind, so it'd be great if you could help fill in the gaps in my understanding so I can give this a more through review.

I think @acjh might also have more context on this, especially with regards to the htmlparser2 patch, so getting his opinion on this would be very valuable.

You can implement the `getSpecialTags` method to blacklist the content in these special tags from parsing,
removing such potential conflicts.

- `getSpecialTags(pluginContext)`: Called to get link elements to be added to the head of the page.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Called to get link elements to be added to the head of the page.

Is this description correct? :P

src/Site.js Outdated
const tagsToIgnore = new Set();

Object.values(this.plugins).forEach((plugin) => {
if (!plugin.getSpecialTags) return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we prefer to wrap the return in curly braces even though it's only a single line:

if (!plugin.getSpecialTags) {
  return; 
}

HAS_MATCHED = -3;

function whitespace(c) {
return c === " " || c === "\n" || c === "\t" || c === "\f" || c === "\r";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use a regex whitespace check instead? I'm just a little concerned about doing an explicit check because there are so many different ways to get whitespace :P

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a method from htmlparser2


if (this._matchingSpecialTagIndexes.length > 0) {
this._nextSpecialTagMatchIndex = 1;
return true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing ; here

}

if (this._matchingSpecialTagIndexes.length > 0) {
this._nextSpecialTagMatchIndex = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering - why is this set to 1? What exactly does the nextSpecialTagMatchIndex refer to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the next character index to match for the special tags
Here it is 1 since it matched the 0th character


/*
Custom patch for the api to escape content in certain special tags
Adapted from the default markdown-it html_block rule and replaces it.
Copy link
Contributor Author

@ang-zeyu ang-zeyu Mar 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, should have really highlighted this line more.
The specific file is \node_modules\markdown-it\lib\rules_block\html_block

75% of this file ( all of the comments ) is essentially the same as it.
I left all the lines from the original file with the exact formatting so its easier to see what’s patched.
The changes here are essentially just using the collected special tags to form a new regex to replace the first one in the original rule.

The htmlparser2 patch has a lot of changes though, 'thankfully' 😅

@ang-zeyu ang-zeyu force-pushed the special-tags-api branch 2 times, most recently from 7acd5d5 to d74178f Compare March 1, 2020 20:03
Copy link
Contributor

@marvinchin marvinchin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining it - this looks okay to the best of my understanding :P

@marvinchin marvinchin added this to the v2.11.1 milestone Mar 2, 2020
@ang-zeyu
Copy link
Contributor Author

ang-zeyu commented Mar 2, 2020

Thanks for explaining it - this looks okay to the best of my understanding :P

Thanks for looking through this as well!

I've made just a few more method name changes in the htmlparser2 patch:
isFirstCharacterSpecialTagCharacter -> _matchSpecialTagsFirstCharacters
processSpecialFunctions -> _matchSpecialTagsNextCharacters
processSpecialTagClosingCharacter -> _matchNextSpecialTagClosingCharacter


Will update back if there are more changes besides rebasing on master

The parsers MarkBind uses parses content in html tags as html
or markdown respectively.
This makes it difficult to add custom components that utilize
conflicting syntax.

Let’s add an interface to ignore content in certain special tags.
This interface is also exposed to plugins as the getSpecialTags option.
@yamgent yamgent added the pr.NewFeature 🆕 Enable users (authors/readers) to do something new label Mar 7, 2020
@yamgent yamgent merged commit c2b68e8 into MarkBind:master Mar 7, 2020
Tejas2805 added a commit to Tejas2805/markbind that referenced this pull request Mar 7, 2020
…nvert-to-code-block

* 'master' of https://github.com/MarkBind/markbind:
  Allow changing parameter properties (MarkBind#1075)
  Custom timezone for built-in timestamp (MarkBind#1073)
  Fix reload inconsistency when updating frontmatter (MarkBind#1068)
  Implement an api to ignore content in certain tags (MarkBind#1047)
  Enable AppVeyor CI (MarkBind#1040)
  Add heading and line highlighting to code blocks (MarkBind#1034)
  Add dividers and fix bug in siteNav (MarkBind#1063)
  Fixed navbar no longer covers modals (MarkBind#1070)
  Add copy code-block plugin (MarkBind#1043)
  Render plugins on dynamic resources (MarkBind#1051)
  Documentation for Implement no-* attributes for <box>  (MarkBind#1042)
  Migrate to bootstrap-vue popovers (MarkBind#1033)
  Refactor preprocess and url processing functions (MarkBind#1026)
  Add pageNav to Using Plugins Page (MarkBind#1062)

# Conflicts:
#	docs/userGuide/syntax/siteNavigationMenus.mbdf
Tejas2805 added a commit to Tejas2805/markbind that referenced this pull request Mar 9, 2020
* 'master' of https://github.com/MarkBind/markbind:
  2.12.0
  Update outdated test files
  Update vue-strap version to v2.0.1-markbind.37
  Fix refactor to processDynamicResources (MarkBind#1092)
  Implement lazy page building for markbind serve (MarkBind#1038)
  Add warnings for conflicting/deprecated component attribs (MarkBind#1057)
  Allow changing parameter properties (MarkBind#1075)
  Custom timezone for built-in timestamp (MarkBind#1073)
  Fix reload inconsistency when updating frontmatter (MarkBind#1068)
  Implement an api to ignore content in certain tags (MarkBind#1047)
  Enable AppVeyor CI (MarkBind#1040)
  Add heading and line highlighting to code blocks (MarkBind#1034)
  Add dividers and fix bug in siteNav (MarkBind#1063)
  Fixed navbar no longer covers modals (MarkBind#1070)
  Add copy code-block plugin (MarkBind#1043)
  Render plugins on dynamic resources (MarkBind#1051)
  Documentation for Implement no-* attributes for <box>  (MarkBind#1042)
  Migrate to bootstrap-vue popovers (MarkBind#1033)
  Refactor preprocess and url processing functions (MarkBind#1026)
  Add pageNav to Using Plugins Page (MarkBind#1062)
marvinchin pushed a commit that referenced this pull request Apr 10, 2020
The parsers MarkBind uses parses content in html tags as html
or markdown respectively.
This makes it difficult to add custom components that utilize
conflicting syntax.

Let’s add an interface to ignore content in certain special tags.
This interface is also exposed to plugins as the getSpecialTags option.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr.NewFeature 🆕 Enable users (authors/readers) to do something new
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants