Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Parser: Add tokens for inner blocks in inner HTML #11309

Closed
wants to merge 1 commit into from

Conversation

dmsnell
Copy link
Member

@dmsnell dmsnell commented Oct 31, 2018

DO NOT MERGE

Motivated by #8760
Necessary for #10463, #10108

Abstract

Add tokens in innerHTML of parsed post_content indicating where each of a block's innerBlocks came from. e.g. innerHTML: "Before <!-- {0} --> Middle <!-- {1} --> After"

Status

  • the code editor for posts is showing the tokens currently which means that we have to update the full parsing stack to make this PR ready to merge

Summary

Inner blocks, or nested blocks, or blocks-within-blocks, can exist in Gutenberg posts. They are serialized in post_content in-place as normal blocks which exist in between another block's comment delimiters.

<!-- wp:outerBlock -->
Check out my
<!-- wp:voidInnerBlock /-->
and my other
<!-- wp:innerBlock -->
with its own content.
<!-- /wp:innerBlock -->
<!-- /wp:outerBlock -->

The way this gets parsed leaves us in a quandary: we cannot reconstruct the original post_content after parsing because we lose the origin location information for each inner block since they are only passed as an array of inner blocks.

{
	"blockName": "core/outerBlock",
	"attrs": {},
	"innerBlocks": [
		{
			"blockName": "core/voidInnerBlock",
			"attrs": {},
			"innerBlocks": [],
			"innerHTML": ""
		},
		{
			"blockName": "core/innerBlock",
			"attrs": {},
			"innerBlocks": [],
			"innerHTML": "\nwith its own content.\n"
		}
	],
	"innerHTML": "\nCheck out my\n\nand my other\n\n"
}

At this point we have parsed the blocks and prepared them for attaching into the JavaScript block code that interprets them but we have lost our reverse transformation.

In this PR I'd like to introduce a new mechanism which should only marginally break existing functionality but which will enable us to go back and forth isomorphically between the post_content and first stage of parsing. If we can tear apart a Gutenberg post and reassemble then it will let us to structurally-informed processing of the posts without needing to be aware of all the block JavaScript.

The proposed mechanism is a form of tombstone or token inserted into innerHTML where the innerBlocks were found.

{
	"blockName": "core/outerBlock",
	"attrs": {},
	"innerBlocks": [
		{
			"blockName": "core/voidInnerBlock",
			"attrs": {},
			"innerBlocks": [],
			"blockMarkers": [],
			"innerHTML": ""
		},
		{
			"blockName": "core/innerBlock",
			"attrs": {},
			"innerBlocks": [],
			"blockMarkers": [],
			"innerHTML": "\nwith its own content.\n"
		}
	],
	"innerHTML": "\nCheck out my\n<!-- {0} -->\nand my other\n<!-- {1} -->\n"
}

The token is a normal HTML comment of trivial structure which (if accidentally not removed before rendering) should not display in the rendered page, which should be trivial to substitute with a RegEx-based replacement, and which should remain straightforward for plugin authors and translators to deal with.

These tokens won't exist in post_content and they will be stripped before rendering the page. They exist only to restore isomorphism to the parser/printer pair.

@dmsnell dmsnell added the [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f label Oct 31, 2018
@dmsnell dmsnell requested review from mcsf, pento and aduth October 31, 2018 14:49
@dmsnell dmsnell changed the title Parser: Add tokens for inner blocks in inner HTML WIP: Parser: Add tokens for inner blocks in inner HTML Oct 31, 2018
@georgestephanis
Copy link
Contributor

Incidentally, if accidental collision with content is a concern, we could use a little used emoji internally to designate our own stuff -- even potentially a utf-8 code point that we can identify isn't in the content being generated, that we then swap back after it's rendered.

https://twitter.com/leastUsedEmoji/status/1057612046261514240

🔠

@dmsnell
Copy link
Member Author

dmsnell commented Nov 2, 2018

Closing in favor of #11334

@dmsnell dmsnell closed this Nov 2, 2018
@dmsnell dmsnell deleted the parser/add-inner-block-tokens branch November 2, 2018 17:05
dmsnell added a commit that referenced this pull request Nov 7, 2018
Attempt three at including positional information from the parse to enable isomorphic reconstruction of the source `post_content` after parsing.

See alternate attempts: #11082, #11309
Motivated by: #7247, #8760, Automattic/jetpack#10256
Enables: #10463, #10108

## Abstract

Add new `innerContent` property to each block in parser output indicating where in the innerHTML each innerBlock was found.

## Status

 - will update fixtures after design review indicates this is the desired approach
 - all parsers passing new tests for fragment behavior

## Summary

Inner blocks, or nested blocks, or blocks-within-blocks, can exist in Gutenberg posts. They are serialized in `post_content` in place as normal blocks which exist in between another block's comment delimiters.

```html
<!-- wp:outerBlock -->
Check out my
<!-- wp:voidInnerBlock /-->
and my other
<!-- wp:innerBlock -->
with its own content.
<!-- /wp:innerBlock -->
<!-- /wp:outerBlock -->
```

The way this gets parsed leaves us in a quandary: we cannot reconstruct the original `post_content` after parsing because we lose the origin location information for each inner block since they are only passed as an array of inner blocks.

```json
{
	"blockName": "core/outerBlock",
	"attrs": {},
	"innerBlocks": [
		{
			"blockName": "core/voidInnerBlock",
			"attrs": {},
			"innerBlocks": [],
			"innerHTML": ""
		},
		{
			"blockName": "core/innerBlock",
			"attrs": {},
			"innerBlocks": [],
			"innerHTML": "\nwith its own content.\n"
		}
	],
	"innerHTML": "\nCheck out my\n\nand my other\n\n"
}
```

At this point we have parsed the blocks and prepared them for attaching into the JavaScript block code that interprets them but we have lost our reverse transformation.

In this PR I'd like to introduce a new mechanism which shouldn't break existing functionality but which will enable us to go back and forth isomorphically between the `post_content` and first stage of parsing. If we can tear apart a Gutenberg post and reassemble then it will let us to structurally-informed processing of the posts without needing to be aware of all the block JavaScript.

The proposed mechanism is a new property as a **list of HTML fragments with `null` values interspersed between those fragments where the blocks were found**.

```json
{
	"blockName": "core/outerBlock",
	"attrs": {},
	"innerBlocks": [
		{
			"blockName": "core/voidInnerBlock",
			"attrs": {},
			"innerBlocks": [],
			"blockMarkers": [],
			"innerHTML": ""
		},
		{
			"blockName": "core/innerBlock",
			"attrs": {},
			"innerBlocks": [],
			"blockMarkers": [],
			"innerHTML": "\nwith its own content.\n"
		}
	],
	"innerHTML": "\nCheck out my\n\nand my other\n\n",
	"innerContent": [ "\nCheck out my\n", null, "\n and my other\n", null, "\n" ],
}
```

Doing this allows us to replace those `null` values with their associated block (sequentially) from `innerBlocks`.

## Questions

 - Why not use a string token instead of an array?
    - See #11309. The fundamental problem with the token is that it could be valid content input from a person and so there's a probability that we would fail to split the content accurately.

 - Why add the `null` instead of leaving basic array splits like `[ 'before', 'after' ]`?
    - By inspection we can see that without an explicit marker we don't know if the block came before or after or between array elements. We could add empty strings `''` and say that blocks exist only _between_ array elements but the parser code would have to be more complicated to make sure we appropriately add those empty strings. The empty strings are a bit odd anyway.

 - Why add a new property?
    - Code already depends on `innerHTML` and `innerBlocks`; I don't want to break any existing behaviors and adding is less risky than changing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants