-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
strip whitespaces in render_block_core_cover #40859
strip whitespaces in render_block_core_cover #40859
Conversation
Follows on from [WordPress#39658]( WordPress#39658) Remove whitespaces from html tags generated by `render_block_core_covermarkup` before injecting the featured image
👋 Thanks for your first Pull Request and for helping build the future of Gutenberg and WordPress, @navigatrum! In case you missed it, we'd love to have you join us in our Slack community, where we hold regularly weekly meetings open to anyone to coordinate with each other. If you want to learn more about WordPress development in general, check out the Core Handbook full of helpful information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking care of this bug! :) I assumed the markup in the block's save function is fixed. I'd make the regexp more specific as ideally in the future we'd want to not need it at all so not encouraging reliance on it would be good.
@dmsnell this is spot on about our conversation around hand editing block markup and the assumptions we can make around that. It didn't break the editor in this case but it did break the expected idea that the final output is decided on save.
Are you talking about the rendered page @draganescu? We shouldn't expect that the serialized HTML produced by Gutenberg and stored in We want to remember that the stored HTML is a serialized version of the document, but that Gutenberg posts themselves are not inherently HTML and the HTML they generate is both a stored serialization of those posts and a mechanism to provide more graceful fallback for preserving content when the supporting systems (like WordPress) are unavailable. |
Hi all, is this still something we are trying to get into 6.0, or can we push it to 6.0.1? Just let me know and I will update the project boards accordingly. |
Given that we are hours from 6.0 RC3, I am going to bump this to the 6.0.1 Project Board. |
At first when I saw this I thought there was a very specific reason we needed the whitespace to be gone, but after looking a bit deeper it looks like we really just need to ensure that we can inject the image into the template. Did you consider doing real parsing of the HTML in order to inject the image instead of relying on this considerably-more-frail text-parsing? I'm not sure what exactly your needs are, but I find Parsing HTMLfunction render_block_core_cover( $attributes, $content ) {
if ( 'image' !== $attributes['backgroundType'] || false === $attributes['useFeaturedImage'] ) {
return $content;
}
if ( ! ( $attributes['hasParallax'] || $attributes['isRepeated'] ) ) {
$attr = array(
'class' => 'wp-block-cover__image-background',
'data-object-fit' => 'cover',
);
if ( isset( $attributes['focalPoint'] ) ) {
$object_position = round( $attributes['focalPoint']['x'] * 100 ) . '%' . ' ' . round( $attributes['focalPoint']['y'] * 100 ) . '%';
$attr['data-object-position'] = $object_position;
$attr['style'] = 'object-position: ' . $object_position;
}
$image = get_the_post_thumbnail( null, 'post-thumbnail', $attr );
if ( ! $image ) {
return $content;
}
$dom = new DOMDocument();
$dom->loadHTML( $content );
$inner_container = null;
foreach ( $dom->getElementsByTagName('div') as $node ) {
if ( 1 === preg_match( '/(^| )wp-block-cover__inner-container( |$)/', $node->getAttribute('class') ) ) {
$inner_container = $node;
break;
}
}
if ( ! $inner_container ) {
return $content;
}
$image_node = $dom->createDocumentFragment();
$image_node->appendXML( $image );
$inner_container->insertBefore( $image_node );
$content = '';
foreach ( $dom->documentElement->firstChild->childNodes as $node ) {
$content .= $dom->saveHTML( $node ) . PHP_EOL;
}
} else {
if ( in_the_loop() ) {
update_post_thumbnail_cache();
}
$current_featured_image = get_the_post_thumbnail_url();
if ( ! $current_featured_image ) {
return $content;
}
$dom = new DOMDocument();
$dom->loadHTML( $content );
$container = null;
foreach ( $dom->getElementsByTagName( 'div' ) as $node ) {
if ( 1 === preg_match( '/(^| )wp-block-cover( |$)/', $node->getAttribute( 'class' ) ) ) {
$container = $node;
break;
}
}
if ( ! $container ) {
return $content;
}
$background_style = sprintf( 'background-image:url(%s)', $current_featured_image );
$container->setAttribute('style', $container->getAttribute('style') . ' ' . $background_style );
$content = '';
foreach ( $dom->documentElement->firstChild->childNodes as $node ) {
$content .= $dom->saveHTML( $node ) . PHP_EOL;
}
}
return $content;
} ^^^ that's a little convoluted, but a few helper functions can make it read much smoother Parsing HTML with helpersfunction render_block_core_cover( $attributes, $content ) {
if ( 'image' !== $attributes['backgroundType'] || false === $attributes['useFeaturedImage'] ) {
return $content;
}
function html2dom( $html ) {
$dom = new DOMDocument();
$dom->loadHTML( $html );
return $dom;
}
function dom2html( $dom ) {
$html = '';
foreach ( $dom->documentElement->firstChild->childNodes as $node ) {
$html .= $dom->saveHTML( $node ) . PHP_EOL;
}
return $html;
}
function html2node( $dom, $html ) {
$fragment = $dom->createDocumentFragment();
$fragment->appendXML( $html );
return $fragment;
}
function getTagWithClassname( $dom, $tag, $class ) {
foreach ( $dom->getElementsByTagName( $tag ) as $element ) {
if ( 1 === preg_match( '/(^| )' . preg_quote( $class, '/' ) . '( |$)/', $element->getAttribute('class') ) ) {
return $element;
}
}
return null;
}
if ( ! ( $attributes['hasParallax'] || $attributes['isRepeated'] ) ) {
$attr = array(
'class' => 'wp-block-cover__image-background',
'data-object-fit' => 'cover',
);
if ( isset( $attributes['focalPoint'] ) ) {
$object_position = round( $attributes['focalPoint']['x'] * 100 ) . '%' . ' ' . round( $attributes['focalPoint']['y'] * 100 ) . '%';
$attr['data-object-position'] = $object_position;
$attr['style'] = 'object-position: ' . $object_position;
}
$image = get_the_post_thumbnail( null, 'post-thumbnail', $attr );
if ( ! $image ) {
return $content;
}
$dom = html2dom( $content );
$inner_container = getTagWithClassname( $dom, 'div', 'wp-block-cover__inner-container' );
if ( ! $inner_container ) {
return $content;
}
$inner_container->insertBefore( html2node( $dom, $image ) );
return dom2html( $dom );
}
if ( in_the_loop() ) {
update_post_thumbnail_cache();
}
$current_featured_image = get_the_post_thumbnail_url();
if ( ! $current_featured_image ) {
return $content;
}
$dom = html2dom( $content );
$container = getTagWithClassname( $dom, 'div', 'wp-block-cover' );
if ( ! $container ) {
return $content;
}
$background_style = sprintf( 'background-image:url(%s)', $current_featured_image );
$container->setAttribute( 'style', $container->getAttribute('style') . ' ' . $background_style );
return dom2html( $dom );
} Looks like this is now outdated based on a refactor of the function as well; we need to update the patch to account for that. I've made my suggestion here based on the more recent version of the file in The most relevant bit is this code here as it's the part that disambiguates the replacement operation (at least in my head it does a better job indicating what should be happening) and its resilient to whitespace, escaping, and formatting issues. As a bonus, it gives us the classnames as complete strings we can find with a basic search when we need to find and update this code: $dom = html2dom( $content );
$inner_container = getTagWithClassname( $dom, 'div', 'wp-block-cover__inner-container' );
if ( ! $inner_container ) {
return $content;
}
$inner_container->insertBefore( html2node( $dom, $image ) );
return dom2html( $dom ); |
@dmsnell I had considered the idea of Dom parsing, but I discarted it because because:
|
@dmsnell I think @navigatrum only applied a simple correction for code that was already merged, so their approach is valid in the context of how this already works. While your suggestion to use DOMDocument to build block HTML on render is very good, it is something we as a project have not made use of so far (e.g. From what I see, in general block HTML is generated with string manipulation. While what we do in the cover block, for the featured image case, is "injecting" html into whatever the block is saved as, is a bit different than general rendering, using DOMDocument to build a DOM object would be a completely new way of achieving a similar result. The problem this particular PR tries to resolve is that manually editing block html in theme files may insert spurious whitespace. It is a general block problem that we happened to stumble upon here. cc @hellofromtonya - should we try to move away from HTML string manipulation and favour DOM building? Are the old considerations that @navigatrum mentioned still a thing? |
@draganescu No, I don't recommend building with PHP's DOM API. Why? PHP's DOM API:
I'd love to see a DOM API land in Core. But, as noted above, when it can't work/run, then string fallbacks are still needed. Until the above are resolved, the use of PHP's DOM API is too limited. |
:sigh: earlier today I spent a while writing a large response and now I'm not seeing the comment I'm sure I posted. I'm going to try and write most of it again, but I'm sorry I won't be able to do it as eloquently or completely as I did before. @hellofromtonya do we have any data on how many WordPress hosts have @youknowriad @gziolo have you encountered this before? I've been considering more broad use of @navigatrum @draganescu it would be nice to consider the other suggestions I had, specifically in sharing more context for what we're changing and searching for. I'm trying to put my feet into the shoes of someone tasked with figuring out why certain markup is changing and broken and particularly the lack of anything in there that I can put into a simple string search and find this code is a big obstacle to debugging it. alternatively we can make the RegExp pattern a little more verbose and get a lot more safety in the process. although this looks more complicated at the time of writing, I'm thinking it would be far more resilient in time and more helpful to the future given that it communicates from the code the specific thing it's looking for. Fun fact: in PHP we are free to use other pattern delimiters which means we don't have to escape $inner_container_start = '~<div [^>]*\bclass=("|')((?!\1).)*\bwp-block-cover__inner-container\b((?!\1).)*\1( |>)~';
if ( 1 !== preg_match( $inner_container_start, $content, $matches, PREG_OFFSET_CAPTURE ) {
// handle this, we can't find our div, maybe another plugin changed it
return $content;
}
list( $fullmatch, $offset ) = $matches[0];
$content = substr( $content, 0, $offset ) . $image_tag . substr( $content, $offset ); this too can be split into functions that give better names to what we're looking for if it helps make the searching and replacement clearer. function strpos_tag_with_class( $html, $class ) {
$CLASS = preg_quote( $class, "~" );
$pattern = "~<div [^>]*\bclass=(\"|')((?!\1).)*\b{$CLASS}\b((?!\1).)*\1( |>)~";
if ( 1 !== preg_match( $pattern, $html, $matches, PREG_OFFSET_CAPTURE ) ) {
return false;
}
list( $fullmatch, $offset ) = $matches[0];
return $offset;
}
…
$image_at = strpos_tag_with_class( $content, "wp-block-cover__inner-container" );
if ( false === $image_at ) {
return $content;
}
$before_image = substr( $content, 0, $image_at );
$after_image = substr( $content, $image_at );
return "{$before_image}{$image_tag}{$after_image}"; |
I think regex have been favored in general for performance reasons (but I may wrong), especially since all this logic renders in the frontend and we have a number of similar use-cases. I wouldn't mind DOMDocument usage if it behaves better in terms of performance. |
I'm not sure that info is collected. @dd32 might know.
@dmsnell besides build tooling and tests, can you point me to where Doing a search in the 6.0 released source (in build rather than develop), |
Thinking more about this, I think I need to provide more clarity of where Possible use cases:
*Possibly node selection: fallback would be needed if If there's zero risk of malformed HTML being passed into For these use cases, a Core DOM API might be handy to pull all the library checks, setups (such as turning on and off libxml errors, encoding, etc.), error handling, fallback detection, etc. into one place vs. distributing repeating/redundant code. |
@hellofromtonya Problem is when a block 'becomes' part of a theme template, HTML tends to become malformed. In my case I just copyed the code from the Gutenberg code Editor, and pasted it to my code editor. The auto formatter did the rest. |
@navigatrum just wanted to note that this tiny PR here takes so long not because it's wrong, but because it raises some possibly important conclusions around how we expect the system to be both resilient to hand coding markup and handling it's output consistently. The replacement of spaces solution is good, but should we do that everywhere? Are there other blocks where a simple space in the template markup breaks the system? Hence the duration. Wanted to mention that and to thank you for the patience 🥇 |
@navigatrum I think for this particular PR, we can just make it so that we don't introduce code that will stump us in the future via @dmsnell 's suggestions:
What do y'all think? |
I used |
@youknowriad this is the PHP @hellofromtonya upon further inspection I found that use in an expired |
Late to the party but may I add 2c? :) DOMDoculemt sucks for parsing HTML. Enough said. It's great for generating (simple) XML. It works pretty well for parsing well-formed XML. But @dmsnell I think we had a similar conversation few years ago, tried using DOMDocument then too, and failed. At the same time using Regex for "partially parsing" HTML sucks too. Core already has some examples there. If a tag has to be injected and that cannot be done in the browser, best would be to append (or prepend) the tag to the rest of the HTML in the block. That would probably look more like:
The rest (positioning, etc.) can be handled with CSS. |
Overall I think this is a disturbing indicator and ultimately falls back to the use of string-processing for HTML content. No whitespace changes should break HTML (apart from the fact that a leading newline/space after the opening of a tag inserts a single whitespace on render). If we get to the point where we find that whitespace changes did break our code a far more reliable fix would be to update our code that depends on that whitespace sensitivity. Gutenberg itself is liberal in reformatting HTML so a quick hack here and there to force a specific HTML formatting so that processing stages later down the line don't crack is going to build a mess -quickly-, not to mention the readability issues when finding code like I'd rather see us frame this as "the replacement of spaces solution works in this particular situation but we should try to eliminate the need for such a workaround that we'll soon regret." Realistically this isn't the place or PR to demand that everything properly parse HTML (whether through In other words, the problem isn't the space before the |
Yeah, completely agree. String processing of HTML is best avoided at all costs. It may look simple and bulletproof but always has "strings attached" (pun intended) :) |
@azaozz I missed your comment while I was writing mine.
sounds familiar: image source sets? I was hoping things have changed since then.
Not sure I fully understand what you're saying, but I believe the code snippet you wrote is roughly equivalent to the one I wrote, and the one I wrote I think has the same failure modes, where this HTML appears inside a |
Yep, think so! :)
Yes, it is. Was just trying to point out that it would be best to append (or prepend) an img tag at the end (or the beginning) of the block's HTML, not inject it in the middle. Then it would be a simple string concatenation, won't need regex, etc. Everything else can (most likely) be handled with CSS. |
It seems we can proceed here with a regexp and we have two suggestions from @dmsnell 👍🏻
and the one from @navigatrum :
... which are very similar. I think (1) is hard to follow, (2) and (3) are better, with (2) being even safer with the match on the "class" attribute name. @navigatrum let's tweak the regex one last time and then this will land 👏🏻 You still up for it? |
For the block layout see: https://github.com/WordPress/gutenberg/blob/trunk/packages/block-library/src/cover/edit/index.js#L327-L367 For the HTML tags capturing by class name see: https://github.com/WordPress/gutenberg/blob/51db4bf888e6b18cf9d18266108114d61070f3ad/lib/block-supports/layout.php#L232
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pending the tests passing and this working as we expect it to, I approve 👍
Thanks for persevering with this!
@navigatrum there's a PHP linting error on this, are you able to address it? Thanks! |
Wasn't it quotes around regex solved by admziel commit? |
There's another one with a missing |
Sorry I'm still seeing these errors:
|
The failing navigation E2E tests are flaky and unrelated to this PR. Let's ship it! |
Congratulations on your first merged pull request, @navigatrum! We'd like to credit you for your contribution in the post announcing the next WordPress release, but we can't find a WordPress.org profile associated with your GitHub account. When you have a moment, visit the following URL and click "link your GitHub account" under "GitHub Username" to link your accounts: https://profiles.wordpress.org/me/profile/edit/ And if you don't have a WordPress.org account, you can create one on this page: https://login.wordpress.org/register Kudos! |
Thanks:
|
Stopgap solution to remove whitespaces from html tags generated by render_block_core_covermarkup before injecting the featured image. Follows on from [#39658]( #39658). Remove whitespaces from html tags generated by `render_block_core_covermarkup` before injecting the featured image. For the block layout see: https://github.com/WordPress/gutenberg/blob/trunk/packages/block-library/src/cover/edit/index.js#L327-L367 For the HTML tags capturing by class name see: https://github.com/WordPress/gutenberg/blob/51db4bf888e6b18cf9d18266108114d61070f3ad/lib/block-supports/layout.php#L232 Co-authored-by: Adam Zielinski <adam@adamziel.com>
I just cherry-picked this PR to the wp/6.0 branch to get it included in the next release: c92a83a |
What?
Follows on from #39658 , #39658 (comment)
Remove whitespaces from html tags generated by
render_block_core_covermarkup
before injecting the featured image.Why?
The fetaured image isn't injected from a template file beacuse of white spaces between html tags (#39658 (comment))