Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML API: Ensure that full processor can seek to earlier bookmarks #7649

Closed
wants to merge 26 commits into from
Closed
Show file tree
Hide file tree
Changes from 15 commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
f9777aa
Add failing test case
sirreal Oct 25, 2024
5d75bb7
Work on fix
sirreal Oct 25, 2024
541b4f6
wip
sirreal Oct 25, 2024
dc3f1e6
try fixes
sirreal Nov 5, 2024
89aa01b
Fix the issue
sirreal Nov 5, 2024
737bf92
Restore original bookmark matching
sirreal Nov 5, 2024
cb63bbc
Lint: remove empty line
sirreal Nov 5, 2024
744cf01
Add commentary around state resetting in seek
sirreal Nov 6, 2024
6106a56
Use do-while loop to iterate after first check
sirreal Nov 6, 2024
851df38
Fix fragment parser bug that should reset breadcrumbs before seeking
sirreal Nov 6, 2024
9af204c
Assert setting bookmark two
sirreal Nov 6, 2024
660dc85
Remove condition that may not be necessary
sirreal Nov 6, 2024
9a0c017
Set parsing namespace correctly when seeking
sirreal Nov 6, 2024
d181938
Fix fragment state reset for different context nodes
sirreal Nov 6, 2024
d5bf14c
Add and improve comments
sirreal Nov 6, 2024
388bf19
Add and improve HTML processor bookmark tests
sirreal Nov 6, 2024
f7af6e3
Add test seeking from SVG namespace
sirreal Nov 6, 2024
d5a7d5c
Rework with tests for fragment and full processors
sirreal Nov 6, 2024
b95e402
Restore private stack properties to private
sirreal Nov 6, 2024
90eb6e2
Use a temporary bookmark to avoid modifying tag processor internal state
sirreal Nov 6, 2024
75ab9c2
Break at root-node when popping elements on seek
sirreal Nov 6, 2024
05ca2a4
Remove context-node cases from open elements
sirreal Nov 6, 2024
1cde425
Merge branch 'trunk' into html-api/fix-63390-seek
sirreal Nov 6, 2024
c48be70
Add test for root-node regression
sirreal Nov 11, 2024
710c5f7
Fix root-node pop regression in fragment parser seek
sirreal Nov 11, 2024
0efe691
Merge branch 'trunk' into html-api/fix-63390-seek
sirreal Nov 11, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ class WP_HTML_Active_Formatting_Elements {
*
* @var WP_HTML_Token[]
*/
private $stack = array();
public $stack = array();

/**
* Reports if a specific node is in the stack of active formatting elements.
Expand Down
2 changes: 1 addition & 1 deletion src/wp-includes/html-api/class-wp-html-open-elements.php
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ class WP_HTML_Open_Elements {
*
* @var bool
*/
private $has_p_in_button_scope = false;
public $has_p_in_button_scope = false;

/**
* A function that will be called when an item is popped off the stack of open elements.
Expand Down
72 changes: 52 additions & 20 deletions src/wp-includes/html-api/class-wp-html-processor.php
Original file line number Diff line number Diff line change
Expand Up @@ -5324,8 +5324,7 @@ public function seek( $bookmark_name ): bool {
*/
if ( 'backward' === $direction ) {
/*
* Instead of clearing the parser state and starting fresh, calling the stack methods
* maintains the proper flags in the parser.
* When moving backward, stateful stacks should be cleared.
*/
foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) {
if ( 'context-node' === $item->bookmark_name ) {
sirreal marked this conversation as resolved.
Show resolved Hide resolved
Expand All @@ -5343,32 +5342,65 @@ public function seek( $bookmark_name ): bool {
$this->state->active_formatting_elements->remove_node( $item );
}

parent::seek( 'context-node' );
$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
$this->state->frameset_ok = true;
$this->element_queue = array();
$this->current_element = null;
/*
* **After** clearing stacks, more processor state can be reset.
* This must be done after clearing the stack because those stacks generate events that
* would appear on a subsequent call to `next_token()`.
*/
$this->state->frameset_ok = true;
$this->state->stack_of_template_insertion_modes = array();
$this->state->head_element = null;
$this->state->form_element = null;
$this->state->current_token = null;
Comment on lines +5348 to +5352
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add a reset() method to WP_HTML_Processor_State?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that but ultimately decided against it. This isn't really a full reset, and the reset behaves a bit differently between the fragment/full processors.

To simplify this, I'd probably create a new state instance and copy over the important properties. That may be worth exploring but that felt like a larger refactor.

$this->current_element = null;
$this->element_queue = array();

if ( isset( $this->context_node ) ) {
$this->breadcrumbs = array_slice( $this->breadcrumbs, 0, 2 );
/*
* The absence of a context node indicates a full parse.
* The presence of a context node indicates a fragment parser.
*/
if ( null === $this->context_node ) {
$this->change_parsing_namespace( 'html' );
$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_INITIAL;
$this->breadcrumbs = array();

$this->bytes_already_parsed = 0;
$this->parser_state = self::STATE_READY;
$this->next_token();
} else {
$this->breadcrumbs = array();
}
}
$this->change_parsing_namespace(
$this->context_node->integration_node_type
? 'html'
: $this->context_node->namespace
);

if ( 'TEMPLATE' === $this->context_node->node_name ) {
$this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE;
}

// When moving forwards, reparse the document until reaching the same location as the original bookmark.
if ( $bookmark_starts_at === $this->bookmarks[ $this->state->current_token->bookmark_name ]->start ) {
return true;
$this->reset_insertion_mode_appropriately();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is inspired by #7348 where a fragment parser is created on arbitrary context nodes and must be initialized properly.

This is very similar, the state must be reset to the appropriate initial state for the context.

This follows the fragment parsing algorithm: https://html.spec.whatwg.org/multipage/parsing.html#html-fragment-parsing-algorithm

$this->breadcrumbs = array_slice( $this->breadcrumbs, 0, 2 );
parent::seek( $this->context_node->bookmark_name );
}
}

while ( $this->next_token() ) {
/*
* Here, the processor moves forward through the document until it matches the bookmark.
* do-while is used here because the processor is expected to already be stopped on
* a token than may match the bookmarked location.
*/
do {
/*
* The processor will stop on virtual tokens, but bookmarks may not be set on them.
* They should not be matched when seeking a bookmark, skip them.
*/
if ( $this->is_virtual() ) {
continue;
}
if ( $bookmark_starts_at === $this->bookmarks[ $this->state->current_token->bookmark_name ]->start ) {
while ( isset( $this->current_element ) && WP_HTML_Stack_Event::POP === $this->current_element->operation ) {
$this->current_element = array_shift( $this->element_queue );
sirreal marked this conversation as resolved.
Show resolved Hide resolved
}
return true;
}
}
} while ( $this->next_token() );

return false;
}
Expand Down
2 changes: 1 addition & 1 deletion src/wp-includes/html-api/class-wp-html-tag-processor.php
Original file line number Diff line number Diff line change
Expand Up @@ -591,7 +591,7 @@ class WP_HTML_Tag_Processor {
* @since 6.2.0
* @var int
*/
private $bytes_already_parsed = 0;
protected $bytes_already_parsed = 0;

/**
* Byte offset in input document where current token starts.
Expand Down
34 changes: 34 additions & 0 deletions tests/phpunit/tests/html-api/wpHtmlProcessor-bookmark.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
<?php
/**
* Unit tests covering WP_HTML_Processor bookmark functionality.
*
* @package WordPress
* @subpackage HTML-API
*/

/**
* @group html-api
*
* @coversDefaultClass WP_HTML_Processor
*/
class Tests_HtmlApi_WpHtmlProcessor_Bookmark extends WP_UnitTestCase {
/**
* Ensures that bookmarks can be set and seeked to for the full processor.
*
* @ticket 62290
*/
public function test_full_processor_seek() {
$bookmark_name = 'the-bookmark';
$processor = WP_HTML_Processor::create_full_parser( '<html><body><div>' );
$this->assertTrue( $processor->next_tag( 'BODY' ) );
$this->assertTrue( $processor->set_bookmark( $bookmark_name ), 'Failed to set bookmark.' );
$this->assertTrue( $processor->has_bookmark( $bookmark_name ), 'Failed has_bookmark check.' );

// Move past the bookmark so it must scan backwards.
$this->assertTrue( $processor->next_tag( 'DIV' ) );

// Confirm the bookmark works.
$this->assertTrue( $processor->seek( $bookmark_name ), 'Failed to seek to bookmark.' );
$this->assertSame( 'BODY', $processor->get_tag() );
}
}
2 changes: 1 addition & 1 deletion tests/phpunit/tests/html-api/wpHtmlProcessor.php
Original file line number Diff line number Diff line change
Expand Up @@ -133,7 +133,7 @@ public function test_clear_to_navigate_after_seeking() {

// Create a bookmark inside of that stack.
if ( null !== $processor->get_attribute( 'two' ) ) {
$processor->set_bookmark( 'two' );
$this->assertTrue( $processor->set_bookmark( 'two' ) );
break;
}
}
Expand Down
Loading