Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML API: Ensure that full processor can seek to earlier bookmarks #7649

Closed
wants to merge 26 commits into from

Conversation

sirreal
Copy link
Member

@sirreal sirreal commented Oct 25, 2024

#62290 describes an issue where the full processor cannot seek to earlier bookmarks.

When the processor seeks to an earlier place, it returns the the beginning of the document and proceeds forward until it reaches the appropriate location. This requires resetting internal state so that the processor can correctly proceed from the beginning of the document.

The seeking reset logic was not adapted to account for the full processor. This change updates the seek logic to account for the full and fragment parsers as well as other state that has been introduced in the interim and should be reset.

Trac ticket: https://core.trac.wordpress.org/ticket/62290


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

@sirreal sirreal changed the title html api/fix 63390 seek html api/fix 62290 seek Oct 25, 2024
Copy link

Test using WordPress Playground

The changes in this pull request can previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

@sirreal sirreal force-pushed the html-api/fix-63390-seek branch from fb385b5 to dc3f1e6 Compare November 5, 2024 18:23
@sirreal sirreal changed the title html api/fix 62290 seek HTML API: Ensure that full processor can seek to earlier bookmarks Nov 6, 2024
@sirreal
Copy link
Member Author

sirreal commented Nov 6, 2024

There are a lot of parallels with #7348 where the state should be reset appropriately according to the context node.

Comment on lines 5371 to 5381
$this->change_parsing_namespace(
$this->context_node->integration_node_type
? 'html'
: $this->context_node->namespace
);

if ( 'TEMPLATE' === $this->context_node->node_name ) {
$this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE;
}

// When moving forwards, reparse the document until reaching the same location as the original bookmark.
if ( $bookmark_starts_at === $this->bookmarks[ $this->state->current_token->bookmark_name ]->start ) {
return true;
$this->reset_insertion_mode_appropriately();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is inspired by #7348 where a fragment parser is created on arbitrary context nodes and must be initialized properly.

This is very similar, the state must be reset to the appropriate initial state for the context.

This follows the fragment parsing algorithm: https://html.spec.whatwg.org/multipage/parsing.html#html-fragment-parsing-algorithm

@sirreal sirreal force-pushed the html-api/fix-63390-seek branch from b1a6c00 to b95e402 Compare November 6, 2024 12:45
@sirreal sirreal marked this pull request as ready for review November 6, 2024 13:01
Copy link

github-actions bot commented Nov 6, 2024

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell, dmsnell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@sirreal
Copy link
Member Author

sirreal commented Nov 6, 2024

This is ready for review @gziolo @westonruter @dmsnell.

The context node is not pushed to the stack of open elements,
this code was either doing nothing or incorrect.
The context node is not pushed to the stack of open elements, and
therefore does not need special handling.
@sirreal sirreal requested a review from dmsnell November 6, 2024 19:37
sirreal added a commit to sirreal/wordpress-develop that referenced this pull request Nov 8, 2024
Squashed commit of the following:

commit 1cde425
Merge: 05ca2a4 8ad5281
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 20:40:05 2024 +0100

    Merge branch 'trunk' into html-api/fix-63390-seek

commit 05ca2a4
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 20:29:36 2024 +0100

    Remove context-node cases from open elements

    The context node is not pushed to the stack of open elements, and
    therefore does not need special handling.

commit 75ab9c2
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 20:27:57 2024 +0100

    Break at root-node when popping elements on seek

    The context node is not pushed to the stack of open elements,
    this code was either doing nothing or incorrect.

commit 90eb6e2
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 13:53:15 2024 +0100

    Use a temporary bookmark to avoid modifying tag processor internal state

commit b95e402
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 13:45:13 2024 +0100

    Restore private stack properties to private

commit d5a7d5c
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 13:40:18 2024 +0100

    Rework with tests for fragment and full processors

    Use a dataprovider to get a factory function for processors

commit f7af6e3
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 13:19:58 2024 +0100

    Add test seeking from SVG namespace

commit 388bf19
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 13:05:34 2024 +0100

    Add and improve HTML processor bookmark tests

commit d5bf14c
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 12:17:47 2024 +0100

    Add and improve comments

commit d181938
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 12:13:48 2024 +0100

    Fix fragment state reset for different context nodes

commit 9a0c017
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 12:05:43 2024 +0100

    Set parsing namespace correctly when seeking

commit 660dc85
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 11:49:21 2024 +0100

    Remove condition that may not be necessary

    This seems related to virutal tokens and it likely covered by the virtual token condition

commit 9af204c
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 11:47:56 2024 +0100

    Assert setting bookmark two

    This is consisten with the assertion on bookmark one

commit 851df38
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 11:47:17 2024 +0100

    Fix fragment parser bug that should reset breadcrumbs before seeking

commit 6106a56
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 11:46:49 2024 +0100

    Use do-while loop to iterate after first check

commit 744cf01
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 11:42:42 2024 +0100

    Add commentary around state resetting in seek

commit cb63bbc
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Tue Nov 5 19:51:29 2024 +0100

    Lint: remove empty line

commit 737bf92
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Tue Nov 5 19:49:01 2024 +0100

    Restore original bookmark matching

commit 89aa01b
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Tue Nov 5 19:47:16 2024 +0100

    Fix the issue

commit dc3f1e6
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Tue Nov 5 19:06:23 2024 +0100

    try fixes

commit 541b4f6
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Fri Oct 25 23:19:19 2024 +0200

    wip

commit 5d75bb7
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Fri Oct 25 23:15:51 2024 +0200

    Work on fix

commit f9777aa
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Fri Oct 25 22:51:44 2024 +0200

    Add failing test case
@sirreal
Copy link
Member Author

sirreal commented Nov 8, 2024

@mi5t4n you were active on https://core.trac.wordpress.org/ticket/62290, if you could provide testing for this change it would help move things along 🙏

sirreal added a commit to sirreal/wordpress-develop that referenced this pull request Nov 8, 2024
Squashed commit of the following:

commit 1cde425
Merge: 05ca2a4 8ad5281
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 20:40:05 2024 +0100

    Merge branch 'trunk' into html-api/fix-63390-seek

commit 05ca2a4
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 20:29:36 2024 +0100

    Remove context-node cases from open elements

    The context node is not pushed to the stack of open elements, and
    therefore does not need special handling.

commit 75ab9c2
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 20:27:57 2024 +0100

    Break at root-node when popping elements on seek

    The context node is not pushed to the stack of open elements,
    this code was either doing nothing or incorrect.

commit 90eb6e2
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 13:53:15 2024 +0100

    Use a temporary bookmark to avoid modifying tag processor internal state

commit b95e402
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 13:45:13 2024 +0100

    Restore private stack properties to private

commit d5a7d5c
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 13:40:18 2024 +0100

    Rework with tests for fragment and full processors

    Use a dataprovider to get a factory function for processors

commit f7af6e3
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 13:19:58 2024 +0100

    Add test seeking from SVG namespace

commit 388bf19
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 13:05:34 2024 +0100

    Add and improve HTML processor bookmark tests

commit d5bf14c
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 12:17:47 2024 +0100

    Add and improve comments

commit d181938
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 12:13:48 2024 +0100

    Fix fragment state reset for different context nodes

commit 9a0c017
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 12:05:43 2024 +0100

    Set parsing namespace correctly when seeking

commit 660dc85
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 11:49:21 2024 +0100

    Remove condition that may not be necessary

    This seems related to virutal tokens and it likely covered by the virtual token condition

commit 9af204c
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 11:47:56 2024 +0100

    Assert setting bookmark two

    This is consisten with the assertion on bookmark one

commit 851df38
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 11:47:17 2024 +0100

    Fix fragment parser bug that should reset breadcrumbs before seeking

commit 6106a56
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 11:46:49 2024 +0100

    Use do-while loop to iterate after first check

commit 744cf01
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Wed Nov 6 11:42:42 2024 +0100

    Add commentary around state resetting in seek

commit cb63bbc
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Tue Nov 5 19:51:29 2024 +0100

    Lint: remove empty line

commit 737bf92
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Tue Nov 5 19:49:01 2024 +0100

    Restore original bookmark matching

commit 89aa01b
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Tue Nov 5 19:47:16 2024 +0100

    Fix the issue

commit dc3f1e6
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Tue Nov 5 19:06:23 2024 +0100

    try fixes

commit 541b4f6
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Fri Oct 25 23:19:19 2024 +0200

    wip

commit 5d75bb7
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Fri Oct 25 23:15:51 2024 +0200

    Work on fix

commit f9777aa
Author: Jon Surrell <sirreal@users.noreply.github.com>
Date:   Fri Oct 25 22:51:44 2024 +0200

    Add failing test case
@sirreal
Copy link
Member Author

sirreal commented Nov 11, 2024

Testing in #7742 discovered an issue present on trunk where the fragment parser root-node may not be present on the stack of open elements in a fragment parser.

I've pushed a test and a fix for that to this PR.

Comment on lines +5348 to +5352
$this->state->frameset_ok = true;
$this->state->stack_of_template_insertion_modes = array();
$this->state->head_element = null;
$this->state->form_element = null;
$this->state->current_token = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add a reset() method to WP_HTML_Processor_State?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that but ultimately decided against it. This isn't really a full reset, and the reset behaves a bit differently between the fragment/full processors.

To simplify this, I'd probably create a new state instance and copy over the important properties. That may be worth exploring but that felt like a larger refactor.

Copy link
Contributor

@ockham ockham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good to me! Thank you for the helpful inline comments and unit test coverage 👍

I'm going to commit this later today (unless someone else beats me to it).

@ockham
Copy link
Contributor

ockham commented Nov 12, 2024

Committed to Core trunk in https://core.trac.wordpress.org/changeset/59391.

@ockham ockham closed this Nov 12, 2024
@sirreal sirreal deleted the html-api/fix-63390-seek branch November 12, 2024 10:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants