From 0dce7bb561a68823c9543bca2470701b7ebeed6c Mon Sep 17 00:00:00 2001 From: Simon Pieters Date: Tue, 29 Sep 2020 16:33:55 +0200 Subject: [PATCH 1/6] Define speculative HTML parsing Fixes #5624. --- source | 244 +++++++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 222 insertions(+), 22 deletions(-) diff --git a/source b/source index 0ca31690e1d..25cc9da2083 100644 --- a/source +++ b/source @@ -110134,6 +110134,11 @@ dictionary StorageEventInit : EventInit { particular intended parent, the UA must run the following steps:

    +
  1. If the active speculative HTML parser is not null, then return the result of + creating a speculative mock element + given given namespace, the tag name of the given token, and the attributes of the + given token.

  2. +
  3. Let document be intended parent's node document.

  4. Let local name be the tag name of the token.

  5. @@ -110867,20 +110872,27 @@ document.body.appendChild(text);

    Acknowledge the token's self-closing flag, if it is set.

    -

    If the element has a charset attribute, and getting an encoding from - its value results in an encoding, and the - confidence is currently tentative, then - change the encoding to the resulting encoding.

    +

    If the active speculative HTML parser is null, then:

    + +
      +
    1. If the element has a charset attribute, and getting an encoding from + its value results in an encoding, and the + confidence is currently tentative, + then change the encoding to the resulting encoding.

    2. + +
    3. Otherwise, if the element has an http-equiv + attribute whose value is an ASCII case-insensitive match for the string "Content-Type", and the element has a content attribute, and applying the algorithm for + extracting a character encoding from a meta element to that attribute's + value returns an encoding, and the + confidence is currently tentative, + then change the encoding to the extracted encoding.

    4. +
    -

    Otherwise, if the element has an http-equiv - attribute whose value is an ASCII case-insensitive match for the string "Content-Type", and the element has a content attribute, and applying the algorithm for - extracting a character encoding from a meta element to that attribute's - value returns an encoding, and the - confidence is currently tentative, then - change the encoding to the extracted encoding.

    +

    The speculative HTML parser doesn't speculatively apply character + encoding declarations in order to reduce implementation complexity.

    A start tag whose tag name is "title"
    @@ -112362,8 +112374,8 @@ document.body.appendChild(text);
    An end tag whose tag name is "script"
    -

    If the JavaScript execution context stack is empty, perform a microtask - checkpoint.

    +

    If the active speculative HTML parser is null and the JavaScript execution + context stack is empty, then perform a microtask checkpoint.

    Let script be the current node (which will be a script element).

    @@ -112378,10 +112390,11 @@ document.body.appendChild(text);

    Increment the parser's script nesting level by one.

    -

    Prepare the script. This might - cause some script to execute, which might cause new characters - to be inserted into the tokenizer, and might cause the tokenizer to output more tokens, - resulting in a reentrant invocation of the parser.

    +

    If the active speculative HTML parser is null, then prepare the script. This might cause some script to execute, which + might cause new characters to be inserted into the + tokenizer, and might cause the tokenizer to output more tokens, resulting in a reentrant invocation of the parser.

    Decrement the parser's script nesting level by one. If the parser's script nesting level is zero, then set the parser pause flag to false.

    @@ -112417,6 +112430,9 @@ document.body.appendChild(text);
  6. Let the script be the pending parsing-blocking script. There is no longer a pending parsing-blocking script.

  7. +
  8. Start the speculative HTML parser for this instance of the HTML + parser.

  9. +
  10. Block the tokenizer for this instance of the HTML parser, such that the event loop will not run tasks that invoke the Document.

  11. +
  12. Stop the speculative HTML parser for this instance of the HTML + parser.

  13. +
  14. Unblock the tokenizer for this instance of the HTML parser, such that tasks that invoke the tokenizer can again be run.

  15. @@ -113914,9 +113933,9 @@ document.body.appendChild(text);

    Increment the parser's script nesting level by one. Set the parser pause flag to true.

    -

    Process the - SVG script element according to the SVG rules, if the user agent - supports SVG.

    +

    If the active speculative HTML parser is null and the user agent supports SVG, + then process the + SVG script element according to the SVG rules.

    Even if this causes new characters to be inserted into the tokenizer, the parser will not be executed reentrantly, since the @@ -113974,6 +113993,9 @@ document.body.appendChild(text);

      +
    1. If the active speculative HTML parser is not null, then stop the + speculative HTML parser and return.

    2. +
    3. Set the insertion point to undefined.

    4. Update the current document readiness to "

      Throw away any pending content in the input stream, and discard any future content that would have been added to it.

    5. +
    6. Stop the speculative HTML parser for this HTML parser.

    7. +
    8. Update the current document readiness to "interactive".

    9. @@ -114123,6 +114147,179 @@ document.body.appendChild(text); +
      + +

      Speculative HTML parsing

      + +

      User agents may implement an optimization, as described in this section, to speculatively fetch + resources that are declared in the HTML markup while the HTML parser is waiting for a + pending parsing-blocking script to be fetched and executed. While this optimization + is not defined in precise detail, there are some rules to consider for interoperability.

      + +

      Each HTML parser can have an active speculative HTML parser. It + is initially null.

      + +

      The speculative HTML parser must act like the normal HTML parser (e.g., the + tree builder rules apply), with some exceptions:

      + +
        +
      • +

        The state of the normal HTML parser and the document itself must not be affected.

        + +

        For example, the next input character or the stack of open + elements for the normal HTML parser is not affected by the speculative HTML + parser.

        +
      • + +
      • +

        Bytes pushed into the HTML parser's input byte stream must also be pushed into + the speculative HTML parser's input byte stream. Each parser must read from its own stream, so + reading bytes from one stream does not consume bytes from the other.

        +
      • + +
      • +

        The result of the speculative parsing is primarily a series of speculative + fetches. Which kinds of resources to speculatively fetch is + implementation-defined, but user agents must not speculatively fetch resources that + would not be fetched with the normal HTML parser, under the assumption that the script that is + blocking the HTML parser does nothing.

        + +

        It is possible that the same markup is seen more than once, first by the + speculative HTML parser and then by the normal HTML parser. It is expected that + duplicate fetches will be prevented by caching rules, which are not yet fully specified.

        +
      • +
      + +
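The byte-stream rule above can be pictured with a small Python sketch. The `ByteStream` and `ParserInput` classes are hypothetical and are not taken from any user agent. The sketch assumes that the speculative stream starts with whatever bytes the normal parser has not yet consumed, and then receives every later chunk. Each stream keeps its own read position, so reading from one never consumes bytes from the other.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ByteStream:
    """Hypothetical input byte stream with its own read position."""
    data: bytearray = field(default_factory=bytearray)
    pos: int = 0

    def push(self, chunk: bytes) -> None:
        self.data.extend(chunk)

    def read(self, n: int) -> bytes:
        # Advances only this stream's position; other streams are untouched.
        chunk = bytes(self.data[self.pos:self.pos + n])
        self.pos += len(chunk)
        return chunk


class ParserInput:
    """Hypothetical holder for the normal and speculative input byte streams."""

    def __init__(self) -> None:
        self.normal = ByteStream()
        self.speculative: Optional[ByteStream] = None

    def start_speculation(self) -> None:
        # Assumption: the speculative stream begins with the bytes the normal
        # parser has not consumed yet ("the same state as parser").
        unread = self.normal.data[self.normal.pos:]
        self.speculative = ByteStream(bytearray(unread))

    def push(self, chunk: bytes) -> None:
        # Every byte pushed into the normal stream is also pushed into the
        # speculative stream, if one is active.
        self.normal.push(chunk)
        if self.speculative is not None:
            self.speculative.push(chunk)
```

Copying the unread bytes keeps the two parsers decoupled: the speculative parser can run ahead of the blocked normal parser without advancing its position.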

      The speculative fetches must follow these rules:

      + +

      Should some of these things be applied to the document "for real", even + though they are found speculatively?

      + +
        +
      • +

        If the speculative HTML parser encounters one of the following elements, then + act as if that element is processed for the purpose of its effect of speculative fetches for + resources after the element.

        + +
          +
        • A base element.
        • + +
        • A meta element whose http-equiv + attribute is in the Content + security policy state.
        • + +
        • A meta element whose name attribute is an + ASCII case-insensitive match for "referrer".
        • + +
        • A meta element whose name attribute is an + ASCII case-insensitive match for "viewport". (This can + affect whether a media query list matches the environment.)
        • +
        +
      • +
      + +
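As an illustration of the `base` case, here is a minimal Python sketch that models only how a `base` element changes the URLs of later speculative fetches. `SpeculativeFetchState` and `on_element` are hypothetical names. The sketch treats only `img` `src` as fetchable, and it honors only the first `base` element with an `href` attribute, as HTML does when it computes the document base URL.

```python
from typing import Optional
from urllib.parse import urljoin


class SpeculativeFetchState:
    """Hypothetical state carried by a speculative parser; models <base> only."""

    def __init__(self, document_url: str) -> None:
        self.base_url = document_url
        self.base_seen = False
        self.fetched: list[str] = []

    def on_element(self, local_name: str, attributes: dict[str, str]) -> None:
        if local_name == "base" and "href" in attributes and not self.base_seen:
            # Act as if <base> were processed: only later fetches resolve
            # against it. Only the first <base href> counts in HTML.
            self.base_seen = True
            self.base_url = urljoin(self.base_url, attributes["href"])
        elif local_name == "img":
            src: Optional[str] = attributes.get("src")
            if src:
                # Resolve against whatever base URL is in effect right now.
                self.fetched.append(urljoin(self.base_url, src))
```

An image that appears before the `base` element resolves against the document URL, and images after it resolve against the new base.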

      To start the speculative HTML parser for an instance of an HTML parser + parser:

      + +
        +
      1. Optionally, return. (This allows user agents to opt out of speculative HTML + parsing.)

      2. + +
      3. +

        If parser's active speculative HTML parser is not null, then + stop the speculative HTML parser for parser.

        + +

        This can happen when document.write() + writes another parser-blocking script. For simplicity, this specification always restarts + speculative parsing, but user agents can implement a more efficient strategy, so long as the end + result is equivalent.

        +
      4. + +
      5. Let speculativeParser be a new speculative HTML parser, with the + same state as parser.

      6. + +
      7. Let speculativeDoc be a new isomorphic representation of parser's + Document, where all elements are instead speculative mock elements. Let speculativeParser parse into + speculativeDoc.

      8. + +
      9. Set parser's active speculative HTML parser to + speculativeParser.

      10. + +
      11. In parallel, run speculativeParser until it is stopped or until it + reaches the end of its input stream.

      12. +
      + + +

      To stop the speculative HTML parser for an instance of an HTML parser + parser:

      + +
        +
      1. Let speculativeParser be parser's active speculative HTML + parser.

      2. + +
      3. If speculativeParser is null, then return.

      4. + +
      5. Throw away any pending content in speculativeParser's input + stream, and discard any future content that would have been added to it.

      6. + +
      7. Set parser's active speculative HTML parser to null.

      8. +
      + +
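The start and stop algorithms fit together as follows. This Python sketch uses hypothetical `HTMLParser` and `SpeculativeParser` classes, models input as a string instead of a byte stream, and does not run the speculative parser in parallel. `opt_out` stands in for the "Optionally, return" step.

```python
from typing import Optional


class SpeculativeParser:
    """Hypothetical stand-in that only tracks its input and stopped state."""

    def __init__(self, pending_input: str) -> None:
        self.pending_input = pending_input
        self.stopped = False

    def feed(self, text: str) -> None:
        # Content that arrives after the parser is stopped is discarded.
        if not self.stopped:
            self.pending_input += text


class HTMLParser:
    def __init__(self, opt_out: bool = False) -> None:
        self.active_speculative_parser: Optional[SpeculativeParser] = None
        self.unparsed_input = ""
        self.opt_out = opt_out  # models "Optionally, return."

    def start_speculative_parser(self) -> None:
        if self.opt_out:
            return
        # Restart speculation if one is already running
        # (e.g. document.write() wrote another parser-blocking script).
        if self.active_speculative_parser is not None:
            self.stop_speculative_parser()
        self.active_speculative_parser = SpeculativeParser(self.unparsed_input)

    def stop_speculative_parser(self) -> None:
        speculative = self.active_speculative_parser
        if speculative is None:
            return
        speculative.pending_input = ""  # throw away pending content
        speculative.stopped = True      # and discard any future content
        self.active_speculative_parser = None
```

In the script end-tag steps, the parser would call `start_speculative_parser()` just before blocking the tokenizer on a pending parsing-blocking script, and `stop_speculative_parser()` once that script is ready, before unblocking the tokenizer.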

      The speculative HTML parser will create speculative mock elements instead of normal elements. DOM + operations that the tree builder normally does on elements are expected to work appropriately on + speculative mock elements.

      + +

      A speculative mock element is a struct with the following items:

      + +
        +
      • A string namespace, corresponding + to an element's namespace.

      • + +
      • A string local name, + corresponding to an element's local + name.

      • + +
      • A list attribute list, + corresponding to an element's attribute list.

      • + +
      • A list children, corresponding to + an element's children.

      • +
      + +

      To create a speculative mock element given a namespace, + tagName, and attributes:

      + +
        +
      1. Let element be a new speculative mock element.

      2. + +
      3. Set element's namespace to + namespace.

      4. + +
      5. Set element's local name to + tagName.

      6. + +
      7. Set element's attribute list + to attributes.

      8. + +
      9. Set element's children to a new + empty list.

      10. + +
      11. Return element.

      12. +
      + +

      When the tree builder says to insert an element into a template element's + template contents, and that template element is a speculative mock element, then instead do + nothing. URLs found speculatively inside template elements might themselves be + templates (for example, placeholders that script fills in later), and must not be speculatively fetched.
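The following minimal Python sketch shows the speculative mock element struct, the "create a speculative mock element" steps, and the template rule. The function names are hypothetical. `speculative_fetch` is an optional callback standing in for "Optionally, perform a speculative fetch for element", and `insert_child` reduces tree insertion to appending a child.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Attributes = list[tuple[str, str]]


@dataclass
class SpeculativeMockElement:
    namespace: str
    local_name: str
    attribute_list: Attributes
    children: list["SpeculativeMockElement"] = field(default_factory=list)


def create_speculative_mock_element(
    namespace: str,
    tag_name: str,
    attributes: Attributes,
    speculative_fetch: Optional[Callable[[SpeculativeMockElement], None]] = None,
) -> SpeculativeMockElement:
    # New struct: namespace, local name, attribute list, and an empty
    # children list (supplied by the field default).
    element = SpeculativeMockElement(namespace, tag_name, attributes)
    # "Optionally, perform a speculative fetch for element."
    if speculative_fetch is not None:
        speculative_fetch(element)
    return element


def insert_child(
    parent: SpeculativeMockElement,
    child: SpeculativeMockElement,
    into_template_contents: bool = False,
) -> None:
    if into_template_contents:
        # Inserting into a mock template's contents does nothing.
        return
    parent.children.append(child)
```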

      + +
      + +

      Coercing an HTML DOM into an infoset

      @@ -125288,6 +125485,9 @@ INSERT INTERFACES HERE
      [CSSCOLORADJUST]
      CSS Color Adjustment Module, E. Etemad, R. Atanassov, R. Lillesveen, T. Atkins. W3C.
      +
      [CSSDEVICEADAPT]
      +
      CSS Device Adaptation, F. Rivoal, M. Rakow. W3C.
      +
      [CSSDISPLAY]
      CSS Display, T. Atkins, E. Etemad. W3C.
      From 1fcf85ff391dfb973bd4dcc4a9cc48d69197b2db Mon Sep 17 00:00:00 2001 From: Simon Pieters Date: Wed, 18 Aug 2021 16:19:34 +0200 Subject: [PATCH 2/6] Avoid speculatively fetching the same URL multiple times --- source | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/source b/source index 25cc9da2083..6ed048d4052 100644 --- a/source +++ b/source @@ -114198,8 +114198,8 @@ document.body.appendChild(text);
      • If the speculative HTML parser encounters one of the following elements, then - act as if that element is processed for the purpose of its effect of speculative fetches for - resources after the element.

        + act as if that element is processed for the purpose of its effect of subsequent speculative + fetches.

        • A base element.
        • @@ -114218,8 +114218,16 @@ document.body.appendChild(text); spec=CSSDEVICEADAPT>
      • + +
      • When a speculative fetch is created, if its URL is already in the list of + speculative fetch URLs, then do nothing for this speculative fetch. Otherwise, fetch the + URL as if the element was processed normally, and add the URL to the list of speculative + fetch URLs.

      +

      Each Document has a list of speculative fetch URLs, which is a + list of URLs, initially empty.

      +

      To start the speculative HTML parser for an instance of an HTML parser parser:

      From 023e5020ef86607c73d3ab7469707ca68ba64aad Mon Sep 17 00:00:00 2001 From: Simon Pieters Date: Wed, 18 Aug 2021 16:36:31 +0200 Subject: [PATCH 3/6] Connect creation of speculative mock element to speculative fetch --- source | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/source b/source index 6ed048d4052..5aab29ab044 100644 --- a/source +++ b/source @@ -114178,8 +114178,8 @@ document.body.appendChild(text);
    10. -

      The result of the speculative parsing is primarily a series of speculative - fetches. Which kinds of resources to speculatively fetch is +

      The result of the speculative parsing is primarily a series of speculative fetches. Which kinds of resources to speculatively fetch is implementation-defined, but user agents must not speculatively fetch resources that would not be fetched with the normal HTML parser, under the assumption that the script that is blocking the HTML parser does nothing.

      @@ -114190,7 +114190,8 @@ document.body.appendChild(text);
    11. -

      The speculative fetches must follow these rules:

      +

      A speculative fetch for a speculative mock element element + must follow these rules:

      Should some of these things be applied to the document "for real", even though they are found speculatively?

      @@ -114219,10 +114220,11 @@ document.body.appendChild(text); -
    12. When a speculative fetch is created, if its URL is already in the list of - speculative fetch URLs, then do nothing for this speculative fetch. Otherwise, fetch the - URL as if the element was processed normally, and add the URL to the list of speculative - fetch URLs.

    13. +
    14. Let url be the URL that element would fetch if it was + processed normally. If there is no such URL or if it is the empty string, then do + nothing. Otherwise, if url is already in the list of speculative fetch + URLs, then do nothing. Otherwise, fetch url as if the element was processed + normally, and add url to the list of speculative fetch URLs.
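The deduplication rule can be sketched in Python as follows. `Document`, `speculative_fetch`, and the `fetch` callback are hypothetical. `url` is assumed to already be the URL that the element would fetch if it were processed normally, or `None` if there is no such URL. The sketch keeps a list, mirroring the spec text, but a set would give the same result with cheaper lookups.

```python
from typing import Callable, Optional


class Document:
    """Hypothetical document holding only its list of speculative fetch URLs."""

    def __init__(self) -> None:
        self.speculative_fetch_urls: list[str] = []  # initially empty


def speculative_fetch(
    document: Document, url: Optional[str], fetch: Callable[[str], None]
) -> None:
    if not url:
        return  # no such URL, or the empty string: do nothing
    if url in document.speculative_fetch_urls:
        return  # already speculatively fetched for this Document
    fetch(url)
    document.speculative_fetch_urls.append(url)
```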

    15. Each Document has a list of speculative fetch URLs, which is a @@ -114317,6 +114319,8 @@ document.body.appendChild(text);

    16. Set element's children to a new empty list.

    17. +
    18. Optionally, perform a speculative fetch for element.

    19. +
    20. Return element.

    From 775d2bc93430855ff8eb8d58f622d68cb6a0ee19 Mon Sep 17 00:00:00 2001 From: Simon Pieters Date: Wed, 18 Aug 2021 16:50:56 +0200 Subject: [PATCH 4/6] Allow speculative fetches for normal parsing --- source | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/source b/source index 5aab29ab044..df0169fdea4 100644 --- a/source +++ b/source @@ -110139,6 +110139,16 @@ dictionary StorageEventInit : EventInit { given given namespace, the tag name of the given token, and the attributes of the given token.

    +
  16. +

    Otherwise, optionally create a speculative mock element given given + namespace, the tag name of the given token, and the attributes of the given token.

    + +

    The result is not used. This step allows for a speculative fetch to + be initiated from non-speculative parsing. The fetch is still speculative at this point, + because, for example, by the time the element is inserted, intended parent might + have been removed from the document.

    +
  17. +
  18. Let document be intended parent's node document.

  19. Let local name be the tag name of the token.

  20. From 1be217dc0ab426e553a0976b402823da772e7106 Mon Sep 17 00:00:00 2001 From: Simon Pieters Date: Tue, 31 Aug 2021 13:58:41 +0200 Subject: [PATCH 5/6] Fix statement of fact: speculative fetches from normal parsing is allowed --- source | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/source b/source index df0169fdea4..d3eaf3a07a0 100644 --- a/source +++ b/source @@ -114163,8 +114163,10 @@ document.body.appendChild(text);

    User agents may implement an optimization, as described in this section, to speculatively fetch resources that are declared in the HTML markup while the HTML parser is waiting for a - pending parsing-blocking script to be fetched and executed. While this optimization - is not defined in precise detail, there are some rules to consider for interoperability.

    + pending parsing-blocking script to be fetched and executed, or during normal parsing, + at the time an element is created for a token. + While this optimization is not defined in precise detail, there are some rules to consider for + nteroperability.

    Each HTML parser can have an active speculative HTML parser. It is initially null.

    From 3535735891b51346a43780dea64d1fa83e6aefe2 Mon Sep 17 00:00:00 2001 From: Simon Pieters Date: Sun, 12 Sep 2021 21:49:09 +0200 Subject: [PATCH 6/6] Address mfreed7's comments --- source | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/source b/source index d3eaf3a07a0..bc3cbd551ba 100644 --- a/source +++ b/source @@ -114166,7 +114166,7 @@ document.body.appendChild(text); pending parsing-blocking script to be fetched and executed, or during normal parsing, at the time an element is created for a token. While this optimization is not defined in precise detail, there are some rules to consider for - nteroperability.

    + interoperability.

    Each HTML parser can have an active speculative HTML parser. It is initially null.

    @@ -114246,8 +114246,11 @@ document.body.appendChild(text); parser:

      -
    1. Optionally, return. (This allows user agents to opt out of speculative HTML - parsing.)

    2. +
    3. +

      Optionally, return.

      + +

      This step allows user agents to opt out of speculative HTML parsing.

      +
    4. If parser's active speculative HTML parser is not null, then