Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The processing model for text files seems to mandate quirks mode #5939

Closed
mfreed7 opened this issue Sep 23, 2020 · 8 comments · Fixed by #6745
Closed

The processing model for text files seems to mandate quirks mode #5939

mfreed7 opened this issue Sep 23, 2020 · 8 comments · Fixed by #6745
Labels
agenda+ To be discussed at a triage meeting topic: parser

Comments

@mfreed7
Copy link
Contributor

mfreed7 commented Sep 23, 2020

It seems that the Page load processing model for text files mandates quirks mode presentation of the text document. That section says:

Let document be the result of creating and initializing a Document object given "html", type, and navigationParams.

Create an HTML parser and associate it with the document. Act as if the tokenizer had emitted a start tag token with the tag name "pre" followed by a single U+000A LINE FEED (LF) character, and switch the HTML parser's tokenizer to the PLAINTEXT state. Each task that the networking task source places on the task queue while fetching runs must then fill the parser's input byte stream with the fetched bytes and cause the HTML parser to perform the appropriate processing of the input stream.

If I'm reading that correctly, it says that the parser should receive <pre>, a line feed, and then go into PLAINTEXT state to parse the content of the text file. Based on the "initial" insertion mode spec, since this isn't an iframe, the document will be set to quirks mode, due to lack of a DOCTYPE.

I doubt this is intentional. Would it make sense to either:
a) explicitly specify it to be no-quirks mode, or
b) explicitly allow UAs to decide on quirks vs no-quirks mode

Chromium is working on removing built-in quirks mode documents, such as the view-source document. In doing so, this spec issue was pointed out (by @tkent-google, here).

Currently, Chromium, WebKit, and Gecko all produce a quirks-mode document here. But Chromium would like to change that. There seems to be little compat risk to making this change, but perhaps I'm missing something.

@domenic
Copy link
Member

domenic commented Sep 23, 2020

I agree this should not be very risky, and generally seems like a good idea. I hope other browsers think similarly.

I don't think we should leave this up to UAs; although it's not a big interop risk, it's an unnecessary one.

Note also that the corresponding sections for media and plugins use no-quirks, since instead of using the parser, they create elements directly. So for consistency with those sections it'd be ideal if text documents were also no-quirks. (I don't know what browsers actually do, though... maybe they use quirks for everything?)


As an aside: it's not obvious to me whether

Act as if the tokenizer had emitted a start tag token with the tag name "pre" followed by a single U+000A LINE FEED (LF) character, and switch the HTML parser's tokenizer to the PLAINTEXT state

will hit the "initial" insertion mode, or bypass it and go straight into the PLAINTEXT state. Maybe it is obvious to people who are more familiar with the parser. But in whatever state we end up, it might be worth adding a note about the intention.

@annevk
Copy link
Member

annevk commented Sep 24, 2020

I think this is a duplicate of #3113 though note that it reaches a different conclusion of the end result here. It's also not clear to me this is worth changing given that I doubt we can ever change initial about:blank and I would expect that to be more prominent.

@mfreed7
Copy link
Contributor Author

mfreed7 commented Sep 24, 2020

I agree this should not be very risky, and generally seems like a good idea. I hope other browsers think similarly.

I don't think we should leave this up to UAs; although it's not a big interop risk, it's an unnecessary one.

Thanks - I agree on both counts.

Note also that the corresponding sections for media and plugins use no-quirks, since instead of using the parser, they create elements directly. So for consistency with those sections it'd be ideal if text documents were also no-quirks. (I don't know what browsers actually do, though... maybe they use quirks for everything?)

I didn't check other browsers, but Chromium has used quirks mode for both media and plugins. I'm trying to get that changed right now. Along with image preview documents.

As an aside: it's not obvious to me whether

Act as if the tokenizer had emitted a start tag token with the tag name "pre" followed by a single U+000A LINE FEED (LF) character, and switch the HTML parser's tokenizer to the PLAINTEXT state

will hit the "initial" insertion mode, or bypass it and go straight into the PLAINTEXT state. Maybe it is obvious to people who are more familiar with the parser. But in whatever state we end up, it might be worth adding a note about the intention.

Given the comment below from @annevk, it does seem like this needs clarification, at least.

I think this is a duplicate of #3113 though note that it reaches a different conclusion of the end result here. It's also not clear to me this is worth changing given that I doubt we can ever change initial about:blank and I would expect that to be more prominent.

Thanks for the pointer to #3113 - I didn't find that one when I looked. Interesting that the "different conclusion" is that the current spec points to text file documents being in no-quirks mode. So it would seem that we should at least clarify the spec, if not change it.

Your point about about:blank is a good one. However, I think that spec is separate from the text file spec, if I am looking at the right place. The create a new browsing context section says:

  1. Let document be a new Document, marked as an HTML document in quirks mode, whose content type is "text/html", origin is origin, active sandboxing flag set is sandboxFlags, permissions policy is permissionsPolicy, cross-origin opener policy is coop, and which is ready for post-load tasks.

So I would think the proposed change here, to text documents, wouldn't affect that. Right?

blueboxd pushed a commit to blueboxd/chromium-legacy that referenced this issue Sep 25, 2020
Quirks mode layout is slower, so change to no-quirks.

Note that this goes against the spec [1], which mandates quirks
mode for text documents. An issue [2] has been raised to change
that, but in the meantime, this seems like a very low (compat)
risk change. For now, however, the WPT test will not be changed,
and a broken expectation will be committed.

[1] https://html.spec.whatwg.org/multipage/browsing-the-web.html#read-text
[2] whatwg/html#5939

Bug: 1131185
Change-Id: If7ac8cbe8b4ec277f2cb288d120698b9bacf773c
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2425109
Commit-Queue: Mason Freed <masonfreed@chromium.org>
Reviewed-by: Kent Tamura <tkent@chromium.org>
Auto-Submit: Mason Freed <masonfreed@chromium.org>
Cr-Commit-Position: refs/heads/master@{#810357}
@annevk
Copy link
Member

annevk commented Sep 25, 2020

Yeah, it wouldn't affect that. I'm a little hesitant to change such long-standing behavior however. Quirks mode is fairly established, the couple quirks it creates are standardized in https://quirks.spec.whatwg.org/. What's the upside of changing this?

@mfreed7
Copy link
Contributor Author

mfreed7 commented Sep 28, 2020

Yeah, it wouldn't affect that. I'm a little hesitant to change such long-standing behavior however. Quirks mode is fairly established, the couple quirks it creates are standardized in https://quirks.spec.whatwg.org/. What's the upside of changing this?

For Chromium, at least, the implementation of quirks mode can be slower than standards mode, and in many cases switching to no-quirks makes rendering faster. So I'm going through the various documents and attempting to make them no-quirks where possible. The text file change was one of those, and was one of the few (other than about:blank) where quirks-mode seemed to be according to spec.

Aside from that, having some standard documents be specified in "quirks" mode just seems odd, given that quirks-mode is a backwards-compatibility layer and is discouraged for new features:

Where possible, limit quirks to a fixed set of legacy features so they don’t propagate into new features.

This was also alluded to above by @domenic - other similar documents (e.g. media and plugins) are specified to be in no-quirks mode.

Do you foresee compat issues from this change? If so, any pointers for where to look for such issues?

+@rniwa, any thoughts?

@zcorpan
Copy link
Member

zcorpan commented Feb 11, 2021

For text/plain docs, at least, switching to standards mode has a rendering difference because of pre's top margin no longer being collapsed.

Something that could break is if the parent page or opener page mutates the DOM to add styles or other stuff, and expect quirks mode. But I would guess that this is extremely rare.

@zcorpan
Copy link
Member

zcorpan commented Feb 11, 2021

So per https://bugs.chromium.org/p/chromium/issues/detail?id=1131185 looks like this was changed in Chromium Sep 25, 2020. @mfreed7 has there been any fallout?

@mfreed7
Copy link
Contributor Author

mfreed7 commented Apr 2, 2021

So per https://bugs.chromium.org/p/chromium/issues/detail?id=1131185 looks like this was changed in Chromium Sep 25, 2020. @mfreed7 has there been any fallout?

So as mentioned, this change went into Chromium as of 87.0.4275.0. We're two stable releases past that, and so far I've seen zero fallout. I'd say this is safe change to make.

@domenic domenic added the agenda+ To be discussed at a triage meeting label May 26, 2021
domenic pushed a commit that referenced this issue Jun 24, 2021
mfreed7 added a commit to mfreed7/html that referenced this issue Jun 3, 2022
mjfroman pushed a commit to mjfroman/moz-libwebrtc-third-party that referenced this issue Oct 14, 2022
Quirks mode layout is slower, so change to no-quirks.

Note that this goes against the spec [1], which mandates quirks
mode for text documents. An issue [2] has been raised to change
that, but in the meantime, this seems like a very low (compat)
risk change. For now, however, the WPT test will not be changed,
and a broken expectation will be committed.

[1] https://html.spec.whatwg.org/multipage/browsing-the-web.html#read-text
[2] whatwg/html#5939

Bug: 1131185
Change-Id: If7ac8cbe8b4ec277f2cb288d120698b9bacf773c
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2425109
Commit-Queue: Mason Freed <masonfreed@chromium.org>
Reviewed-by: Kent Tamura <tkent@chromium.org>
Auto-Submit: Mason Freed <masonfreed@chromium.org>
Cr-Commit-Position: refs/heads/master@{#810357}
GitOrigin-RevId: 61de461c7a4f6395bf433d56d1ca7b2c0226e058
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agenda+ To be discussed at a triage meeting topic: parser
Development

Successfully merging a pull request may close this issue.

4 participants