-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change host parser for non-special schemes #148
Comments
In these cases, what we humans considered the host was actually the path.
When trying the same with "notspecial://" Safari agrees with Chrome and Firefox. I think these are similar cases. |
Given that different browsers preserve the case of notspecial://CamelCase for different reasons, I think we should keep the spec as it is. I think I'm willing to change WebKit to punycode encode nonspecial hosts (changing uppercase ASCII to lowercase) to continue to have a spec that has hosts in nonspecial schemes. Alternatives would be to percent encode nonspecial schemes, which I think is strange and shouldn't be done, or considering asdf://host to have a path of "//host", which shouldn't be done. |
Ah right, non-Safari browsers basically do not parse non-special schemes at all. The standard aligned with Safari in b266a43 as it better matches the intent of URLs and the original RFCs. Closing this then. Thanks! |
https://bugs.webkit.org/show_bug.cgi?id=162660 <rdar://28601706> Reviewed by Sam Weinig. LayoutTests/imported/w3c: * web-platform-tests/XMLHttpRequest/send-network-error-sync-events.sub-expected.txt: * web-platform-tests/html/semantics/embedded-content/media-elements/interfaces/HTMLElement/HTMLTrackElement/src-expected.txt: * web-platform-tests/html/semantics/scripting-1/the-script-element/fetch-src/failure-expected.txt: These tests need more investigation. See https://bugs.webkit.org/show_bug.cgi?id=163127 * web-platform-tests/url/a-element-expected.txt: * web-platform-tests/url/a-element-xhtml-expected.txt: * web-platform-tests/url/url-constructor-expected.txt: * web-platform-tests/url/url-setters-expected.txt: Many more tests pass. Hooray! Source/WebCore: Covered by updates to many LayoutTests. * platform/URLParser.cpp: Make the default value true for URLParser::enabled. This is the most impactful and well-documented one-line change I've ever written. LayoutTests: Many failing tests are now passing. The tests in fast/url look like they are an old test suite, some of which we were failing. We now pass many more of the tests. Those results are updated. Some URLs in the suite are invalid, and we now "fail" those tests. Rather than update the tests, I just changed the expectation to FAIL, which seems to be tolerable in this directory because there were many tests whose result was FAIL. Each such case is explained below. * fast/dom/DOMURL/parsing-expected.txt: * fast/dom/DOMURL/parsing.html: Percent-encoded values in the host are supposed to be decoded according to the spec. %2f decodes to '/' which is an invalid domain character. * fast/dom/DOMURL/set-href-attribute-hash-expected.txt: * fast/dom/DOMURL/set-href-attribute-hash.html: Added a space to the domain (which is an invalid domain character and the others in this test are not according to the spec) in order to continue to test that setting the hash of an invalid URL does not change its href. * fast/dom/DOMURL/set-href-attribute-protocol-expected.txt: * fast/dom/DOMURL/set-href-attribute-protocol.html: * fast/dom/HTMLAnchorElement/set-href-attribute-protocol-expected.txt: * fast/dom/HTMLAnchorElement/set-href-attribute-protocol.html: "http:??bar" now canonicalizes to "http://??bar" instead of adding one slash. * fast/url/file-expected.txt: * fast/url/file-http-base-expected.txt: Updated results. Many tests that were failing are now passing. * fast/url/anchor-expected.txt: Percent-encoding of non-ASCII characters in fragments now matches Firefox. * fast/url/host-expected.txt: Wide characters in the host such as http://%ef%bc%85%ef%bc%90%ef%bc%90.com/ should fail to parse. This matches Chrome and the spec. URLs with an empty host with a port should fail to parse. This matches Chrome, Firefox, and the spec. * fast/url/host-lowercase-per-scheme-expected.txt: According to spec, hosts of non-special URLs should be parsed the same as special URL hosts. Different browsers seem to have the existing behavior for different reasons. See whatwg/url#148 and https://bugs.webkit.org/show_bug.cgi?id=162885 * fast/url/idna2003-expected.txt: * fast/url/invalid-urls-utf8-expected.txt: Host encoding is now done according to the spec. * fast/url/invalid-idn-expected.txt: Neither Chrome, Firefox, nor the spec change invalid hosts to about:blank. * fast/url/ipv4-expected.txt: * fast/url/ipv6-expected.txt: "http://[0:0::0:0:8:]/" should indeed be compressed to "http://[::8]/" This kind of deterministic compression makes it so that two IPv6 addresses that are equal will parse to URLs that are also equal, even if they are written differently. * fast/url/path-expected.txt: * fast/url/relative-expected.txt: * fast/url/relative-win-expected.txt: * fast/url/safari-extension-expected.txt: Proper canonicalization of non-special hosts should be scheme://host/ or scheme:/// if there is no host. safari-extension is not special. Hosts should always be canonicalized to lowercase. * fast/url/segments-expected.txt: * fast/url/segments-from-data-url-expected.txt: The path of "foo://" should be "/" not "//". Extra slashes immediately after scheme:// should be ignored. URLs with no host but a port like "http:@:80/www.apple.com" are now invalid, matching Chrome, Firefox, and the spec. * fast/url/segments-userinfo-vs-host-expected.txt: '@' can be in the user. If it is, it is percent encoded. This matches Chrome and Firefox. "foo://" has a path of "/" not "//" Extra slashes after the scheme such as in "foo://///////" are now ignored according to spec. * fast/url/standard-url-expected.txt: * fast/url/tab-and-newline-stripping-expected.txt: http://[2001:5::042:44::0370:7334]/ is an invalid IPv6 address, so parsing it should fail. It passed with URL::parse because we used to only check that the characters inside the [] were valid ipv6 characters, not that they made any sense or were in any kind of bounds. * fast/url/url-credentials-escaping-expected.txt: Credential encoding is now according to spec. * http/tests/appcache/resources/x-frame-options-prevents-framing-test.html: http:/path1/path2 relative to http://host/path3 now canonicalizes to http://host/path1/path2 instead of http://path1/path2 so this test, which I believe was missing the second slash in error, needs to be fixed. * imported/w3c/web-platform-tests/XMLHttpRequest/send-network-error-sync-events.sub-expected.txt: Having a '}' in the host of a URL used to be invalid and it is now percent-escaped, matching Chrome and the spec. This test still passes on w3c-test.org. We can look into why it is failing locally later. See webkit.org/b/163127 * fast/loader/redirect-to-invalid-url-using-javascript-calls-policy-delegate-expected.txt: * fast/loader/redirect-to-invalid-url-using-meta-refresh-calls-policy-delegate-expected.txt: * fast/loader/window-open-to-invalid-url-calls-policy-delegate-expected.txt: http://HoSt is now being correctly interpreted as the host, and it is being punycode encoded if it's non-ASCII and lowercased if it is. * fast/forms/ValidityState-typeMismatch-url.html: * fast/forms/ValidityState-typeMismatch-url-expected.txt: Spaces in the host are invalid. This matches Firefox and the spec. * http/tests/inspector/network/copy-as-curl.html: '{' and '}' are now percent encoded in the URL path. This matches Firefox, Chrome, and the spec. * fast/loader/location-port.html: * fast/loader/location-port-expected.txt: parsing or setting ports in URLs with no host is no longer supported. This matches Firefox and Chrome. * security/block-test-expected.txt: * platform/mac/security/block-test-expected.txt: out-of-bounds ports now cause parsing failures. * imported/w3c/web-platform-tests/html/semantics/scripting-1/the-script-element/fetch-src/failure-expected.txt: "http://[]/" now fails to parse because it is an invalid IPv6 host. * fast/url/ipv6-expected.txt: IPv4 addresses at the end of IPv6 addresses are now serialized as the equivalent hex value in IPv6 form. This matches Chrome and the spec, and makes it so that equal IPv6 addresses written in different forms are equal. * fast/loader/url-parse-1-expected.txt: Extra or missing slashes and spaces around scheme:// are now handled according to the spec. * http/tests/websocket/tests/hybi/handshake-ok-with-http-version-beyond-1_1-expected.txt: The non-standard apple logo character is represented here by its non-standard Latin1 representation, 0xF0. It was encoded as 0xF0 UTF-8 then percent encoded, which is %EF%A3%BF. It is now encoded as the UTF-8 then percent encoded representation of its unicode value, 0xF8FF which matches other browsers. This test is still valid, because it still verifies that the URLs in r199590 are rejected, and they still are. See webkit.org/b/163127 * http/tests/contentextensions/make-https-expected.txt: * contentfiltering/block-after-add-data-then-allow-unblock-expected.txt: * contentfiltering/block-after-add-data-then-deny-unblock-expected.txt: * contentfiltering/block-after-finished-adding-data-then-allow-unblock-expected.txt: * contentfiltering/block-after-finished-adding-data-then-deny-unblock-expected.txt: * contentfiltering/block-after-response-then-allow-unblock-expected.txt: * contentfiltering/block-after-response-then-deny-unblock-expected.txt: * contentfiltering/block-after-will-send-request-then-allow-unblock-expected.txt: * contentfiltering/block-after-will-send-request-then-deny-unblock-expected.txt: * fast/backgrounds/background-shorthand-after-set-backgroundSize-expected.txt: * fast/backgrounds/background-shorthand-after-set-backgroundSize.html: * fast/backgrounds/background-shorthand-with-backgroundSize-style-expected.txt: * fast/backgrounds/background-shorthand-with-backgroundSize-style.html: * fast/css/getComputedStyle/computed-style-border-image-expected.txt: * fast/css/getComputedStyle/computed-style-border-image.html: * fast/css/getComputedStyle/computed-style-cross-fade-expected.txt: * fast/css/getComputedStyle/computed-style-cross-fade.html: * fast/css/getComputedStyle/getComputedStyle-background-shorthand-expected.txt: * fast/css/getComputedStyle/getComputedStyle-background-shorthand.html: * fast/css/getComputedStyle/getComputedStyle-list-style-shorthand-expected.txt: * fast/css/getComputedStyle/getComputedStyle-list-style-shorthand.html: URLs with non-special schemes and no slash after the host now do when canonicalized. * fast/css-generated-content/malformed-url.html: This tested what happens when you have an invalid host. | is now a valid host character. I changed it to have a % in the host to test the same behavior. * fast/loader/window-open-to-invalid-url-disallowed.html: * fast/loader/window-open-to-invalid-url-disallowed-expected.txt: * fast/loader/redirect-to-invalid-url-using-meta-refresh-disallowed.html: * fast/loader/redirect-to-invalid-url-using-meta-refresh-disallowed-expected.txt: * fast/loader/redirect-to-invalid-url-using-javascript-disallowed.html: * fast/loader/redirect-to-invalid-url-using-javascript-disallowed-expected.txt: "http://a=a&b=b" is no longer an invalid URL. We used to consider the '&' character to be an invalid domain character and we don't any more. This matches Chrome, Firefox, and the spec. To keep this test testing what happens if you have an invalid URL, I changed the '&' to a '%' which is an invalid domain character. * fast/loader/file-URL-with-port-number.html: File URLs with a port but no host are now invalid, matching Chrome and Firefox. File URLs with a port and a host are Ok, though. * platform/ios-simulator-wk1/fast/loader: Added. * platform/ios-simulator-wk1/fast/loader/redirect-to-invalid-url-using-javascript-disallowed-expected.txt: Added. * platform/ios-simulator-wk1/fast/loader/redirect-to-invalid-url-using-meta-refresh-disallowed-expected.txt: Added. * platform/ios-simulator-wk1/fast/loader/window-open-to-invalid-url-disallowed-expected.txt: Added. * platform/ios-simulator-wk1/imported/w3c/web-platform-tests/XMLHttpRequest: Added. * platform/ios-simulator-wk1/imported/w3c/web-platform-tests/XMLHttpRequest/send-network-error-sync-events.sub-expected.txt: Added. * platform/mac-wk1/fast/loader: Added. * platform/mac-wk1/fast/loader/redirect-to-invalid-url-using-javascript-disallowed-expected.txt: Added. * platform/mac-wk1/fast/loader/redirect-to-invalid-url-using-meta-refresh-disallowed-expected.txt: Added. * platform/mac-wk1/fast/loader/window-open-to-invalid-url-disallowed-expected.txt: Added. * platform/mac-wk1/imported: Added. * platform/mac-wk1/imported/w3c: Added. * platform/mac-wk1/imported/w3c/web-platform-tests: Added. * platform/mac-wk1/imported/w3c/web-platform-tests/XMLHttpRequest: Added. * platform/mac-wk1/imported/w3c/web-platform-tests/XMLHttpRequest/send-network-error-sync-events.sub-expected.txt: Added. * platform/mac/security/block-test-expected.txt: Differences between the URLParser and NSURL's parser cause differences in output for WK1 where NSURLRequests are made without serializing WebCore::ResourceRequests. In particular, '{' in the host is newly accepted as a valid URL by URLParser, but it is percent-encoded by NSURL's parser. See rdar://problem/28701914 git-svn-id: http://svn.webkit.org/repository/webkit/trunk@207162 268f45cc-cd09-0410-ab3c-d52691b4dbfc
I've seen some bad compatibility issues with changing behavior with such URLs. I think we should change the spec after all. I propose that we keep the case of hosts of URLs with non-special schemes and not add a '/' when serializing them. I also propose we continue to punycode-encode hosts of URLs with non-special schemes if they contain a non-ASCII character after being percent-decoded, and not add a '/' when serializing them. Having the same host be percent-encoded with a non-special scheme and punycode-encoded with a special scheme is strange. All browsers would still have to change, but the serialization of URLs like "asdf://HoSt" would be compatible with existing browsers, which agree for differing reasons. |
…non-ASCII characters in such hosts should be punycode-encoded https://bugs.webkit.org/show_bug.cgi?id=163413 Reviewed by Tim Horton. LayoutTests/imported/w3c: * web-platform-tests/url/url-setters-expected.txt: Update results. Some more tests are failing, but if my proposal in whatwg/url#148 is accepted, then these web platform tests will need to be changed. These web platform tests were also failing with the old URL::parse. Source/WebCore: This retains compatibility with the canonicalization Chrome, Firefox, and Safari with uppercase characters in the hosts of URLs with unrecognized schemes. Safari treats such characters as the host, while Firefox and Chrome treat such characters as part of the path, starting with the "//" after the ':' Behavior of non-ASCII characters is inconsistent, and since we need to have a host, we should punycode-encode the host to be consistent with special schemes because percent-encoding hosts sometimes is inconsistent. This solution was proposed to the spec in whatwg/url#148 Covered by updated API and layout tests. * platform/URLParser.cpp: (WebCore::URLParser::parse): (WebCore::URLParser::percentDecode): (WebCore::URLParser::domainToASCII): (WebCore::URLParser::hasInvalidDomainCharacter): (WebCore::URLParser::parseHostAndPort): (WebCore::URLParser::formURLDecode): (WebCore::percentDecode): Deleted. (WebCore::domainToASCII): Deleted. (WebCore::hasInvalidDomainCharacter): Deleted. (WebCore::formURLDecode): Deleted. * platform/URLParser.h: Tools: * TestWebKitAPI/Tests/WebCore/URLParser.cpp: (TestWebKitAPI::TEST_F): Update parsing results. There are now fewer differences between the new URLParser and the old URL::parse. LayoutTests: * contentfiltering/block-after-add-data-then-allow-unblock-expected.txt: * contentfiltering/block-after-add-data-then-deny-unblock-expected.txt: * contentfiltering/block-after-finished-adding-data-then-allow-unblock-expected.txt: * contentfiltering/block-after-finished-adding-data-then-deny-unblock-expected.txt: * contentfiltering/block-after-response-then-allow-unblock-expected.txt: * contentfiltering/block-after-response-then-deny-unblock-expected.txt: * contentfiltering/block-after-will-send-request-then-allow-unblock-expected.txt: * contentfiltering/block-after-will-send-request-then-deny-unblock-expected.txt: * fast/backgrounds/background-shorthand-after-set-backgroundSize-expected.txt: * fast/backgrounds/background-shorthand-after-set-backgroundSize.html: * fast/backgrounds/background-shorthand-with-backgroundSize-style-expected.txt: * fast/backgrounds/background-shorthand-with-backgroundSize-style.html: * fast/css/getComputedStyle/computed-style-border-image-expected.txt: * fast/css/getComputedStyle/computed-style-border-image.html: * fast/css/getComputedStyle/computed-style-cross-fade-expected.txt: * fast/css/getComputedStyle/computed-style-cross-fade.html: * fast/css/getComputedStyle/getComputedStyle-background-shorthand-expected.txt: * fast/css/getComputedStyle/getComputedStyle-background-shorthand.html: * fast/css/getComputedStyle/getComputedStyle-list-style-shorthand-expected.txt: * fast/css/getComputedStyle/getComputedStyle-list-style-shorthand.html: * fast/loader/url-parse-1-expected.txt: * fast/url/host-lowercase-per-scheme-expected.txt: * fast/url/safari-extension-expected.txt: * http/tests/xmlhttprequest/access-control-and-redirects-async-expected.txt: Update test expectations. This is how they were before r207162, showing that this change to the URLParser increases compatibility. git-svn-id: http://svn.webkit.org/repository/webkit/trunk@207321 268f45cc-cd09-0410-ab3c-d52691b4dbfc
That seem very specific somewhat strange rules. In particular not normalizing ASCII-only in any way, but normalizing if there's any non-ASCII is rather weird. I guess we'd only not add |
I agree that the rules are strange, but they're the most compatible and logical rules I can think of. We need to keep the case sensitivity which matches all browsers, we need to keep the no '/' which matches all browsers (yes, only if the path is empty), and we need to do something with non-ASCII code points. I haven't seen any such URLs that have non-ASCII code points, so I'm not sure what the compatibility problems might be if there are any. Safari fails to parse such URLs, while Chrome and Firefox percent-encode them because they are considered to be in the path. An alternative would be to match the behavior of Chrome and Firefox with all non-special URLs. That would be replacing this: |
The specification used to align with non-Safari. It changed in b266a43 based on https://www.w3.org/Bugs/Public/show_bug.cgi?id=27233. Rationale being that it would be better to deviate less from the original RFCs and Safari showed it was possible. I think I'm okay with your suggested changes, but it would be good to hear from @valenting @sleevi or others as to what Firefox and Chrome might be willing to do here and on what timeframe. Having one browser lead the charge is great, but not enough for success. |
This problem gets worse with IPv4 and IPv6 addresses that are hosts of unrecognized schemes that need canonicalization or fail parsing, like asdf://123456 or asdf://[asdf] |
…schemes as IPv4 address https://bugs.webkit.org/show_bug.cgi?id=164154 Reviewed by Andy Estes. Source/WebCore: This is needed to match behavior of all browsers. This is being discussed in the spec at whatwg/url#148 Covered by new API tests. * platform/URLParser.cpp: (WebCore::URLParser::parseHostAndPort): Only try to parse and canonicalize the host as an IPv4 address if the scheme is special (http, wss, etc.) Tools: * TestWebKitAPI/Tests/WebCore/URLParser.cpp: (TestWebKitAPI::TEST_F): git-svn-id: http://svn.webkit.org/repository/webkit/trunk@208086 268f45cc-cd09-0410-ab3c-d52691b4dbfc
So non-special schemes get no host parsing whatsoever basically? Somewhat sad and I guess that means that if we ever get another DNS-based scheme we'll have to grow the list of special schemes. 😟 |
Basically, except we need to do something with non-ASCII code points. I haven't come across any actual uses of unrecognized schemes with non-ASCII code points in the hosts, but its behavior should be defined and I think percent-encoding is a bad idea. |
Only percent-encoding seems reasonable given that we cannot normalize for ASCII. |
It's strange, but behavior here is going to be strange no matter what, and percent-encoding matches existing behavior of the canonicalized URL string. I'm wondering if Chrome and Firefox are willing to change their behavior of not having a host for non-special schemes, though |
I think I can say that Firefox is willing to change behavior here, though I don't think it'll be necessarily short term. |
@annevk Sorry, somehow I missed these pings in my inbox due to accidentally muting the thread (likely fat fingered something) I admit, I haven't read the WHATWG spec that closely for the context of the issue - much of our GURL implementation reflected (or tried to) RFC3986. However, where I suspect the incompat has arisen is with respect to standard URLs (e.g. those with authority components in a structured form). Despite https://tools.ietf.org/html/rfc3986#section-1.1.1 remarking that
namely, that all unrecognized schemes can be assumed generic unless/until they're supported, the GURL implementation treats all unrecognized schemes as non-standard (specifically, https://cs.chromium.org/chromium/src/url/url_util.cc?rcl=1478435172&l=105 ). That is, if it's not a scheme GURL explicitly knows how to parse, it treats it as opaque, and everything as the path. This would likely explain the divergence here, as well as the rationale. This decision has then cascaded into a number of design decisions throughout Chrome with respect to non-standard schemes it exposes (e.g. chrome-extension://, chrome://, chrome-guest://, etc), which all assume they can safely be extended as non-standard schemes (omitting authority, and following any number of internal structural rules) As such, it's beyond my ken to know what the extent of complexity or negative affect would be - much of it exists in those consuming our low-level URL parser, but I don't have good knowledge about those or their implementation implications. In order for us to get that desired behaviour, it would effectively mean treating unrecognized schemes (asdf://) as generic/standard schemes, so that they get the same behaviours - but because we don't 'force' internal callers to pre-register their schemes (... something we definitely should do, to prevent the unknown-unknowns like this), I don't know who would break, and I don't personally have the time to explore making the change and seeing what breaks and guiding it. I suspect it'd be non-trivial, but I suspect that the above code really is as simple as changing that logic so that non-standard schemes are explicitly registered as such, in line with RFC 3986. |
Note that RFC 3986 (and the URL Standard to a lesser extent due to compat) require all schemes to be parsed in the same way. Only post-parsing can you do scheme-specific dispatch. The whole idea of "standard URL" vs "non-standard URL" is something that sounds more like it's copied from Gecko (and perhaps by extension older RFCs) than the latest RFC. |
I'm ok with percent-encoding non-ascii values in hosts of URLs with non-special schemes. If the scheme is changed to a special scheme, the percent-encoded values will be percent-decoded again before being punycode encoded, so no information is lost. Switching from special to a non-special scheme could be strange, though. |
@achristensen07 I'd be in favor of forbidding such drastic scheme changes. Especially if they require re-parsing the host when done. That seems bad. |
…he first slash after the colon https://bugs.webkit.org/show_bug.cgi?id=164555 Patch by Alex Christensen <achristensen@webkit.org> on 2016-11-09 Reviewed by Tim Horton. LayoutTests/imported/w3c: * web-platform-tests/url/a-element-expected.txt: * web-platform-tests/url/a-element-xhtml-expected.txt: * web-platform-tests/url/url-constructor-expected.txt: Source/WebCore: When we see a url that is only scheme:// we treated the // as the path. Firefox did this with unrecognized schemes, but based on whatwg/url#148 they seem willing to change. We had added similar behavior to URL::parse, and I added this to URLParser in r206783 which this effectively reverts. Covered by API and layout tests. * platform/URLParser.cpp: (WebCore::URLParser::parse): Don't move m_userStart to m_pathStart back by two when we see an empty host. Tools: * TestWebKitAPI/Tests/WebCore/URLParser.cpp: (TestWebKitAPI::TEST_F): LayoutTests: * fast/url/segments-expected.txt: * fast/url/segments-from-data-url-expected.txt: * fast/loader/url-parse-1-expected.txt: * fetch/fetch-url-serialization-expected.txt: git-svn-id: http://svn.webkit.org/repository/webkit/trunk@208508 268f45cc-cd09-0410-ab3c-d52691b4dbfc
… the base URL is an applewebdata: URL https://bugs.webkit.org/show_bug.cgi?id=165621 Reviewed by Dan Bernstein. Source/WebCore: Covered by new API tests. * platform/URLParser.cpp: (WebCore::URLParser::parse): URLs with nonspecial schemes and no slash after the host get no slash as the path to maintain compatibility with all browsers. This was proposed to the URL spec in whatwg/url#148 When such as URL is used as a base URL with a relative path, in order to maintain compatibility with URL::parse we need to prepend a slash to the path. For completeness I added tests with a relative path, a relative query, a relative fragment, and a relative empty string, and because the fate of the spec is unclear in this case, I decided to maintain compatibility with URL::parse in all these cases. Tools: * TestWebKitAPI/Tests/WebCore/URLParser.cpp: (TestWebKitAPI::TEST_F): git-svn-id: http://svn.webkit.org/repository/webkit/trunk@209572 268f45cc-cd09-0410-ab3c-d52691b4dbfc
For URLs without a special scheme we cannot use the host parser directly due to compatibility issues and we need to serialize slightly differently too. Fixes #148.
FWIW, those kind of scheme changes are not allowed by the URL Standard. And haven't been for quite a while. |
@achristensen07 when looking at paths I found other problems too, some of which crept into your implementation. E.g., |
For URLs without a special scheme we cannot use the host parser directly due to compatibility issues. Instead we percent-encode the input. Fixes #148.
For URLs without a special scheme we cannot use the host parser directly due to compatibility issues. Instead we percent-encode the input. Tests: web-platform-tests/wpt#4406. Fixes #148.
Ignoring extra beginning slashes is new in WebKit. I haven't seen any compatibility problems. I don't have strong feelings about it. |
For URLs without a special scheme we cannot use the host parser directly due to compatibility issues. Instead we percent-encode the input. Also make sure that if "userinfo" or port is present, host is non-empty. Tests: web-platform-tests/wpt#4406. Fixes #148 and fixes #214.
How do we feel about the port of such URLs? I'm seeing some compatibility issues with URLs like "asdf://host:123456789123456789/" that used to be valid |
@achristensen07 that should be a new issue. I guess at some point we have to decide how far we go to make this work. |
https://bugs.webkit.org/show_bug.cgi?id=162660 <rdar://28601706> Reviewed by Sam Weinig. LayoutTests/imported/w3c: * web-platform-tests/XMLHttpRequest/send-network-error-sync-events.sub-expected.txt: * web-platform-tests/html/semantics/embedded-content/media-elements/interfaces/HTMLElement/HTMLTrackElement/src-expected.txt: * web-platform-tests/html/semantics/scripting-1/the-script-element/fetch-src/failure-expected.txt: These tests need more investigation. See https://bugs.webkit.org/show_bug.cgi?id=163127 * web-platform-tests/url/a-element-expected.txt: * web-platform-tests/url/a-element-xhtml-expected.txt: * web-platform-tests/url/url-constructor-expected.txt: * web-platform-tests/url/url-setters-expected.txt: Many more tests pass. Hooray! Source/WebCore: Covered by updates to many LayoutTests. * platform/URLParser.cpp: Make the default value true for URLParser::enabled. This is the most impactful and well-documented one-line change I've ever written. LayoutTests: Many failing tests are now passing. The tests in fast/url look like they are an old test suite, some of which we were failing. We now pass many more of the tests. Those results are updated. Some URLs in the suite are invalid, and we now "fail" those tests. Rather than update the tests, I just changed the expectation to FAIL, which seems to be tolerable in this directory because there were many tests whose result was FAIL. Each such case is explained below. * fast/dom/DOMURL/parsing-expected.txt: * fast/dom/DOMURL/parsing.html: Percent-encoded values in the host are supposed to be decoded according to the spec. %2f decodes to '/' which is an invalid domain character. * fast/dom/DOMURL/set-href-attribute-hash-expected.txt: * fast/dom/DOMURL/set-href-attribute-hash.html: Added a space to the domain (which is an invalid domain character and the others in this test are not according to the spec) in order to continue to test that setting the hash of an invalid URL does not change its href. * fast/dom/DOMURL/set-href-attribute-protocol-expected.txt: * fast/dom/DOMURL/set-href-attribute-protocol.html: * fast/dom/HTMLAnchorElement/set-href-attribute-protocol-expected.txt: * fast/dom/HTMLAnchorElement/set-href-attribute-protocol.html: "http:??bar" now canonicalizes to "http://??bar" instead of adding one slash. * fast/url/file-expected.txt: * fast/url/file-http-base-expected.txt: Updated results. Many tests that were failing are now passing. * fast/url/anchor-expected.txt: Percent-encoding of non-ASCII characters in fragments now matches Firefox. * fast/url/host-expected.txt: Wide characters in the host such as http://%ef%bc%85%ef%bc%90%ef%bc%90.com/ should fail to parse. This matches Chrome and the spec. URLs with an empty host with a port should fail to parse. This matches Chrome, Firefox, and the spec. * fast/url/host-lowercase-per-scheme-expected.txt: According to spec, hosts of non-special URLs should be parsed the same as special URL hosts. Different browsers seem to have the existing behavior for different reasons. See whatwg/url#148 and https://bugs.webkit.org/show_bug.cgi?id=162885 * fast/url/idna2003-expected.txt: * fast/url/invalid-urls-utf8-expected.txt: Host encoding is now done according to the spec. * fast/url/invalid-idn-expected.txt: Neither Chrome, Firefox, nor the spec change invalid hosts to about:blank. * fast/url/ipv4-expected.txt: * fast/url/ipv6-expected.txt: "http://[0:0::0:0:8:]/" should indeed be compressed to "http://[::8]/" This kind of deterministic compression makes it so that two IPv6 addresses that are equal will parse to URLs that are also equal, even if they are written differently. * fast/url/path-expected.txt: * fast/url/relative-expected.txt: * fast/url/relative-win-expected.txt: * fast/url/safari-extension-expected.txt: Proper canonicalization of non-special hosts should be scheme://host/ or scheme:/// if there is no host. safari-extension is not special. Hosts should always be canonicalized to lowercase. * fast/url/segments-expected.txt: * fast/url/segments-from-data-url-expected.txt: The path of "foo://" should be "/" not "//". Extra slashes immediately after scheme:// should be ignored. URLs with no host but a port like "http:@:80/www.apple.com" are now invalid, matching Chrome, Firefox, and the spec. * fast/url/segments-userinfo-vs-host-expected.txt: '@' can be in the user. If it is, it is percent encoded. This matches Chrome and Firefox. "foo://" has a path of "/" not "//" Extra slashes after the scheme such as in "foo://///////" are now ignored according to spec. * fast/url/standard-url-expected.txt: * fast/url/tab-and-newline-stripping-expected.txt: http://[2001:5::042:44::0370:7334]/ is an invalid IPv6 address, so parsing it should fail. It passed with URL::parse because we used to only check that the characters inside the [] were valid ipv6 characters, not that they made any sense or were in any kind of bounds. * fast/url/url-credentials-escaping-expected.txt: Credential encoding is now according to spec. * http/tests/appcache/resources/x-frame-options-prevents-framing-test.html: http:/path1/path2 relative to http://host/path3 now canonicalizes to http://host/path1/path2 instead of http://path1/path2 so this test, which I believe was missing the second slash in error, needs to be fixed. * imported/w3c/web-platform-tests/XMLHttpRequest/send-network-error-sync-events.sub-expected.txt: Having a '}' in the host of a URL used to be invalid and it is now percent-escaped, matching Chrome and the spec. This test still passes on w3c-test.org. We can look into why it is failing locally later. See webkit.org/b/163127 * fast/loader/redirect-to-invalid-url-using-javascript-calls-policy-delegate-expected.txt: * fast/loader/redirect-to-invalid-url-using-meta-refresh-calls-policy-delegate-expected.txt: * fast/loader/window-open-to-invalid-url-calls-policy-delegate-expected.txt: http://HoSt is now being correctly interpreted as the host, and it is being punycode encoded if it's non-ASCII and lowercased if it is. * fast/forms/ValidityState-typeMismatch-url.html: * fast/forms/ValidityState-typeMismatch-url-expected.txt: Spaces in the host are invalid. This matches Firefox and the spec. * http/tests/inspector/network/copy-as-curl.html: '{' and '}' are now percent encoded in the URL path. This matches Firefox, Chrome, and the spec. * fast/loader/location-port.html: * fast/loader/location-port-expected.txt: parsing or setting ports in URLs with no host is no longer supported. This matches Firefox and Chrome. * security/block-test-expected.txt: * platform/mac/security/block-test-expected.txt: out-of-bounds ports now cause parsing failures. * imported/w3c/web-platform-tests/html/semantics/scripting-1/the-script-element/fetch-src/failure-expected.txt: "http://[]/" now fails to parse because it is an invalid IPv6 host. * fast/url/ipv6-expected.txt: IPv4 addresses at the end of IPv6 addresses are now serialized as the equivalent hex value in IPv6 form. This matches Chrome and the spec, and makes it so that equal IPv6 addresses written in different forms are equal. * fast/loader/url-parse-1-expected.txt: Extra or missing slashes and spaces around scheme:// are now handled according to the spec. * http/tests/websocket/tests/hybi/handshake-ok-with-http-version-beyond-1_1-expected.txt: The non-standard apple logo character is represented here by its non-standard Latin1 representation, 0xF0. It was encoded as 0xF0 UTF-8 then percent encoded, which is %EF%A3%BF. It is now encoded as the UTF-8 then percent encoded representation of its unicode value, 0xF8FF which matches other browsers. This test is still valid, because it still verifies that the URLs in r199590 are rejected, and they still are. See webkit.org/b/163127 * http/tests/contentextensions/make-https-expected.txt: * contentfiltering/block-after-add-data-then-allow-unblock-expected.txt: * contentfiltering/block-after-add-data-then-deny-unblock-expected.txt: * contentfiltering/block-after-finished-adding-data-then-allow-unblock-expected.txt: * contentfiltering/block-after-finished-adding-data-then-deny-unblock-expected.txt: * contentfiltering/block-after-response-then-allow-unblock-expected.txt: * contentfiltering/block-after-response-then-deny-unblock-expected.txt: * contentfiltering/block-after-will-send-request-then-allow-unblock-expected.txt: * contentfiltering/block-after-will-send-request-then-deny-unblock-expected.txt: * fast/backgrounds/background-shorthand-after-set-backgroundSize-expected.txt: * fast/backgrounds/background-shorthand-after-set-backgroundSize.html: * fast/backgrounds/background-shorthand-with-backgroundSize-style-expected.txt: * fast/backgrounds/background-shorthand-with-backgroundSize-style.html: * fast/css/getComputedStyle/computed-style-border-image-expected.txt: * fast/css/getComputedStyle/computed-style-border-image.html: * fast/css/getComputedStyle/computed-style-cross-fade-expected.txt: * fast/css/getComputedStyle/computed-style-cross-fade.html: * fast/css/getComputedStyle/getComputedStyle-background-shorthand-expected.txt: * fast/css/getComputedStyle/getComputedStyle-background-shorthand.html: * fast/css/getComputedStyle/getComputedStyle-list-style-shorthand-expected.txt: * fast/css/getComputedStyle/getComputedStyle-list-style-shorthand.html: URLs with non-special schemes and no slash after the host now do when canonicalized. * fast/css-generated-content/malformed-url.html: This tested what happens when you have an invalid host. | is now a valid host character. I changed it to have a % in the host to test the same behavior. * fast/loader/window-open-to-invalid-url-disallowed.html: * fast/loader/window-open-to-invalid-url-disallowed-expected.txt: * fast/loader/redirect-to-invalid-url-using-meta-refresh-disallowed.html: * fast/loader/redirect-to-invalid-url-using-meta-refresh-disallowed-expected.txt: * fast/loader/redirect-to-invalid-url-using-javascript-disallowed.html: * fast/loader/redirect-to-invalid-url-using-javascript-disallowed-expected.txt: "http://a=a&b=b" is no longer an invalid URL. We used to consider the '&' character to be an invalid domain character and we don't any more. This matches Chrome, Firefox, and the spec. To keep this test testing what happens if you have an invalid URL, I changed the '&' to a '%' which is an invalid domain character. * fast/loader/file-URL-with-port-number.html: File URLs with a port but no host are now invalid, matching Chrome and Firefox. File URLs with a port and a host are Ok, though. * platform/ios-simulator-wk1/fast/loader: Added. * platform/ios-simulator-wk1/fast/loader/redirect-to-invalid-url-using-javascript-disallowed-expected.txt: Added. * platform/ios-simulator-wk1/fast/loader/redirect-to-invalid-url-using-meta-refresh-disallowed-expected.txt: Added. * platform/ios-simulator-wk1/fast/loader/window-open-to-invalid-url-disallowed-expected.txt: Added. * platform/ios-simulator-wk1/imported/w3c/web-platform-tests/XMLHttpRequest: Added. * platform/ios-simulator-wk1/imported/w3c/web-platform-tests/XMLHttpRequest/send-network-error-sync-events.sub-expected.txt: Added. * platform/mac-wk1/fast/loader: Added. * platform/mac-wk1/fast/loader/redirect-to-invalid-url-using-javascript-disallowed-expected.txt: Added. * platform/mac-wk1/fast/loader/redirect-to-invalid-url-using-meta-refresh-disallowed-expected.txt: Added. * platform/mac-wk1/fast/loader/window-open-to-invalid-url-disallowed-expected.txt: Added. * platform/mac-wk1/imported: Added. * platform/mac-wk1/imported/w3c: Added. * platform/mac-wk1/imported/w3c/web-platform-tests: Added. * platform/mac-wk1/imported/w3c/web-platform-tests/XMLHttpRequest: Added. * platform/mac-wk1/imported/w3c/web-platform-tests/XMLHttpRequest/send-network-error-sync-events.sub-expected.txt: Added. * platform/mac/security/block-test-expected.txt: Differences between the URLParser and NSURL's parser cause differences in output for WK1 where NSURLRequests are made without serializing WebCore::ResourceRequests. In particular, '{' in the host is newly accepted as a valid URL by URLParser, but it is percent-encoded by NSURL's parser. See rdar://problem/28701914 Canonical link: https://commits.webkit.org/181129@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@207162 268f45cc-cd09-0410-ab3c-d52691b4dbfc
…non-ASCII characters in such hosts should be punycode-encoded https://bugs.webkit.org/show_bug.cgi?id=163413 Reviewed by Tim Horton. LayoutTests/imported/w3c: * web-platform-tests/url/url-setters-expected.txt: Update results. Some more tests are failing, but if my proposal in whatwg/url#148 is accepted, then these web platform tests will need to be changed. These web platform tests were also failing with the old URL::parse. Source/WebCore: This retains compatibility with the canonicalization Chrome, Firefox, and Safari with uppercase characters in the hosts of URLs with unrecognized schemes. Safari treats such characters as the host, while Firefox and Chrome treat such characters as part of the path, starting with the "//" after the ':' Behavior of non-ASCII characters is inconsistent, and since we need to have a host, we should punycode-encode the host to be consistent with special schemes because percent-encoding hosts sometimes is inconsistent. This solution was proposed to the spec in whatwg/url#148 Covered by updated API and layout tests. * platform/URLParser.cpp: (WebCore::URLParser::parse): (WebCore::URLParser::percentDecode): (WebCore::URLParser::domainToASCII): (WebCore::URLParser::hasInvalidDomainCharacter): (WebCore::URLParser::parseHostAndPort): (WebCore::URLParser::formURLDecode): (WebCore::percentDecode): Deleted. (WebCore::domainToASCII): Deleted. (WebCore::hasInvalidDomainCharacter): Deleted. (WebCore::formURLDecode): Deleted. * platform/URLParser.h: Tools: * TestWebKitAPI/Tests/WebCore/URLParser.cpp: (TestWebKitAPI::TEST_F): Update parsing results. There are now fewer differences between the new URLParser and the old URL::parse. LayoutTests: * contentfiltering/block-after-add-data-then-allow-unblock-expected.txt: * contentfiltering/block-after-add-data-then-deny-unblock-expected.txt: * contentfiltering/block-after-finished-adding-data-then-allow-unblock-expected.txt: * contentfiltering/block-after-finished-adding-data-then-deny-unblock-expected.txt: * contentfiltering/block-after-response-then-allow-unblock-expected.txt: * contentfiltering/block-after-response-then-deny-unblock-expected.txt: * contentfiltering/block-after-will-send-request-then-allow-unblock-expected.txt: * contentfiltering/block-after-will-send-request-then-deny-unblock-expected.txt: * fast/backgrounds/background-shorthand-after-set-backgroundSize-expected.txt: * fast/backgrounds/background-shorthand-after-set-backgroundSize.html: * fast/backgrounds/background-shorthand-with-backgroundSize-style-expected.txt: * fast/backgrounds/background-shorthand-with-backgroundSize-style.html: * fast/css/getComputedStyle/computed-style-border-image-expected.txt: * fast/css/getComputedStyle/computed-style-border-image.html: * fast/css/getComputedStyle/computed-style-cross-fade-expected.txt: * fast/css/getComputedStyle/computed-style-cross-fade.html: * fast/css/getComputedStyle/getComputedStyle-background-shorthand-expected.txt: * fast/css/getComputedStyle/getComputedStyle-background-shorthand.html: * fast/css/getComputedStyle/getComputedStyle-list-style-shorthand-expected.txt: * fast/css/getComputedStyle/getComputedStyle-list-style-shorthand.html: * fast/loader/url-parse-1-expected.txt: * fast/url/host-lowercase-per-scheme-expected.txt: * fast/url/safari-extension-expected.txt: * http/tests/xmlhttprequest/access-control-and-redirects-async-expected.txt: Update test expectations. This is how they were before r207162, showing that this change to the URLParser increases compatibility. Canonical link: https://commits.webkit.org/181240@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@207321 268f45cc-cd09-0410-ab3c-d52691b4dbfc
…schemes as IPv4 address https://bugs.webkit.org/show_bug.cgi?id=164154 Reviewed by Andy Estes. Source/WebCore: This is needed to match behavior of all browsers. This is being discussed in the spec at whatwg/url#148 Covered by new API tests. * platform/URLParser.cpp: (WebCore::URLParser::parseHostAndPort): Only try to parse and canonicalize the host as an IPv4 address if the scheme is special (http, wss, etc.) Tools: * TestWebKitAPI/Tests/WebCore/URLParser.cpp: (TestWebKitAPI::TEST_F): Canonical link: https://commits.webkit.org/181868@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@208086 268f45cc-cd09-0410-ab3c-d52691b4dbfc
… the base URL is an applewebdata: URL https://bugs.webkit.org/show_bug.cgi?id=165621 Reviewed by Dan Bernstein. Source/WebCore: Covered by new API tests. * platform/URLParser.cpp: (WebCore::URLParser::parse): URLs with nonspecial schemes and no slash after the host get no slash as the path to maintain compatibility with all browsers. This was proposed to the URL spec in whatwg/url#148 When such as URL is used as a base URL with a relative path, in order to maintain compatibility with URL::parse we need to prepend a slash to the path. For completeness I added tests with a relative path, a relative query, a relative fragment, and a relative empty string, and because the fate of the spec is unclear in this case, I decided to maintain compatibility with URL::parse in all these cases. Tools: * TestWebKitAPI/Tests/WebCore/URLParser.cpp: (TestWebKitAPI::TEST_F): Canonical link: https://commits.webkit.org/183231@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@209572 268f45cc-cd09-0410-ab3c-d52691b4dbfc
new URL("unknown://†/")
does not result in "Punycode" (results in%E2%80%A0
) andnew URL("notspecial://H%4fSt/path")
does not remove the URL encoding (results inH%4fSt
nothost
). It seems all implementations are in agreement here so I'm not sure it's a change worth pursuing. Barring any objections I plan on changing the standard.Acknowledge: Alex Christensen.
The text was updated successfully, but these errors were encountered: