Address several IDNA issues #309
So what should we do in Node.js/whatwg-url? Should we take this as license to violate the spec, so that our users can use our libraries on YouTube? I'd kind of prefer the spec just match browsers here (and in most places); we've never had success getting implementers to comment on URL parsing issues before, and I doubt it's going to start here.
Sigh. I guess you're right. It's just such dumb Latin-alphabet-centric behavior that I'm having a hard time accepting this is what we want long term. And I'd rather fix issues once than multiple times.
To match browser behavior, fast-path ASCII-only domains and do not run ToASCII on them. Fixes: nodejs#12965 Refs: nodejs#12966 Refs: whatwg/url#309
The following tests show that not all browsers use a fast path for ASCII-only input:
As you see Edge checks the Firefox doesn't fail on invalid labels at all and the last test result shows browser's bug, which must be fixed anyway. Note: I tested with |
Thanks for those tests. Chrome's behavior looks consistent? If all of the input is ASCII, don't run ToASCII. Label splitting happens as part of ToASCII, so it is not a consideration until then.
If we follow Chrome's behavior, then it's unclear
I see, the problem is that they use ToUnicode (which never fails), after which it no longer roundtrips. That does indeed seem bad.
@rmisev sorry for leaving this. So what is the processing model we want? ASCII lowercase input; split input on "."; if a label starts with "xn--" or contains non-ASCII, run input through ToASCII? That's getting rather complicated. Always invoking ToASCII and setting the ignore hyphen flag might end up being better, no?
I think simpler is better, so always
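For readers following along, the label-by-label model floated in that comment can be sketched roughly as below. This is only an illustration of the check being discussed, not what the spec or any browser does; the function name `needsToASCII` is hypothetical, and the sketch deliberately stops at the decision and does not perform ToASCII itself.

```typescript
// Hypothetical helper (not from the URL Standard or any library): decides
// whether a domain would be sent through ToASCII under the model described
// in the comment above.
function needsToASCII(domain: string): boolean {
  // ASCII-lowercase only: map A-Z to a-z, leave every other code point alone.
  const lowered = domain.replace(/[A-Z]/g, (c) => c.toLowerCase());
  // Split on "." and look for a label that starts with "xn--"
  // or contains any non-ASCII code unit.
  return lowered.split(".").some((label) => {
    const hasAcePrefix = label.slice(0, 4) === "xn--";
    const hasNonAscii = /[^\x00-\x7F]/.test(label);
    return hasAcePrefix || hasNonAscii;
  });
}
```

Under this sketch, a plain ASCII host such as `example.com` skips ToASCII, while `XN--abc.example` and `bücher.example` do not. The alternative raised in the same comment (always invoke ToASCII, but relax the hyphen check) avoids needing this pre-check at all.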
There are valid domain names with hyphens in the 3rd and 4th positions; the new Node.js WHATWG URL parser was failing on them because it assumed they were invalid IDNs. Also filters IDN errors when a domain label starts or ends with a hyphen. Also fixes an error in ToUnicode. Fixes: nodejs#12965 Refs: https://www.icann.org/news/announcement-2000-01-07-en Refs: whatwg/url#309 (comment)
I updated the PR to point to the proposed update of UTS46 and set CheckHyphens to false.
We're working on validating the tests via jsdom/whatwg-url in jsdom/whatwg-url#90, but kind of stuck since we don't implement CheckJoiners and CheckBidi. We'll probably go ahead and merge anyway, but would Node.js be able to create an implementation that passes the tests instead? That'd be even better.
Yes I think so. In fact, cases 3 and 4 here (the two corresponding to CheckBidi) are already handled correctly in Node.js 8.0.0. CheckJoiners was not included in the release, since at the time the PR was merged there was not yet a resolution on CheckJoiners.
Great. Would it make sense to put off merging this PR until there is a corresponding verification from Node.js that the tests and spec are implementable? That's been our general preference so far, using either whatwg-url or Node.js.
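As a rough illustration of what setting CheckHyphens to false turns off: the check rejects a label that has a hyphen in both the third and fourth positions, or that begins or ends with a hyphen. The sketch below is a simplification with a hypothetical function name (`failsCheckHyphens`) and a made-up sample label; it is not taken from any implementation discussed in this thread.

```typescript
// Hypothetical helper: returns true when a label would be rejected by the
// hyphen restrictions that CheckHyphens=true enforces. With CheckHyphens
// set to false, these conditions are simply not tested.
function failsCheckHyphens(label: string): boolean {
  // Hyphen-minus in both the third and fourth positions (indices 2 and 3).
  const hyphensAt3And4 =
    label.length >= 4 && label.charAt(2) === "-" && label.charAt(3) === "-";
  // Label begins or ends with a hyphen-minus.
  const edgeHyphen =
    label.length > 0 &&
    (label.charAt(0) === "-" || label.charAt(label.length - 1) === "-");
  return hyphensAt3And4 || edgeHyphen;
}
```

For example, an illustrative label such as `r3---sn-example` fails this check even though it is plain ASCII and not an IDN, which is the kind of host name the earlier commit message describes as being wrongly rejected.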
And indeed there is: nodejs/node#13362 I'll look into implementing the changes in tr46.js and whatwg-url as well. Marginally off-topic, but just wanted to point out that there seems to be a circular dependency between the URL Standard, Unicode IDNA spec, whatwg-url, Node.js, and WPT; specifically:
OK, looks like Node.js has a passing implementation, so let's merge this! BTW the dependency is not cyclic, as we don't need a merged PR, just an existing PR :).
Remove custom tests for invalid IDNA domains in url-idna.js in favor of the more comprehensive official set. Refs: whatwg/url#309 Refs: web-platform-tests/wpt#5976
Remove custom tests for invalid IDNA domains in url-idna.js in favor of the more comprehensive official set. PR-URL: nodejs#13362 Refs: whatwg/url#309 Refs: web-platform-tests/wpt#5976 Reviewed-By: Refael Ackermann <refack@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
Remove custom tests for invalid IDNA domains in url-idna.js in favor of the more comprehensive official set. PR-URL: #13362 Refs: whatwg/url#309 Refs: web-platform-tests/wpt#5976 Reviewed-By: Refael Ackermann <refack@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
Remove custom tests for invalid IDNA domains in url-idna.js in favor of the more comprehensive official set. Backport-PR-URL: #17365 PR-URL: #13362 Refs: whatwg/url#309 Refs: web-platform-tests/wpt#5976 Reviewed-By: Refael Ackermann <refack@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
Many implementations currently skip ToASCII if domain is ASCII-only, but as discovered in [1] and [2], this can result in some undesirable behavior. Adding a note prevents implementors from making the mistake of thinking ToASCII is a no-op if the input is ASCII, and also provides a recommendation on how to properly optimize the ToASCII step. [1]: whatwg#267 [2]: whatwg#309 (comment)
Many implementations currently skip ToASCII if domain is ASCII-only, but as discovered in #267 and #309 (comment), this can result in some undesirable behavior. Adding a note prevents implementers from making the mistake of thinking ToASCII is a no-op if the input is ASCII, and also provides a recommendation on how to properly optimize the ToASCII step.
As reported at #53 (comment), this is causing issues in non-browser implementations.