-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Meta: UTS46 feedback #744
Comments
+1.
I think it's still valuable to test what a UTS46 implementation does for these kinds of inputs, even if the URL host parser will later interpret them as invalid IPv4 addresses rather than domain names. But as you say, it is non-trivial to detect these inputs because of limitations in the APIs exposed by browsers, so perhaps these tests should include a flag. An idea - bearing in mind that URLs are part of a living standard (and things like the ends-in-a-number predicate have changed fairly recently), perhaps the tests should include a set of "URLX" flags for use by implementors of this standard? To mark inputs which are technically allowed by UTS46 but may not be usable in URLs.
Status U1 is not used at all by the test files :( |
I submitted the CheckBidi feedback and added it to OP. @karwa for now I did not submit feedback around your comment above. In a couple of months we'll find out how this initial round went. If there's still issues with the tests after that we cannot resolve through a filter (as I did now) let's put a more coherent proposal together. |
Hi @annevk, regarding
I am looking at the text of UTS46 and I don't see what should be changed. For CheckBidi and CheckJoiners, we just refer to the RFCs. We have some checks like
but it's pretty obvious what to do when the label is empty, or has fewer than 4 characters. Please clarify. (FYI @macchiati) |
The problem is that the RFCs assume they are passed a label that is not the empty string. So we shouldn't call into the RFC when that is not the case. |
Looking at “the ContextJ rules”, https://www.rfc-editor.org/rfc/rfc5892.html#appendix-A processes a label with a pseudo-code loop of For CheckBidi, I see that IDNA2008 appears to trigger the rule only if the label contains RTL characters while UTS46 triggers it if the domain name contains RTL. (Although https://www.rfc-editor.org/rfc/rfc5893.html#section-2 says that it “applies to labels in Bidi domain names”.) It seems like we could clarify this in UTS46 with this insertion:
|
Alternative change: We could make this small insertion at the beginning of 4.1 Validity Criteria: “Each of the following criteria must be satisfied for a non-empty label” |
It looks like a number of changes have been made in response to our feedback: https://unicode.org/reports/tr46/#Modifications. I haven't yet made the time to review in detail. |
I'm wondering if we should be making use of the new IgnoreInvalidPunycode flag. |
Feedback I submitted to be considered for the Unicode April 2023 meeting.
— #733 (comment)
The text was updated successfully, but these errors were encountered: