url.parse does not handle IPv6 link-local addresses correctly #9404
Comments
Looking through the code, it seems like the IPv6 hostname detection may need to be done before the invalid host character detection so that different invalid host characters can be used for IPv6 hostnames. |
According to that reasoning, I would imagine a change like this:
diff --git a/lib/url.js b/lib/url.js
index c5a3793..e696547 100644
--- a/lib/url.js
+++ b/lib/url.js
@@ -70,6 +70,7 @@ var protocolPattern = /^([a-z0-9.+-]+:)/i,
// are the ones that are *expected* to be seen, so we fast-path
// them.
nonHostChars = ['%', '/', '?', ';', '#'].concat(autoEscape),
+ nonHostCharsIPv6 = ['/', '?', ';', '#'].concat(autoEscape),
hostEndingChars = ['/', '?', '#'],
hostnameMaxLen = 255,
hostnamePartPattern = /^[+a-z0-9A-Z_-]{0,63}$/,
@@ -216,11 +217,19 @@ Url.prototype.parse = function(url, parseQueryString, slashesDenoteHost) {
rest = rest.slice(atSign + 1);
this.auth = decodeURIComponent(auth);
}
+
+ // if hostname begins with [ and is followed by ]
+ // assume that it's an IPv6 address.
+ // :: is the smallest IPv6 address possible, so ] must be > 2 positions to the right of [
+ var ipv6Hostname = rest[0] === '[' &&
+ rest.indexOf(']') > 2;
// the host is the remaining to the left of the first non-host char
+ // use the appropriate non-host char set
+ var invalidHostChars = ipv6Hostname ? nonHostCharsIPv6 : nonHostChars;
hostEnd = -1;
- for (var i = 0; i < nonHostChars.length; i++) {
- var hec = rest.indexOf(nonHostChars[i]);
+ for (var i = 0; i < invalidHostChars.length; i++) {
+ var hec = rest.indexOf(invalidHostChars[i]);
if (hec !== -1 && (hostEnd === -1 || hec < hostEnd))
hostEnd = hec;
}
@@ -238,11 +247,6 @@ Url.prototype.parse = function(url, parseQueryString, slashesDenoteHost) {
// so even if it's empty, it has to be present.
this.hostname = this.hostname || '';
- // if hostname begins with [ and ends with ]
- // assume that it's an IPv6 address.
- var ipv6Hostname = this.hostname[0] === '[' &&
- this.hostname[this.hostname.length - 1] === ']';
-
// validate a little.
if (!ipv6Hostname) {
var hostparts = this.hostname.split(/\./);
Full code is here. I can create a pull request if that will be helpful. |
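The host-end scan touched by the diff above can be tried in isolation. The following is a minimal standalone sketch, not the actual lib/url.js code: the names findHostEnd and hostOf are hypothetical, AUTO_ESCAPE is an assumed stand-in for the module's autoEscape list, and the fallback to rest.length when no terminating character is found is assumed from the surrounding parser logic.

```javascript
// Standalone sketch of the host-end scan from the proposed patch.
// AUTO_ESCAPE is an assumed stand-in for lib/url.js's autoEscape list.
const AUTO_ESCAPE = ['\'', '{', '}', '|', '\\', '^', '`', '<', '>', '"', ' ', '\r', '\n', '\t'];
const NON_HOST_CHARS = ['%', '/', '?', ';', '#'].concat(AUTO_ESCAPE);
// Same set without '%', so an IPv6 zone such as "%en7" stays inside the host.
const NON_HOST_CHARS_IPV6 = ['/', '?', ';', '#'].concat(AUTO_ESCAPE);

// `rest` is the URL with the protocol, slashes, and auth already removed.
function findHostEnd(rest) {
  // "::" is the shortest IPv6 address, so ']' must sit more than
  // 2 positions to the right of the opening '['.
  const ipv6Hostname = rest[0] === '[' && rest.indexOf(']') > 2;
  const invalidHostChars = ipv6Hostname ? NON_HOST_CHARS_IPV6 : NON_HOST_CHARS;
  let hostEnd = -1;
  for (const ch of invalidHostChars) {
    const hec = rest.indexOf(ch);
    if (hec !== -1 && (hostEnd === -1 || hec < hostEnd)) hostEnd = hec;
  }
  // Assumption: with no terminating character, the host runs to the end.
  return hostEnd === -1 ? rest.length : hostEnd;
}

// Hypothetical convenience wrapper: return the host portion of `rest`.
function hostOf(rest) {
  return rest.slice(0, findHostEnd(rest));
}

console.log(hostOf('[fe80::2e0:4cff:fe68:23ff%en7]:51877/engine.io/?EIO=3&transport=websocket'));
// prints "[fe80::2e0:4cff:fe68:23ff%en7]:51877"
```

For a non-bracketed host the '%' character still ends the host, so the existing behavior for IPv4 addresses and domain names is unchanged in this sketch.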
@cthayer Thank you for reporting and investigating this issue! Adding to milestone 0.12.2, because:
I'm not familiar with the format of IPv6 addresses, but please submit a PR with your change; it'll make it much easier to discuss. Thank you again! 👍 |
@misterdjules No problem. Happy to be able to help out. Submitted pull request #9411 with my proposed changes and relevant tests. Your reasoning on the importance of the bug fix makes sense. Of course I would love to see this fixed in 0.12.1, since it's making it hard to use npm modules in a project I'm working on, but I will defer to your workflow. Currently, I'm using the sockets directly to avoid the use of
In IPv6, link-local addresses can contain a zone that specifies the interface that the address belongs to. The zone is in the hostname portion of a URL and is separated from the address by a % character (which is not a valid host character in IPv4). |
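To make the address/zone layout described above concrete, here is a small sketch that splits a bracketed IPv6 host at the % separator. The helper name splitZone and the shape of its return value are hypothetical; nothing like it exists in Node's url module, and it does no validation of the address itself.

```javascript
// Hypothetical helper (not part of Node's url module): split a bracketed
// IPv6 host such as "[fe80::1%en7]" into its address and zone parts.
function splitZone(bracketedHost) {
  const last = bracketedHost.length - 1;
  if (bracketedHost[0] !== '[' || bracketedHost[last] !== ']') {
    return null; // not a bracketed IPv6 literal
  }
  const inner = bracketedHost.slice(1, -1);
  const percent = inner.indexOf('%');
  if (percent === -1) {
    return { address: inner, zone: null }; // no zone present
  }
  return {
    address: inner.slice(0, percent), // e.g. "fe80::2e0:4cff:fe68:23ff"
    zone: inner.slice(percent + 1)    // e.g. "en7", the interface name
  };
}

console.log(splitZone('[fe80::2e0:4cff:fe68:23ff%en7]'));
```

The port is assumed to have been stripped already, so the input is only the bracketed hostname.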
@cthayer Thanks for the clarification! My goal this week is to prioritize issues in the 0.12.1 milestone to get it out in a reasonable amount of time, so hopefully we'll get to v0.12.2 sooner rather than later! |
@misterdjules Sounds good. Looking forward to the 0.12.2 release! |
@cthayer Because of the recent vulnerabilities in OpenSSL, we had to do a v0.12.1 release with just security fixes, so all v0.12.x releases have been shifted. As a result, what I called v0.12.2 in my previous comment is now v0.12.3. |
@misterdjules Understood. Thanks for keeping me updated! |
This looks like zone-identifier support rather than a bug. @cthayer, does this patch try to bypass the zone-id or add support for it? If it supports zone identifiers, it would be nice to have a 'zoneId' (or similar) parameter on url.parse's return value. |
@obastemur this patch is adding support for zone-id by allowing the
Happy to take a stab at adding the |
'%' is a special character that is used for escaping special chars. url.parse shouldn't accept or try to parse a mistakenly escaped address (this can come from an old device that doesn't know what a zone-id is). The important point here is that this exception is an addition to the previous spec: that '%' starts at the end of the address definition. IMHO, if url.parse is going to handle zone-id (according to the spec), it would be better to put it into a parameter. I'm not sure which naming is best for it, though ('zoneId' etc.). |
I felt my implementation was safe (spec-wise) because it only allows the '%' character in the host portion if
In an IPv6 address, special character escaping is not allowed because special characters are not valid characters in an IPv6 address, so the percent is safe there. That is also probably why it was chosen as the separator in the IPv6 spec. I understand not wanting to break backwards compatibility, but I don't think my implementation is safe there. That said, I'm open to whatever is determined to be the best implementation. I'm all for naming the parameter according to spec (i.e. zoneId) |
Removing from the 0.12.4 milestone according to my comment in the corresponding PR. |
Note, nodejs/node-eps#28 proposes to implement the WHATWG URL standard, but in the latest standard revision including the zoneid in the URL is discarded: https://url.spec.whatwg.org/#hosts-%28domains-and-ip-addresses%29 Anyways, it will be nice to pass the zoneid as an option to |
also affecting theturtle32 websocket package. |
@MasterJames if the problem is still present in currently supported versions of Node.js, please open a relevant issue at https://github.com/nodejs/node/. This repo is only for archiving purposes and no issues should be reported here. Thanks! |
Yes thanks @aqrln just linking it in now. |
I think this can be closed in favour of nodejs/node#12410, let me know if that's wrong. |
In version 0.12.0, url.parse() does not parse a URL containing an IPv6 link-local address correctly.
Specifically, adding a zone to the address in the host portion of the URL causes url.parse() to fail to detect the host portion of the address and consider everything after the protocol to be part of the path.
Example of failure with IPv6 zone:
Example of success without the IPv6 zone:
Expected behavior
I would expect url.parse() of ws://[fe80::2e0:4cff:fe68:23ff%en7]:51877/engine.io/?EIO=3&transport=websocket to produce the following result: