proposal: WHATWG URL standard implementation #28
@nodejs/collaborators |
Perhaps we could raise interest in this by providing examples of failing tests. |
@seishun ... https://github.com/nodejs/node/blob/master/test/known_issues/test-url-parse-conformance.js :-) We currently fail somewhere around 140+ of the test cases in the WhatWG set. |
+1 Considering that browsers are exposing the global, I think it makes a whole bunch of sense. I'd be interested in helping out with this. Have you broken ground on implementation, @jasnell? Can we borrow from existing implementations at all?
```js
const url = new URL('http://user:pass@example.org:1234/p/a/t/h?xyz=abc#hash');

console.log(url.protocol; // http:
```
missing ')'
@thealphanerd ... yeah, I've got it mostly implemented already. The next step is to start running it through its paces with tests and benchmarks and to find ways of optimizing the implementation. It's currently quite a bit slower than the existing
Regarding borrowing from other implementations, it's entirely possible that we could borrow from Chrome's implementation. I'm not sure yet whether theirs is a pure JS implementation. I'll look into that.
addition of a single `require('url').URL` export that can be used as an
alternative way of getting a reference to the new `URL` constructor.

Existing locations within core that currently accept a URL string or the
Maybe we should use the lower-case "url" to stand for the URL string, to stay consistent with the IDL source in https://url.spec.whatwg.org/
@jasnell How about the following 2 static methods, which are defined in the standard's IDL:
I didn't see those in the proposal :-( |
Still considering those. They are easy enough to implement given the
|
There will also be other differences. For instance, I'm not sure if we need
|
There is https://github.com/jsdom/whatwg-url cc @domenic |
Yeah, I'm familiar with (and use) |
Yep, that's @Sebmaster's most excellent work. It's not super optimized, but would be a good starting point. This thread is very exciting and I'm glad there is an appetite for the idea!! The idea of a global, the same as in browsers, is great. I think there are two separable problems here:
There are tests of the URL Standard at https://github.com/w3c/web-platform-tests/tree/master/url, and whatwg-url has a runner. The coverage is pretty reasonable; see web-platform-tests/wpt#3018 |
Oh, that'd be very cool! I guess my only concern is that we weren't concerned with speed when writing it, so there is probably lots of low-hanging fruit for performance improvements. You'd also need to do a bit of work to decouple it from webidl-conversions and webidl2js. If you |
Ok, I'll dig in and explore the whatwg-url internals and see what can be done reasonably. Before getting too deep into this I'd definitely like to get more +1's from collaborators tho. I'd really like to see this happen tho. |
Definitely agree that it's worth separating these. Not sure about modifying the existing |
Well, you can have a 👍 from me! More browser/server unity would be great. 😺 |
BTW, I like the idea of the global, since that's what browsers do, but I'd recommend not exposing it in |
The only concern I would have with that, @Qard, is that if there is existing code that does |
This is not an argument for or against, but a request for more background. You've described what work you would like to do, and how you would do it — but I don't see much in the way of "why" we'd want to do this. What is not working about the status quo that we will rectify by adding a new global object? What value do Node users gain, concretely, from the URL object? |
Maybe store it on |
@chrisdickinson personally I see quite a bit of benefit in minimizing the delta between node + the browser as far as utility API's like this are concerned. |
@chrisdickinson ... the why is straightforward: Currently Node.js' URL parsing has a number of issues in terms of not following the standardized behavior implemented by browsers. Examples of those failures can be seen in the test case I referenced here. There are also differences in the Node.js provided API that are largely unnecessary. This work would give us an opportunity to not only provide more robust URL parsing, but to provide a unified, non-Node.js specific API. @Qard ... that's certainly a possibility |
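To make the difference concrete, here is a small sketch comparing the legacy parser with a WHATWG-style `URL` object. It assumes a Node.js release in which `require('url').URL` is available (not yet the case when this thread was written), and the input URL is made up for illustration. The legacy result is printed without an expected value, since the point is only that it may differ from the standardized one.

```javascript
const urlModule = require('url');

// Hypothetical input: upper-case scheme and host, plus a dot segment in the path.
const input = 'HTTP://EXAMPLE.com/a/../b?x=1';

// Legacy Node.js parser: returns a plain object of string fields.
const legacy = urlModule.parse(input);

// WHATWG parser: normalizes scheme, host and path per the URL Standard.
const whatwg = new urlModule.URL(input);

console.log(legacy.pathname); // legacy view of the path (no expected value claimed)
console.log(whatwg.pathname); // '/b'
console.log(whatwg.href);     // 'http://example.com/b?x=1'
```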
FWIW, looks like I mis-remembered the number of failures ;-) ... here's the exact count:
|
## Description

The WhatWG URL Standard specifies updated syntax, parsing and serialization of
s/WhatWG/WHATWG/g
Acknowledged. Will fix that in the next update.
EPS proposal updated. /cc @nodejs/ctc |
APIs.

Initially, the implementation would be introduced as an undocumented
experimental feature exposed via a new `URL` property in the `url` module.
There's already a `Url` property that points to the old constructor/parser; this is likely to cause confusion. If we go down the route of supporting a new URL parser, we may as well expose it as a global. Code written for the browser expects it to be a global, so we may as well avoid artificially increasing the delta between browser JS and server JS.
I'm all for it being a global.
On Jul 2, 2016 10:18 AM, "Chris Dickinson" notifications@github.com wrote:
In XXX-url.md (#28 (comment)):
+## Description
+
+The WHATWG URL Standard specifies updated syntax, parsing and serialization of
+URLs as currently implemented by the main Web Browsers. The existing Node.js
+`url` module parsing and serialization implementation currently does not support
+the URL standard and fails to pass 160 of the WHATWG URL parsing tests.
+
+This proposal is to implement the WHATWG URL Standard by modifying the existing
+`url` module to provide an implementation of the `URL` object and associated
+APIs. Doing so improves the robustness of URL parsing, provides consistency
+with browser js code, and can eventually allow the reduction of node.js specific
+APIs.
+
+Initially, the implementation would be introduced as an undocumented
+experimental feature exposed via a new `URL` property in the `url` module.
I'm also fine with that. We already have `encodeURI()`, etc.
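For code that has to run both where `URL` is a global and where it is only reachable through the `url` module, one possible pattern is sketched below. The name `URLImpl` and the sample URL are illustrative assumptions, not part of the proposal.

```javascript
// Prefer the global `URL` when the runtime provides one (as browsers do);
// otherwise fall back to the `url` module export discussed in this thread.
const URLImpl = typeof URL === 'function' ? URL : require('url').URL;

const parsed = new URLImpl('https://example.org/docs?page=2');

console.log(parsed.hostname);                 // 'example.org'
console.log(parsed.searchParams.get('page')); // '2'
```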
Putting this on the ctc-agenda as I believe it's ready for discussion/review. |
Does npm rely on the parsing of urls with support for the |
There's a certain loss of functionality by using Taking the following excerpt from the WHATWG URL specification:
It seems that
Even though Taking from RFC3986 we can see a simple generic way of parsing the URL that I think could be used at minimum to parse unfamiliar schemes:
Not to be rude, but check your own work first. Chrome has added an exception to
Additionally,
All this said, we should support other/unknown schemes properly with |
It's already been clarified upthread that I was incorrect about special schemes. There are no special schemes besides file: |
@domenic oop. sorry about that. completely glanced over that comment. |
Adding git as a special scheme would be trivial if necessary. If necessary On Jul 13, 2016 10:13 PM, "Trevor Norris" notifications@github.com wrote:
|
After further testing, git URLs are handled without any problems whatsoever by the implementation in #28.
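As an illustration of what handling git URLs can look like, here is a sketch using the `URL` class as it ships in later Node.js releases, which may differ in detail from the implementation under discussion in this thread. The repository URLs are examples chosen for this sketch.

```javascript
const { URL } = require('url');

// Example repository URLs using schemes the URL Standard does not treat as special.
const plain = new URL('git://github.com/nodejs/node.git');
const overSsh = new URL('git+ssh://git@github.com/nodejs/node.git');

console.log(plain.protocol);   // 'git:'
console.log(plain.hostname);   // 'github.com'
console.log(plain.pathname);   // '/nodejs/node.git'
console.log(overSsh.protocol); // 'git+ssh:'
console.log(overSsh.username); // 'git'
```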
The CTC discussed this today and has decided to advance this to DRAFT status with the exception of the |
By the way, once core has implemented the WHATWG URL Standard, I think we should also start deprecating the
@yorkie ... eventually, once the new URL object is no longer experimental, my goal is to replace use of the existing url.parse within the http module entirely. That's not going to be easy tho. It's going to take a long time. |
Landed in f16f770 |
The WHATWG URL Standard specifies updated syntax, parsing and serialization of URLs as currently implemented by the main Web Browsers. The existing Node.js `url` module parsing and serialization implementation currently does not support the URL standard and fails to pass about 160 of the standard tests.

This proposal is to implement the WHATWG URL Standard by introducing a new `URL` class off the `url` module (e.g. `require('url').URL`). The existing `url` module would remain unchanged and there should be no backwards compatibility concerns.
Example
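A minimal sketch of how the proposed export could be used, assuming a Node.js release in which `require('url').URL` has landed. The input is the sample URL from the snippet reviewed earlier in this thread, the variable name `myUrl` is illustrative, and the property values shown follow the WHATWG URL Standard.

```javascript
const { URL } = require('url');

const myUrl = new URL('http://user:pass@example.org:1234/p/a/t/h?xyz=abc#hash');

console.log(myUrl.protocol); // 'http:'
console.log(myUrl.username); // 'user'
console.log(myUrl.password); // 'pass'
console.log(myUrl.host);     // 'example.org:1234'
console.log(myUrl.pathname); // '/p/a/t/h'
console.log(myUrl.search);   // '?xyz=abc'
console.log(myUrl.hash);     // '#hash'
```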