Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

url: improve url.parse() performance #4892

Closed
wants to merge 1 commit into from

Conversation

mscdex
Copy link
Contributor

@mscdex mscdex commented Jan 27, 2016

This commit improves url.parse() performance by 50-200% with the existing url/url-parse benchmarks. Also, the optimizations made in url.format() result in a 40% increase in performance for url.resolve().

Some optimization strategies used in this commit include:

  • Combining multiple searches on the same string into a
    single loop
  • Avoiding unnecessary string.split() and array.join()
  • Minimizing creation of temporary strings
  • Using a faster alternative to encodeURIComponent,
    borrowed from the querystring module

@mscdex mscdex added the url Issues and PRs related to the legacy built-in url module. label Jan 27, 2016
@mscdex
Copy link
Contributor Author

mscdex commented Jan 27, 2016

FWIW here is the output from the url-parse benchmark:

url/url-parse.js type=one n=1000000: ./node: 159510 ./node-prev: 88061 .... 81.14%
url/url-parse.js type=two n=1000000: ./node: 185240 ./node-prev: 107190 ... 72.82%
url/url-parse.js type=three n=1000000: ./node: 152400 ./node-prev: 101710 . 49.84%
url/url-parse.js type=four n=1000000: ./node: 685780 ./node-prev: 422830 .. 62.19%
url/url-parse.js type=five n=1000000: ./node: 687790 ./node-prev: 223260 . 208.07%
url/url-parse.js type=six n=1000000: ./node: 171340 ./node-prev: 93834 .... 82.60%

@mscdex
Copy link
Contributor Author

mscdex commented Jan 27, 2016

@silverwind
Copy link
Contributor

Is this related to #2303?

@mscdex
Copy link
Contributor Author

mscdex commented Jan 27, 2016

@silverwind I hadn't seen that PR before this. However, trying that other PR (just url.js actually) locally against master causes test failures with the existing test/parallel/test-url.js.

This PR passes the existing tests and diverges less from the original code (FWIW).

@mscdex
Copy link
Contributor Author

mscdex commented Jan 27, 2016

I made the encodeURIComponent() alternative implementation a bit more efficient now. New CI: https://ci.nodejs.org/job/node-test-pull-request/1399/

@jasnell
Copy link
Member

jasnell commented Jan 27, 2016

LGTM if CI is green

@jasnell
Copy link
Member

jasnell commented Jan 27, 2016

Marking this as lts-watch but I definitely think we need to wait to land until after it goes out in a stable for a few weeks.

@mscdex mscdex force-pushed the perf-url-parse branch 3 times, most recently from 1f2ac71 to c492001 Compare January 28, 2016 18:06
@Fishrock123
Copy link
Contributor

uh... cc @domenic maybe?

@mscdex
Copy link
Contributor Author

mscdex commented Jan 31, 2016

Hrmm, somehow the CI job got lost. Here it is again: https://ci.nodejs.org/job/node-test-pull-request/1463/

@jasnell
Copy link
Member

jasnell commented Feb 1, 2016

CI looks good, but might be worth getting a citgm run on this. /cc @thealphanerd

let srcPath = result.pathname && result.pathname.split('/') || [];
const relPath = relative.pathname && relative.pathname.split('/') || [];
var srcPath = result.pathname && result.pathname.split('/') || [];
var relPath = relative.pathname && relative.pathname.split('/') || [];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious: Did var turn out to be faster than the scoped variants?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resolveObject() changes were mostly unintentional as the url module was modified while I was working on this. However, the reverting of let is intentional as let is still slow in v8. I've left the const as-is though.

This commit improves url.parse() performance by 50-210%
with the existing url/url-parse benchmarks. Also, the
optimizations made in url.format() result in a 40%
increase in performance for url.resolve().

Some optimization strategies used in this commit include:
* Combining multiple searches on the same string into a
   single loop
* Avoiding unnecessary string.split() and array.join()
* Minimizing creation of temporary strings
* Using a faster alternative to encodeURIComponent,
   borrowed from the querystring module
@MylesBorins
Copy link
Contributor

citgm: https://ci.nodejs.org/job/thealphanerd-smoker/62/

note: run with this patched node https://github.com/TheAlphaNerd/node/tree/improve-url-parse
diff from master: 4897f94...bdef972

@mscdex
Copy link
Contributor Author

mscdex commented Feb 7, 2016

Looking at the citgm results the only non-flaky error result is the david module which looks like an unrelated test error. In that particular case the standard module version david is checking for needs to be updated because it's asserting that the standard module is still pre-6.0 when 6.0.0 was release just less than a day ago.

@ChALkeR ChALkeR added the performance Issues and PRs related to the performance of Node.js. label Feb 16, 2016
@mscdex
Copy link
Contributor Author

mscdex commented Feb 17, 2016

/cc @nodejs/collaborators Any other comments/LGTMs for this?

As far as the other url performance PRs go, I'm not opposed to them if they improve performance even more than this PR. However this PR already passes all tests and is more conservative since it preserves the original url module behavior and is more in line with the original code/design.

@rmg
Copy link
Contributor

rmg commented Feb 17, 2016

LGTM, but I haven't looked at any of the other PRs.

@jbergstroem
Copy link
Member

I'm LGTM as well -- I get the impression that the other PR needs more work/review; hopefully will land at a later stage.

@silverwind
Copy link
Contributor

In retrospect, we should have landed #2303 much sooner, it's just that we got lost there with those WhatWG tests. Really sorry about that, @STRML.

@mcollina
Copy link
Member

So, I propose we get this in asap, and possibly backport to LTS.
And we aim for #2303 for node 6 or 7.

@rvagg
Copy link
Member

rvagg commented Feb 18, 2016

These major perf rewrites are not great candidates for LTS cause of their higher risk of finding edge-cases that we won't otherwise find. We improve the chances with letting them bake in v5 for a while but it doesn't make the risk disappear completely.

As it stands I'm -1 on this for v4 unless someone wants to make a case that it's not as major as it looks.

mscdex added a commit that referenced this pull request Feb 18, 2016
This commit improves url.parse() performance by 50-210%
with the existing url/url-parse benchmarks. Also, the
optimizations made in url.format() result in a 40%
increase in performance for url.resolve().

Some optimization strategies used in this commit include:
* Combining multiple searches on the same string into a
   single loop
* Avoiding unnecessary string.split() and array.join()
* Minimizing creation of temporary strings
* Using a faster alternative to encodeURIComponent,
   borrowed from the querystring module

PR-URL: #4892
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ryan Graham <r.m.graham@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
@mscdex
Copy link
Contributor Author

mscdex commented Feb 18, 2016

Landed in db91281.

@mscdex mscdex closed this Feb 18, 2016
@mscdex mscdex deleted the perf-url-parse branch February 18, 2016 13:05
@jbergstroem
Copy link
Member

I'm with @rvagg here. I think in general we could be too frisky with backports -- I prefer the more anal "bugfixes only" to avoid corner cases.

@petkaantonov
Copy link
Contributor

performance is a feature

@STRML
Copy link
Contributor

STRML commented Feb 18, 2016

👍 Never know what could be broken months down the line. Glad to see this merged regardless for 5+.

@Trott Trott mentioned this pull request Feb 18, 2016
@Trott
Copy link
Member

Trott commented Feb 18, 2016

It looks like this was landed three weeks after its last CI run happened. In the intervening time, new lint rules were added. So now make jslint (and CI!) fails on master. It looks like this can be patched up pretty easily/quickly, but I'd want @mscdex to do it if at all possible just in case there are performance implications.

@mscdex
Copy link
Contributor Author

mscdex commented Feb 18, 2016

@Trott #5300

rvagg pushed a commit that referenced this pull request Feb 21, 2016
This commit improves url.parse() performance by 50-210%
with the existing url/url-parse benchmarks. Also, the
optimizations made in url.format() result in a 40%
increase in performance for url.resolve().

Some optimization strategies used in this commit include:
* Combining multiple searches on the same string into a
   single loop
* Avoiding unnecessary string.split() and array.join()
* Minimizing creation of temporary strings
* Using a faster alternative to encodeURIComponent,
   borrowed from the querystring module

PR-URL: #4892
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ryan Graham <r.m.graham@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
@geek
Copy link
Member

geek commented Feb 23, 2016

Related issue:
#5393

@MylesBorins
Copy link
Contributor

@nodejs/lts this commit introduced some regressions, there are fixes coming in or already in I believe.

Is this something we will want to back port if we have the fixes and it has been stable for a bit?

Part of me want to just say don't land, and avoid potential regressions... but this is some pretty major churn.

@Fishrock123
Copy link
Contributor

@thealphanerd I would delay a while. Same with my timers thing.

@jasnell
Copy link
Member

jasnell commented Mar 2, 2016

@thealphanerd ... I'm going to say hold off for now. There are too many regressions on this. Even with the fixes coming this should sit for a while in master and v5 before we pull it back to v4.

@jasnell
Copy link
Member

jasnell commented Mar 2, 2016

In fact, let's take the lts-watch label off and put the don't land in v4 label on. We can revisit later once we're sure things are stable and the regressions are handled.

@rvagg
Copy link
Member

rvagg commented Mar 3, 2016

I'm a strong -1 on putting this or similar performance commits on to LTS, for simple and clearly low-risk commits we can consider on a case-by-case basis but these introduce too much variability into LTS which we are committed to keeping stable. It's important we achieve and maintain trust with users and give them zero reason to fear upgrading from release to release.

@jasnell
Copy link
Member

jasnell commented Mar 3, 2016

Agree. While I'd love to get performance improvements into LTS in general if possible, these kinds of changes are far too invasive, I think, and open too many possible risks for LTS. Unfortunately, however, not landing them could make cherrypicking more difficult later but we knew that going in.

@MylesBorins
Copy link
Contributor

If this does come it needs to land with #5300

@benjamingr
Copy link
Member

Just noting #5300 has been landed.

@mscdex
Copy link
Contributor Author

mscdex commented Mar 23, 2016

@benjamingr I think @thealphanerd meant if it landed on v4.x.

@benjamingr
Copy link
Member

Oh, I saw that the lts-watch has been removed, reading it again it looks like it isn't landing on 4.x either. Sorry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Issues and PRs related to the performance of Node.js. url Issues and PRs related to the legacy built-in url module.
Projects
None yet
Development

Successfully merging this pull request may close these issues.