Replace the implementation of escapeTextContentForBrowser with escape-html #6862
…pe-html for performance
'<div title="'"<>&" style="text-align:'"<>&;">' + | ||
''"<>&' + | ||
'<div title="'"<>&" style="text-align:'"<>&;">' + | ||
''"<>&' + |
escape-html uses &#39; instead of &#x27;, which should be a completely equivalent HTML entity for single quote (and it's one byte less, too).
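The equivalence claimed above can be checked directly. This is a small sketch, not code from the PR; the variable names are illustrative. It shows that the hexadecimal reference &#x27; and the decimal reference &#39; name the same code point, and that the decimal form is one character shorter.

```javascript
// Both numeric character references name code point U+0027 (apostrophe).
var hexCode = parseInt('27', 16);      // the number inside &#x27;
var decimalCode = parseInt('39', 10);  // the number inside &#39;

// True when both references resolve to the single-quote character.
var sameCharacter = hexCode === decimalCode &&
  String.fromCharCode(decimalCode) === "'";

// Length difference between the two spellings of the entity.
var byteSaving = '&#x27;'.length - '&#39;'.length;

console.log(sameCharacter, byteSaving);
```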
From previous benchmarking, pre-testing for any matches before doing the replace would provide a measurable performance increase, but this was discarded in favor of simplicity for the time being (presumed not to really matter). With that minor tweak I imagine the difference would then be negligible. I also suspect your test cases are heavily skewed towards values that need escaping, or I fail to see how it could have such a massive impact, so in practice I would assume it shouldn't result in a significant difference anyway. Also, there's a not-super-obvious twist here, PS. I also suspect changing to |
@zpao Thanks for the response!
I'm curious; is this referring to things internal to Facebook or internal to React? I tried to cover what's needed in React to include the dependency, and it at least seems to work from tests and inspection of the output build. @syranide Thanks for responding!
Totally understood; premature optimization is uncool.
I think (though I'm not 100% sure) that short circuiting for numbers and booleans also helps significantly. This is true in all test cases because
I understand your suspicion, but it's not the case. I ran a test where the leaf nodes both did and did not need escaping, and while the escaping tests showed a larger perf boost, every case was better with this code. The numbers were:
Yeah, I did some exploratory testing to see if an implementation that didn't check for <> would be faster for attribute values. It didn't seem like there was any real difference, so I stuck with a function that would work for both attribute values and text content.
I might be misreading you here, but I don't understand why using
Hmmmm... good point! I'm curious; did this happen when Overall, I totally take your point that the vast majority of real-world inputs to the function will not need to be escaped, so the majority of the perf increase could be achieved with a combo of type-check short-circuit and pretesting for the escapable characters. (In fact, I wrote and almost submitted that version, but then I decided to also test perf when encoding was needed, which led me to If you would rather have that version, which trades off runtime perf of the unlikely cases for simplicity of dependency management, just let me know; I'd be happy to submit it! |
@aickin Not at all, it just seems odd to switch to an external algorithm if there's reason to believe we may want to switch back in the future. Nitpick. I was just putting all the arguments out there really, not trying to shoot it down. :) Since it seems you've done a bit of benchmarking already, could you do a test with:
function escapeTextContentForBrowser(text) {
  text = ('' + text);
  if (ESCAPE_REGEX.test(text)) {
    return text.replace(ESCAPE_REGEX, escaper);
  }
  return text;
}
That's the optimization I was referring to; I suspect that should close the gap considerably. Possibly try with the short circuits for boolean and number as well? PS. This is not my decision, I'm not sure what preferences the others have when it comes to external dependencies. |
That's almost character-for-character what I was about to submit before I moved to So, I went ahead and ran the benchmark I've been using on four scenarios (linked to the commit if you want to look at code):
For test cases that don't need any escaping, pretest, typeof & pretest, and escape-html are all about the same. For test cases that do need escaping, escape-html is the clear winner. I'm happy to do whatever. I'd probably recommend typeof & pretest or escape-html, depending on whether we care about the escaping case. Thoughts? |
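For readers who want to try the "typeof & pretest" variant discussed above, here is a minimal self-contained sketch. It is not the code from the linked commits. The lookup table and regex are written out as assumptions about what ESCAPE_REGEX and escaper refer to, and ESCAPE_TEST and escapeWithPretest are hypothetical names. A separate non-global regex is used for the pretest so that test() carries no lastIndex state between calls.

```javascript
// Assumed escape table; the single quote uses the &#x27; form discussed in this thread.
var ESCAPE_LOOKUP = {
  '&': '&amp;',
  '>': '&gt;',
  '<': '&lt;',
  '"': '&quot;',
  "'": '&#x27;'
};

// Non-global regex for the pretest: test() on it never advances lastIndex.
var ESCAPE_TEST = /[&><"']/;
// Global regex for the actual replacement of every match.
var ESCAPE_REGEX = /[&><"']/g;

function escaper(match) {
  return ESCAPE_LOOKUP[match];
}

// Hypothetical name for the "typeof & pretest" variant.
function escapeWithPretest(text) {
  // Booleans and numbers can never contain escapable characters.
  if (typeof text === 'boolean' || typeof text === 'number') {
    return '' + text;
  }
  text = '' + text;
  // Only pay for replace() when something actually needs escaping.
  return ESCAPE_TEST.test(text) ? text.replace(ESCAPE_REGEX, escaper) : text;
}

console.log(escapeWithPretest('a < b'));
console.log(escapeWithPretest(42));
```

The common case (no escapable characters) costs one regex test and no allocation beyond the string coercion, which matches the observation above that the variants perform about the same when nothing needs escaping.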
Sorry, wasn't clear. I mean internal to Facebook. Since our current sync process is "copy files from src/" we need require to be resolvable. We'll just need to have a module internally that is It looks like
This is one of those areas where "semver" gets kind of vague & icky. There are many who would say that changing the output of that (when not fixing a bug) requires a major version change. I don't really feel that way but I can understand the argument. We changed reactid in a major version so it was safe from scrutiny. |
Oh, I was just assuming this would go in a major version. I asked the question because I thought that @syranide was objecting that the change might cause churn in over-literal unit tests, even in a major version, and I was wondering if that churn in fact happened when |
FWIW, looking at a macro level CPU profile (bottom-up view), for handling a complete server-side request, I'm seeing Happy to try testing something differently and try to reproduce your results. |
@ericf Huh, that's definitely odd. I just checked in my test setup, and the test functions can be found here. It's not the most elegant code, but it produces reproducible results. You run it with I used Node v5.7.1 with react@master vs. the various branches in my repo, which I tested by changing the package.json to point to them. If I may, I'd like to ask a few questions about your setup: What were you rendering? How many nodes did it have, and how big a document did it produce? And what tools did you use to profile the render? Thanks! |
@aickin here's some more details on my tests:
I'm using Chrome Dev Tools to
EDIT: I ran your tests in the @aickin I ran your tests with my setup and captured a CPU profile of the complete run which you should be able to open (try Chrome Canary):
Results from running your tests with my Node setup:
In both dev and prod CPU profiles, I'm not seeing |
Thanks for working on this! (And sorry about the branch confusion...)
The tests always use a minified version of React to get the best perf, so you aren't going to see For what it's worth, you're seeing very, very similar times to what I'm seeing on my 2.8GHz mid 2014 MBP. |
@ericf: I spent some more time poking around in the production env CPU profile that you uploaded, and I have a theory for what's going on. (And apologies in advance for the length of this post... 😊) First off, I went through react.min.js from
However, there are a few unexplained quirks here. First, Together those two methods take 3,202.4 ms, or 18.4% of the 17,389.7 ms This was really curious, because
createMarkupForID: function(id) {
  return DOMProperty.ID_ATTRIBUTE_NAME + '=' +
    quoteAttributeValueForBrowser(id);
},
How could that be taking up 875ms of self time? Literally all it's doing is looking up an object property and two string concats! Then I realized: inlining. V8 uses a sampling profiler, and it can't report inlined functions on the call stack, because they aren't actually on the call stack. As a result, To test this theory, I wrote a page with the following code and profiled it in Chrome:
function match(matcher, text) {
  if (matcher.test(text)) {
    text.replace(matcher, function() { return "b"; });
  }
}
function run() {
  var text = "asfasdfasdfasdfasdfasdfasdfadsfasdfasdf";
  var matcher = /c/;
  for (var i = 0; i < 1000000; i++) {
    match(matcher, text);
  }
}
This is the profile I got: Note that
function match(matcher, text) {
  try {
    if (matcher.test(text)) {
      text.replace(matcher, function() { return "b"; });
    }
  } catch (e) {
    console.log(e);
  }
}
which gave the following profile: Now that the function isn't inlined, we can see that the vast majority of the time is spent in So, I think this explains how you and I could get such a different view of how much time Does this make sense? Do you agree that this is what is happening here? And thanks so much for your work on this; it's been unexpectedly fun to dive into profiling! |
@aickin yeah that makes sense to explain the differences. So it's likely that |
Can you try running this on the benchmark in the scripts/bench folder? |
@spicyj sure! I did two things to
$ ./measure.py react-master.min.js master.txt react-use-escape-html.min.js use-escape-html.txt
Measuring SSR for PE benchmark (30 trials)
______________________________
..............................
Measuring SSR for PE with warm JIT (30 slow trials)
______________________________
..............................
$ ./analyze.py master.txt use-escape-html.txt
Comparing master.txt (control) vs use-escape-html.txt (test)
Significant differences marked by ***
% change from control to test, with 99% CIs:
* factory_ms_jsc_jit
% change: -0.34% [ -3.36%, +2.68%]
means: 16.3161 (control), 16.2621 (test)
* factory_ms_jsc_nojit
% change: +0.92% [ -1.94%, +3.78%]
means: 15.1536 (control), 15.2938 (test)
* factory_ms_node
% change: +0.91% [ -1.11%, +2.94%]
means: 48.6843 (control), 49.1311 (test)
* ssr_pe_cold_ms_jsc_jit
% change: -1.34% [ -3.70%, +1.02%]
means: 44.5324 (control), 43.9364 (test)
* ssr_pe_cold_ms_jsc_nojit
% change: -0.43% [ -2.96%, +2.11%]
means: 49.4186 (control), 49.2102 (test)
* ssr_pe_cold_ms_node
% change: -0.50% [ -2.77%, +1.78%]
means: 86.9631 (control), 86.5367 (test)
* ssr_pe_warm_ms_jsc_jit
% change: -5.28% [ -6.67%, -3.89%] ***
means: 8.7489 (control), 8.287 (test)
* ssr_pe_warm_ms_jsc_nojit
% change: +1.03% [ -0.47%, +2.52%]
means: 24.5058 (control), 24.7579 (test)
* ssr_pe_warm_ms_node
% change: -4.78% [ -6.51%, -3.05%] ***
means: 25.3768 (control), 24.1645 (test)
Looks to me like there are significant differences in the warm-JIT cases, but not in any of the other cases, which is pretty much what I would expect. It doesn't quite show the scale of improvement that my tests did, but I'm guessing that's because of particularities of the test cases. |
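One way to read the output above: a change is flagged when its 99% confidence interval does not contain zero. The sketch below applies that rule to the intervals copied from the run. It assumes this is what the *** marker means, which is consistent with the output but is not taken from analyze.py itself. The names intervals and excludesZero are illustrative.

```javascript
// [metric, lower bound, upper bound] of the 99% CI for % change, copied from the output above.
var intervals = [
  ['factory_ms_jsc_jit', -3.36, 2.68],
  ['factory_ms_jsc_nojit', -1.94, 3.78],
  ['factory_ms_node', -1.11, 2.94],
  ['ssr_pe_cold_ms_jsc_jit', -3.70, 1.02],
  ['ssr_pe_cold_ms_jsc_nojit', -2.96, 2.11],
  ['ssr_pe_cold_ms_node', -2.77, 1.78],
  ['ssr_pe_warm_ms_jsc_jit', -6.67, -3.89],
  ['ssr_pe_warm_ms_jsc_nojit', -0.47, 2.52],
  ['ssr_pe_warm_ms_node', -6.51, -3.05]
];

// An interval that lies entirely above or below zero indicates a real change.
function excludesZero(lower, upper) {
  return lower > 0 || upper < 0;
}

var significant = intervals
  .filter(function(row) { return excludesZero(row[1], row[2]); })
  .map(function(row) { return row[0]; });

console.log(significant);
```

Under that reading, only the two warm-JIT metrics whose intervals sit entirely below zero are selected, matching the two rows marked *** in the output.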
How about we copy escape-html here but change it to output the same as our current version? See src/renderers/dom/client/utils/isEventSupported.js for an example of how to do the license. |
Follow-up: let's use this exact header:
…. Pulled the code of escape-html in to react and changed the encoding of single quote to &#x27;.
@@ -32,7 +108,16 @@ function escaper(match) {
  * @return {string} An escaped string.
  */
 function escapeTextContentForBrowser(text) {
-  return ('' + text).replace(ESCAPE_REGEX, escaper);
+  switch (typeof text) {
Actually, can we inline if (typeof text === 'boolean' || typeof text === 'number')? V8, at least, inlines typeof foo === <string> in the bytecode.
…ore inlinable for v8. Thanks, @spicyj.
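To make the outcome of the review concrete, here is a sketch of the shape the thread converges on: an escape-html-style scan that outputs &#x27; for the single quote, wrapped in the inlined typeof check requested above. It is an illustration under those assumptions, not a verbatim copy of the merged code. The helper name escapeHtml and the regex name matchHtmlRegExp follow the escape-html package's conventions.

```javascript
// Non-global regex used only to find the first character that needs escaping.
var matchHtmlRegExp = /["'&<>]/;

function escapeHtml(string) {
  var str = '' + string;
  var match = matchHtmlRegExp.exec(str);
  if (!match) {
    return str; // Nothing to escape: return the input unchanged.
  }

  var escape;
  var html = '';
  var index;
  var lastIndex = 0;

  // Start scanning at the first match and copy unescaped runs in bulk.
  for (index = match.index; index < str.length; index++) {
    switch (str.charCodeAt(index)) {
      case 34: escape = '&quot;'; break; // "
      case 38: escape = '&amp;'; break;  // &
      case 39: escape = '&#x27;'; break; // ' (kept as &#x27; to match the previous output)
      case 60: escape = '&lt;'; break;   // <
      case 62: escape = '&gt;'; break;   // >
      default: continue;
    }
    if (lastIndex !== index) {
      html += str.substring(lastIndex, index);
    }
    lastIndex = index + 1;
    html += escape;
  }

  // Append any trailing run after the last escaped character.
  return lastIndex !== index ? html + str.substring(lastIndex, index) : html;
}

function escapeTextContentForBrowser(text) {
  // Inlined typeof comparisons, as suggested in the review comment.
  if (typeof text === 'boolean' || typeof text === 'number') {
    return '' + text;
  }
  return escapeHtml(text);
}

console.log(escapeTextContentForBrowser('<a href="x">it\'s & done</a>'));
```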
Thanks! |
…-html (facebook#6862) * Replacing the implementation of escapeTextContentForBrowser with escape-html for performance * Addressing @spicyj's code review comment here: facebook#6862 (comment) . Pulled the code of escape-html in to react and changed the encoding of single quote to &#x27;. * Addressing code review comment facebook#6862 (comment) to make code more inlinable for v8. Thanks, @spicyj. (cherry picked from commit d6e7058)
While working on #6836, I found that escapeTextContentForBrowser can take up a significant amount of time in server rendering because it gets called for every string child and every attribute value.
By replacing the current implementation with escape-html, my server rendering test cases see a reduction of 8-20% in server rendering time, depending on how many attributes there are per element and whether or not the values have special characters.
I don't know what the React policy is on dependencies, but it looks like there are other MIT-licensed dependencies already, so I thought this might be possible.