-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Suggestion: Don't rewrite unicode escape sequences to unicode #1188
Comments
It's a breaking change for rust users. My midterm exam ends on tomorrow, so I'll see if it's viable then. cc @bartlomieju @ry @dsherret |
SGTM |
Actually we just had this reported in Deno: denoland/deno#8161 and I was just about to raise an issue about it here. Given the input of: export function getIndex(c: string): number {
return "\x00\r\n\x85\u2028\u2029".indexOf(c);
} We would expect the output to be: export function getIndex(c) {
return "\x00\r\n\x85\u2028\u2029".indexOf(c);
} But we are instead getting (the unicode chars have been encoded to unicode): export function getIndex(c) {
return "\0\r\n�
".indexOf(c);
} |
This closed issue has been automatically locked because it had no new activity for a month. If you are running into a similar issue, please create a new issue with the steps to reproduce. Thank you. |
Describe the feature
Over at
esbuild
(which is often used in conjunction withswc
to target ES5), the decision was recently made to switch over to producing ASCII-only output by default.The decision was triggered specifically by the discussion in this issue (and originally this one) due to a combination of interoperability and performance concerns.
First, in terms of interoperability, this is in line with other bundlers such as Webpack, Parcel, and Rollup, and also in line with babel that also preserves unicode escape sequences.
Parcel and Webpack generates minified bundles by default, and they use
terser
for that. Terser applies an optimization that transforms unicode escape sequences into utf-8 characters, so you might end out with unicode-characters that weren't there in your source code in your output bundle(s) from these tools - but the default behavior of them before terser is in the picture is to respect whatever was written in the source code.From a performance perspective, however, it get's really interesting. It turns out that many browsers have an optimized scanning path for ASCII code which seems to indicate that parsing an ASCII string is around 1.7x faster in V8 than parsing an UTF-8 string. So there's potential performance gains as well.
The text was updated successfully, but these errors were encountered: