-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
the SV of RegExpUnicodeEscapeSequence
is used but not defined
#1861
Comments
The intention was to preserve the errors for (non-RegExp) |
That runs contrary to the linked
(because |
It boils down to https://tc39.es/ecma262/#sec-identifier-names-static-semantics-early-errors, which operates on a single var 𐊧; // U+102A7 CARIAN LETTER A2
// → valid
var \u{102A7}; // U+102A7 CARIAN LETTER A2
// → valid
var \uD800\uDEA7; // U+102A7 represented as a surrogate pair
// → invalid When Dan said:
…it seems these errors fall under that. |
(FYI: I don’t feel strongly on this at all; just explaining my interpretation. I don’t believe the choice we make here really matters in practice.) |
Note that to maintain symmetry between 'x'.match(/(?<𝒜>.)/u).groups.𝒜;
// → does not throw
'x'.match(/(?<\uD835\uDC9C>.)/u).groups.𝒜;
// → throws
'x'.match(/(?<𝒜>.)/u).groups.\uD835\uDC9C;
// → throws
var \uD835\uDC9C;
// → throws …which IMHO makes sense. |
I agree that would be more consistent, though personally I'm kind of inclined to just go with what Chrome and Safari already implement. |
SV(RegExpUnicodeEscapeSequence)
is used but not definedthe SV of RegExpUnicodeEscapeSequence
is used but not defined
Sorry for my delay on this issue! I'm a bit backed up on GitHub notifications. Please reach out to me by email, IRC or Twitter DM if I'm unresponsive here. Oof, looking back, maybe we should've just kept things simple, as @mathiasbynens and @schuay argued. It's not like we were able to reuse the Identifier grammar. Well, now that we're here, I really wasn't thinking through whether we'd allow these split surrogate pair escapes, and I don't have much of an opinion on it. The CharacterValue feels like a "natural" choice (so we don't make multiple notions within RegExps of interpreting characters/escapes), and I'm happy to go with the majority opinion. |
In general, it seems bad to expose the unfortunate concept of surrogates in more places. |
…up names (#1869) This commit makes the Early Errors for RegExpIdentifierStart and RegExpIdentifierPart fully specified, with the semantics that Unicode escape sequences of the form `\u LeadSurrogate \u TrailSurrogate` as well as \u { CodePoint }` are legal in named capture group names for both Unicode and non-Unicode regular expressions. This commit thus makes legal all of the following: - `/(?<\ud835\udc9c>.)/` - `/(?<\ud835\udc9c>.)/u` - `/(?<\u{1d49c}>.)/` - `/(?<\u{1d49c}>.)/u` - `/(?<𝒜>)/` - `/(?<𝒜>)/u` Fixes #1861
In the Early Errors for RegExp literals there are two references to
the SV of RegExpUnicodeEscapeSequence
. Presumably this is intended to be a reference to11.8.4.2 Static Semantics: SV
, but that section only givesSV
s for parts of string literals, whichRegExpUnicodeEscapeSequence
is not. I assume this language was copied from the similar errors for identifier literals, which have non-regexpUnicodeEscapeSequence
s (which are parts of strings literals and so do haveSV
s defined).As I understand it, the intent of those errors is to make, for example,
/(?<\u{1d49c}>.)/u
legal, while giving an early error for/(?<\u{1f602}>.)/u
. (Those code points are𝒜
and😂
respectively; the relevant fact is that the first of these isID Start
, while the second is not.) So an adequate fix would be to useCharacterValue
instead, which is defined for parts of RegExp literals rather than for parts of string literals.There is a slight complication here, which is that unicode RegExp literals allow you to express non-BMP code points as pairs of
\u
escapes, which identifier literals do not: so/(?<\ud835\udc9c>.)/u
matches the grammar. I am pretty sure the intent is that this should be legal as well (which usingCharacterValue
in place of ofSV
would accomplish); that's what major engines (other than XS) which have implemented named capture groups do:eshost -e '"x".match(/(?<\ud835\udc9c>.)/u).groups["𝒜"]'
gives
cc @littledan @mathiasbynens to confirm that the intended behavior is for
/(?<\ud835\udc9c>.)/u
to be legal.Edit: actually, you also have to worry about non-unicode regexen. Safari and Chrome both treat
/(?<\ud835\udc9c>.)/
as a syntax error despite allowing/(?<\ud835\udc9c>.)/u
.The text was updated successfully, but these errors were encountered: