diff --git a/spec.html b/spec.html
index a2a6421033..eda2b1516d 100644
--- a/spec.html
+++ b/spec.html
@@ -10684,9 +10684,9 @@

Syntax

ECMAScript differs from the Java programming language in the behaviour of Unicode escape sequences. In a Java program, if the Unicode escape sequence `\\u000A`, for example, occurs within a single-line comment, it is interpreted as a line terminator (Unicode code point U+000A is LINE FEED (LF)) and therefore the next code point is not part of the comment. Similarly, if the Unicode escape sequence `\\u000A` occurs within a string literal in a Java program, it is likewise interpreted as a line terminator, which is not allowed within a string literal—one must write `\\n` instead of `\\u000A` to cause a LINE FEED (LF) to be part of the String value of a string literal. In an ECMAScript program, a Unicode escape sequence occurring within a comment is never interpreted and therefore cannot contribute to termination of the comment. Similarly, a Unicode escape sequence occurring within a string literal in an ECMAScript program always contributes to the literal and is never interpreted as a line terminator or as a code point that might terminate the string literal.
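The ECMAScript half of this paragraph can be checked directly. The snippet below is a minimal sketch rather than spec text, and the variable names are illustrative only: it shows a Unicode escape contributing a LINE FEED to a String value, and a Unicode escape inside a single-line comment leaving the comment unterminated.

```javascript
// A Unicode escape inside a string literal contributes U+000A to the String
// value; it does not terminate the literal.
const s = "a\u000Ab";

// A Unicode escape inside a comment is never interpreted, so everything up to
// the physical end of this line stays in the comment: \u000A const ignored = 1;
const ignoredIsDeclared = typeof ignored !== "undefined";

console.log(s.length, s.charCodeAt(1), ignoredIsDeclared);
```

In a Java program the same comment line would end at the escape, and the text after it would be compiled as code.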

- -

Static Semantics: UTF16Encoding ( _cp_ )

-

The abstract operation UTF16Encoding takes argument _cp_ (a numeric code point value). It performs the following steps when called:

+ +

Static Semantics: CodePointToUTF16CodeUnits ( _cp_ )

+

The abstract operation CodePointToUTF16CodeUnits takes argument _cp_ (a numeric code point value). It performs the following steps when called:

1. Assert: 0 ≤ _cp_ ≤ 0x10FFFF. 1. If _cp_ ≤ 0xFFFF, return _cp_. @@ -10696,17 +10696,20 @@

Static Semantics: UTF16Encoding ( _cp_ )

- -

Static Semantics: UTF16Encode ( _text_ )

-

The abstract operation UTF16Encode takes argument _text_ (a sequence of Unicode code points). It converts _text_ into a String value, as described in . It performs the following steps when called:

+ +

Static Semantics: CodePointsToString ( _text_ )

+

The abstract operation CodePointsToString takes argument _text_ (a sequence of Unicode code points). It converts _text_ into a String value, as described in . It performs the following steps when called:

- 1. Return the string-concatenation of the code units that are the UTF16Encoding of each code point in _text_, in order.
+ 1. Let _result_ be the empty String.
+ 1. For each code point _cp_ in _text_, do
+   1. Set _result_ to the string-concatenation of _result_ and ! CodePointToUTF16CodeUnits(_cp_).
+ 1. Return _result_.
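The rewritten steps map onto a short loop. The following is a hedged sketch, not part of the change: `codePointToUTF16CodeUnits` and `codePointsToString` are hypothetical JavaScript helpers named after the abstract operations. The surrogate computation for code points above 0xFFFF falls outside the hunk shown here, so it is assumed to be the standard UTF-16 formula.

```javascript
// Hypothetical helper mirroring CodePointToUTF16CodeUnits(cp).
function codePointToUTF16CodeUnits(cp) {
  // Corresponds to "Assert: 0 <= cp <= 0x10FFFF".
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10FFFF) {
    throw new RangeError("code point out of range");
  }
  // BMP code points are a single code unit.
  if (cp <= 0xFFFF) return [cp];
  // Assumption: standard UTF-16 surrogate computation (not visible in the hunk).
  const offset = cp - 0x10000;
  return [0xD800 + Math.floor(offset / 0x400), 0xDC00 + (offset % 0x400)];
}

// Hypothetical helper mirroring CodePointsToString(text), where text is an
// iterable of numeric code point values.
function codePointsToString(text) {
  let result = "";
  for (const cp of text) {
    // String-concatenation of result and the code units for cp.
    result += String.fromCharCode(...codePointToUTF16CodeUnits(cp));
  }
  return result;
}

console.log(codePointsToString([0x61, 0x1D11E]) === "a\uD834\uDD1E");
```

For valid code points this should agree with the built-in `String.fromCodePoint`, which the same change updates to call CodePointToUTF16CodeUnits.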
- -

Static Semantics: UTF16DecodeSurrogatePair ( _lead_, _trail_ )

-

The abstract operation UTF16DecodeSurrogatePair takes arguments _lead_ (a code unit) and _trail_ (a code unit). Two code units that form a UTF-16 surrogate pair are converted to a code point. It performs the following steps when called:

+ +

Static Semantics: UTF16SurrogatePairToCodePoint ( _lead_, _trail_ )

+

The abstract operation UTF16SurrogatePairToCodePoint takes arguments _lead_ (a code unit) and _trail_ (a code unit). Two code units that form a UTF-16 surrogate pair are converted to a code point. It performs the following steps when called:

1. Assert: _lead_ is a leading surrogate and _trail_ is a trailing surrogate. 1. Let _cp_ be (_lead_ - 0xD800) × 0x400 + (_trail_ - 0xDC00) + 0x10000. @@ -10729,14 +10732,14 @@

Static Semantics: CodePointAt ( _string_, _position_ )

1. Let _second_ be the code unit at index _position_ + 1 within _string_. 1. If _second_ is not a trailing surrogate, then 1. Return the Record { [[CodePoint]]: _cp_, [[CodeUnitCount]]: 1, [[IsUnpairedSurrogate]]: *true* }. - 1. Set _cp_ to ! UTF16DecodeSurrogatePair(_first_, _second_). + 1. Set _cp_ to ! UTF16SurrogatePairToCodePoint(_first_, _second_). 1. Return the Record { [[CodePoint]]: _cp_, [[CodeUnitCount]]: 2, [[IsUnpairedSurrogate]]: *false* }.
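The decoding direction can be sketched the same way. The helpers below are hypothetical JavaScript counterparts of UTF16SurrogatePairToCodePoint and CodePointAt, with plain object keys standing in for the Record fields. The opening steps of CodePointAt are not visible in this hunk, so the handling of non-surrogates, trailing-first surrogates, and end-of-string is an assumption.

```javascript
const isLeadingSurrogate = (cu) => cu >= 0xD800 && cu <= 0xDBFF;
const isTrailingSurrogate = (cu) => cu >= 0xDC00 && cu <= 0xDFFF;

// Hypothetical helper mirroring UTF16SurrogatePairToCodePoint(lead, trail).
function utf16SurrogatePairToCodePoint(lead, trail) {
  if (!isLeadingSurrogate(lead) || !isTrailingSurrogate(trail)) {
    throw new TypeError("expected a leading and a trailing surrogate");
  }
  return (lead - 0xD800) * 0x400 + (trail - 0xDC00) + 0x10000;
}

// Hypothetical helper mirroring CodePointAt(string, position).
function codePointAt(string, position) {
  const first = string.charCodeAt(position);
  // Assumed early exits (these steps are outside the hunk).
  if (!isLeadingSurrogate(first) && !isTrailingSurrogate(first)) {
    return { codePoint: first, codeUnitCount: 1, isUnpairedSurrogate: false };
  }
  if (isTrailingSurrogate(first) || position + 1 === string.length) {
    return { codePoint: first, codeUnitCount: 1, isUnpairedSurrogate: true };
  }
  // Steps visible in the hunk.
  const second = string.charCodeAt(position + 1);
  if (!isTrailingSurrogate(second)) {
    return { codePoint: first, codeUnitCount: 1, isUnpairedSurrogate: true };
  }
  const cp = utf16SurrogatePairToCodePoint(first, second);
  return { codePoint: cp, codeUnitCount: 2, isUnpairedSurrogate: false };
}

console.log(codePointAt("\uD834\uDD1E", 0).codePoint.toString(16));
```

Repeatedly calling such a helper and advancing by `codeUnitCount` is one way to picture what StringToCodePoints does with a whole String.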
- -

Static Semantics: UTF16DecodeString ( _string_ )

-

The abstract operation UTF16DecodeString takes argument _string_ (a String). It returns the sequence of Unicode code points that results from interpreting _string_ as UTF-16 encoded Unicode text as described in . It performs the following steps when called:

+ +

Static Semantics: StringToCodePoints ( _string_ )

+

The abstract operation StringToCodePoints takes argument _string_ (a String). It returns the sequence of Unicode code points that results from interpreting _string_ as UTF-16 encoded Unicode text as described in . It performs the following steps when called:

1. Let _codePoints_ be a new empty List. 1. Let _size_ be the length of _string_. @@ -11240,13 +11243,13 @@

Static Semantics: Early Errors

IdentifierStart :: `\` UnicodeEscapeSequence
  • - It is a Syntax Error if the SV of |UnicodeEscapeSequence| is none of *"$"*, or *"_"*, or the UTF16Encoding of a code point matched by the |UnicodeIDStart| lexical grammar production. + It is a Syntax Error if the SV of |UnicodeEscapeSequence| is none of *"$"*, or *"_"*, or ! CodePointToUTF16CodeUnits(_cp_) for some Unicode code point _cp_ matched by the |UnicodeIDStart| lexical grammar production.
IdentifierPart :: `\` UnicodeEscapeSequence
  • - It is a Syntax Error if the SV of |UnicodeEscapeSequence| is none of *"$"*, or *"_"*, or the UTF16Encoding of either <ZWNJ> or <ZWJ>, or the UTF16Encoding of a Unicode code point that would be matched by the |UnicodeIDContinue| lexical grammar production. + It is a Syntax Error if the SV of |UnicodeEscapeSequence| is none of *"$"*, *"_"*, ! CodePointToUTF16CodeUnits(<ZWNJ>), ! CodePointToUTF16CodeUnits(<ZWJ>), or ! CodePointToUTF16CodeUnits(_cp_) for some Unicode code point _cp_ that would be matched by the |UnicodeIDContinue| lexical grammar production.
@@ -11262,7 +11265,7 @@

Static Semantics: StringValue

1. Let _idText_ be the source text matched by |IdentifierName|. 1. Let _idTextUnescaped_ be the result of replacing any occurrences of `\\` |UnicodeEscapeSequence| in _idText_ with the code point represented by the |UnicodeEscapeSequence|. - 1. Return ! UTF16Encode(_idTextUnescaped_). + 1. Return ! CodePointsToString(_idTextUnescaped_).
@@ -11672,7 +11675,7 @@

Static Semantics: NumericValue

String Literals

-

A string literal is zero or more Unicode code points enclosed in single or double quotes. Unicode code points may also be represented by an escape sequence. All code points may appear literally in a string literal except for the closing quote code points, U+005C (REVERSE SOLIDUS), U+000D (CARRIAGE RETURN), and U+000A (LINE FEED). Any code points may appear in the form of an escape sequence. String literals evaluate to ECMAScript String values. When generating these String values Unicode code points are UTF-16 encoded as defined in . Code points belonging to the Basic Multilingual Plane are encoded as a single code unit element of the string. All other code points are encoded as two code unit elements of the string.

+

A string literal is zero or more Unicode code points enclosed in single or double quotes. Unicode code points may also be represented by an escape sequence. All code points may appear literally in a string literal except for the closing quote code points, U+005C (REVERSE SOLIDUS), U+000D (CARRIAGE RETURN), and U+000A (LINE FEED). Any code points may appear in the form of an escape sequence. String literals evaluate to ECMAScript String values. When generating these String values Unicode code points are UTF-16 encoded as defined in . Code points belonging to the Basic Multilingual Plane are encoded as a single code unit element of the string. All other code points are encoded as two code unit elements of the string.

Syntax

@@ -11778,7 +11781,7 @@

Static Semantics: SV

The SV of SingleStringCharacters :: SingleStringCharacter SingleStringCharacters is a sequence of up to two code units that is the SV of |SingleStringCharacter| followed by the code units of the SV of |SingleStringCharacters| in order.
  • - The SV of DoubleStringCharacter :: SourceCharacter but not one of `"` or `\` or LineTerminator is the UTF16Encoding of the code point value of |SourceCharacter|. + The SV of DoubleStringCharacter :: SourceCharacter but not one of `"` or `\` or LineTerminator is the result of performing CodePointToUTF16CodeUnits on the code point value of |SourceCharacter|.
  • The SV of DoubleStringCharacter :: <LS> is the code unit 0x2028 (LINE SEPARATOR). @@ -11790,7 +11793,7 @@

    Static Semantics: SV

    The SV of DoubleStringCharacter :: LineContinuation is the empty code unit sequence.
  • - The SV of SingleStringCharacter :: SourceCharacter but not one of `'` or `\` or LineTerminator is the UTF16Encoding of the code point value of |SourceCharacter|. + The SV of SingleStringCharacter :: SourceCharacter but not one of `'` or `\` or LineTerminator is the result of performing CodePointToUTF16CodeUnits on the code point value of |SourceCharacter|.
  • The SV of SingleStringCharacter :: <LS> is the code unit 0x2028 (LINE SEPARATOR). @@ -11956,7 +11959,7 @@

    Static Semantics: SV

    • - The SV of NonEscapeCharacter :: SourceCharacter but not one of EscapeCharacter or LineTerminator is the UTF16Encoding of the code point value of |SourceCharacter|. + The SV of NonEscapeCharacter :: SourceCharacter but not one of EscapeCharacter or LineTerminator is the result of performing CodePointToUTF16CodeUnits on the code point value of |SourceCharacter|.
    • The SV of HexEscapeSequence :: `x` HexDigit HexDigit is the code unit whose value is (16 times the MV of the first |HexDigit|) plus the MV of the second |HexDigit|. @@ -11965,7 +11968,7 @@

      Static Semantics: SV

      The SV of Hex4Digits :: HexDigit HexDigit HexDigit HexDigit is the code unit whose value is the MV of |Hex4Digits|.
    • - The SV of UnicodeEscapeSequence :: `u{` CodePoint `}` is the UTF16Encoding of the MV of |CodePoint|. + The SV of UnicodeEscapeSequence :: `u{` CodePoint `}` is the result of performing CodePointToUTF16CodeUnits on the MV of |CodePoint|.
    @@ -12131,7 +12134,7 @@

    Static Semantics: TV and TRV

    The TV of TemplateCharacters :: TemplateCharacter TemplateCharacters is *undefined* if either the TV of |TemplateCharacter| is *undefined* or the TV of |TemplateCharacters| is *undefined*. Otherwise, it is a sequence consisting of the code units of the TV of |TemplateCharacter| followed by the code units of the TV of |TemplateCharacters|.
  • - The TV of TemplateCharacter :: SourceCharacter but not one of ``` or `\` or `$` or LineTerminator is the UTF16Encoding of the code point value of |SourceCharacter|. + The TV of TemplateCharacter :: SourceCharacter but not one of ``` or `\` or `$` or LineTerminator is the result of performing CodePointToUTF16CodeUnits on the code point value of |SourceCharacter|.
  • The TV of TemplateCharacter :: `$` is the code unit 0x0024 (DOLLAR SIGN). @@ -12152,7 +12155,7 @@

    Static Semantics: TV and TRV

    The TRV of TemplateCharacters :: TemplateCharacter TemplateCharacters is a sequence consisting of the code units of the TRV of |TemplateCharacter| followed by the code units of the TRV of |TemplateCharacters|.
  • - The TRV of TemplateCharacter :: SourceCharacter but not one of ``` or `\` or `$` or LineTerminator is the UTF16Encoding of the code point value of |SourceCharacter|. + The TRV of TemplateCharacter :: SourceCharacter but not one of ``` or `\` or `$` or LineTerminator is the result of performing CodePointToUTF16CodeUnits on the code point value of |SourceCharacter|.
  • The TRV of TemplateCharacter :: `$` is the code unit 0x0024 (DOLLAR SIGN). @@ -12197,13 +12200,13 @@

    Static Semantics: TV and TRV

    The TRV of NotEscapeSequence :: `u` `{` CodePoint [lookahead <! HexDigit] [lookahead != `}`] is the sequence consisting of the code unit 0x0075 (LATIN SMALL LETTER U) followed by the code unit 0x007B (LEFT CURLY BRACKET) followed by the code units of the TRV of |CodePoint|.
  • - The TRV of DecimalDigit :: one of `0` `1` `2` `3` `4` `5` `6` `7` `8` `9` is the UTF16Encoding of the single code point matched by this production. + The TRV of DecimalDigit :: one of `0` `1` `2` `3` `4` `5` `6` `7` `8` `9` is the result of performing CodePointToUTF16CodeUnits on the single code point matched by this production.
  • The TRV of CharacterEscapeSequence :: NonEscapeCharacter is the SV of |NonEscapeCharacter|.
  • - The TRV of SingleEscapeCharacter :: one of `'` `"` `\` `b` `f` `n` `r` `t` `v` is the UTF16Encoding of the single code point matched by this production. + The TRV of SingleEscapeCharacter :: one of `'` `"` `\` `b` `f` `n` `r` `t` `v` is the result of performing CodePointToUTF16CodeUnits on the single code point matched by this production.
  • The TRV of HexEscapeSequence :: `x` HexDigit HexDigit is the sequence consisting of the code unit 0x0078 (LATIN SMALL LETTER X) followed by TRV of the first |HexDigit| followed by the TRV of the second |HexDigit|. @@ -12221,7 +12224,7 @@

    Static Semantics: TV and TRV

    The TRV of HexDigits :: HexDigits HexDigit is the sequence consisting of TRV of |HexDigits| followed by TRV of |HexDigit|.
  • - The TRV of HexDigit :: one of `0` `1` `2` `3` `4` `5` `6` `7` `8` `9` `a` `b` `c` `d` `e` `f` `A` `B` `C` `D` `E` `F` is the UTF16Encoding of the single code point matched by this production. + The TRV of HexDigit :: one of `0` `1` `2` `3` `4` `5` `6` `7` `8` `9` `a` `b` `c` `d` `e` `f` `A` `B` `C` `D` `E` `F` is the result of performing CodePointToUTF16CodeUnits on the single code point matched by this production.
  • The TRV of LineContinuation :: `\` LineTerminatorSequence is the sequence consisting of the code unit 0x005C (REVERSE SOLIDUS) followed by the code units of TRV of |LineTerminatorSequence|. @@ -13266,7 +13269,7 @@

    Static Semantics: IsValidRegularExpressionLiteral ( _literal_ )

    1. Let _patternText_ be BodyText of _literal_. 1. If FlagText of _literal_ contains `u`, let _u_ be *true*; else let _u_ be *false*. 1. If _u_ is *false*, then - 1. Let _stringValue_ be UTF16Encode(_patternText_). + 1. Let _stringValue_ be CodePointsToString(_patternText_). 1. Set _patternText_ to the sequence of code points resulting from interpreting each of the 16-bit elements of _stringValue_ as a Unicode BMP code point. UTF-16 decoding is not applied to the elements. 1. Let _parseResult_ be ParsePattern(_patternText_, _u_). 1. If _parseResult_ is a Parse Node, return *true*; else return *false*. @@ -13277,8 +13280,8 @@

    Static Semantics: IsValidRegularExpressionLiteral ( _literal_ )

    Runtime Semantics: Evaluation

    PrimaryExpression : RegularExpressionLiteral - 1. Let _pattern_ be ! UTF16Encode(BodyText of |RegularExpressionLiteral|). - 1. Let _flags_ be ! UTF16Encode(FlagText of |RegularExpressionLiteral|). + 1. Let _pattern_ be ! CodePointsToString(BodyText of |RegularExpressionLiteral|). + 1. Let _flags_ be ! CodePointsToString(FlagText of |RegularExpressionLiteral|). 1. Return RegExpCreate(_pattern_, _flags_). @@ -25258,7 +25261,7 @@

    Runtime Semantics: PerformEval ( _x_, _callerRealm_, _strictCaller_, _direct 1. Set _inMethod_ to _thisEnvRec_.HasSuperBinding(). 1. If _F_.[[ConstructorKind]] is ~derived~, set _inDerivedConstructor_ to *true*. 1. Perform the following substeps in an implementation-defined order, possibly interleaving parsing and error detection: - 1. Let _script_ be the ECMAScript code that is the result of parsing ! UTF16DecodeString(_x_), for the goal symbol |Script|. If the parse fails, throw a *SyntaxError* exception. If any early errors are detected, throw a *SyntaxError* exception (but see also clause ). + 1. Let _script_ be the ECMAScript code that is the result of parsing ! StringToCodePoints(_x_), for the goal symbol |Script|. If the parse fails, throw a *SyntaxError* exception. If any early errors are detected, throw a *SyntaxError* exception (but see also clause ). 1. If _script_ Contains |ScriptBody| is *false*, return *undefined*. 1. Let _body_ be the |ScriptBody| of _script_. 1. If _inFunction_ is *false*, and _body_ Contains |NewTarget|, throw a *SyntaxError* exception. @@ -25594,7 +25597,7 @@

    Runtime Semantics: Decode ( _string_, _reservedSet_ )

    1. Set _j_ to _j_ + 1. 1. If _Octets_ does not contain a valid UTF-8 encoding of a Unicode code point, throw a *URIError* exception. 1. Let _V_ be the value obtained by applying the UTF-8 transformation to _Octets_, that is, from a List of octets into a 21-bit value. - 1. Let _S_ be the String value whose code units are, in order, the elements in UTF16Encoding(_V_). + 1. Let _S_ be the String value whose code units are, in order, the elements in CodePointToUTF16CodeUnits(_V_). 1. Set _R_ to the string-concatenation of the previous value of _R_ and _S_. 1. Set _k_ to _k_ + 1. @@ -26575,10 +26578,10 @@

    Runtime Semantics: CreateDynamicFunction ( _constructor_, _newTarget_, _kind 1. Let _bodyString_ be the string-concatenation of 0x000A (LINE FEED), ? ToString(_bodyArg_), and 0x000A (LINE FEED). 1. Let _prefix_ be the prefix associated with _kind_ in . 1. Let _sourceString_ be the string-concatenation of _prefix_, *" anonymous("*, _P_, 0x000A (LINE FEED), *") {"*, _bodyString_, and *"}"*. - 1. Let _sourceText_ be ! UTF16DecodeString(_sourceString_). + 1. Let _sourceText_ be ! StringToCodePoints(_sourceString_). 1. Perform the following substeps in an implementation-defined order, possibly interleaving parsing and error detection: - 1. Let _parameters_ be the result of parsing ! UTF16DecodeString(_P_), using _parameterGoal_ as the goal symbol. Throw a *SyntaxError* exception if the parse fails. - 1. Let _body_ be the result of parsing ! UTF16DecodeString(_bodyString_), using _goal_ as the goal symbol. Throw a *SyntaxError* exception if the parse fails. + 1. Let _parameters_ be the result of parsing ! StringToCodePoints(_P_), using _parameterGoal_ as the goal symbol. Throw a *SyntaxError* exception if the parse fails. + 1. Let _body_ be the result of parsing ! StringToCodePoints(_bodyString_), using _goal_ as the goal symbol. Throw a *SyntaxError* exception if the parse fails. 1. Let _strict_ be ContainsUseStrict of _body_. 1. If any static semantics errors are detected for _parameters_ or _body_, throw a *SyntaxError* exception. If _strict_ is *true*, the Early Error rules for UniqueFormalParameters : FormalParameters are applied. 1. If _strict_ is *true* and IsSimpleParameterList of _parameters_ is *false*, throw a *SyntaxError* exception. @@ -26747,7 +26750,7 @@

    Function.prototype.toString ( )

    1. Let _func_ be the *this* value. 1. If _func_ is a built-in function object, then return an implementation-defined String source code representation of _func_. The representation must have the syntax of a |NativeFunction|. Additionally, if _func_ has an [[InitialName]] internal slot and _func_.[[InitialName]] is a String, the portion of the returned String that would be matched by |NativeFunctionAccessor?| |PropertyName| must be the value of _func_.[[InitialName]]. 1. If Type(_func_) is Object and _func_ has a [[SourceText]] internal slot and _func_.[[SourceText]] is a sequence of Unicode code points and ! HostHasSourceTextAvailable(_func_) is *true*, then - 1. Return ! UTF16Encode(_func_.[[SourceText]]). + 1. Return ! CodePointsToString(_func_.[[SourceText]]). 1. If Type(_func_) is Object and IsCallable(_func_) is *true*, then return an implementation-defined String source code representation of _func_. The representation must have the syntax of a |NativeFunction|. 1. Throw a *TypeError* exception. @@ -30253,7 +30256,7 @@

    String.fromCodePoint ( ..._codePoints_ )

    1. Let _nextCP_ be ? ToNumber(_next_). 1. If ! IsInteger(_nextCP_) is *false*, throw a *RangeError* exception. 1. If _nextCP_ < 0 or _nextCP_ > 0x10FFFF, throw a *RangeError* exception. - 1. Append the elements of the UTF16Encoding of _nextCP_ to the end of _elements_. + 1. Append the elements of ! CodePointToUTF16CodeUnits(_nextCP_) to the end of _elements_. 1. Set _nextIndex_ to _nextIndex_ + 1. 1. Return the String value whose code units are, in order, the elements in the List _elements_. If _length_ is 0, the empty String is returned. @@ -31057,9 +31060,9 @@

    String.prototype.toLowerCase ( )

    1. Let _O_ be ? RequireObjectCoercible(*this* value). 1. Let _S_ be ? ToString(_O_). - 1. Let _sText_ be ! UTF16DecodeString(_S_). + 1. Let _sText_ be ! StringToCodePoints(_S_). 1. Let _lowerText_ be the result of toLowercase(_sText_), according to the Unicode Default Case Conversion algorithm. - 1. Let _L_ be ! UTF16Encode(_lowerText_). + 1. Let _L_ be ! CodePointsToString(_lowerText_). 1. Return _L_.

    The result must be derived according to the locale-insensitive case mappings in the Unicode Character Database (this explicitly includes not only the UnicodeData.txt file, but also all locale-insensitive mappings in the SpecialCasings.txt file that accompanies it).
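Because the steps decode the String to code points before applying the case mapping, a supplementary-plane letter is lowercased as one character rather than as two unrelated code units. A small illustrative check (not spec text), using U+10400 DESERET CAPITAL LETTER LONG I, whose lowercase mapping is U+10428:

```javascript
// U+10400 occupies two code units in a String value.
const upper = "\u{10400}";
const lower = upper.toLowerCase();

console.log(upper.length, lower.codePointAt(0).toString(16));
```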

    @@ -31538,7 +31541,7 @@

    Static Semantics: Early Errors

    RegExpIdentifierStart[U] :: UnicodeLeadSurrogate UnicodeTrailSurrogate
    • - It is a Syntax Error if the result of performing UTF16DecodeSurrogatePair on the two code points matched by |UnicodeLeadSurrogate| and |UnicodeTrailSurrogate| respectively is not matched by the |UnicodeIDStart| lexical grammar production. + It is a Syntax Error if the result of performing UTF16SurrogatePairToCodePoint on the two code points matched by |UnicodeLeadSurrogate| and |UnicodeTrailSurrogate| respectively is not matched by the |UnicodeIDStart| lexical grammar production.
    RegExpIdentifierPart[U] :: `\` RegExpUnicodeEscapeSequence[+U] @@ -31550,7 +31553,7 @@

    Static Semantics: Early Errors

    RegExpIdentifierPart[U] :: UnicodeLeadSurrogate UnicodeTrailSurrogate
    • - It is a Syntax Error if the result of performing UTF16DecodeSurrogatePair on the two code points matched by |UnicodeLeadSurrogate| and |UnicodeTrailSurrogate| respectively is not matched by the |UnicodeIDContinue| lexical grammar production. + It is a Syntax Error if the result of performing UTF16SurrogatePairToCodePoint on the two code points matched by |UnicodeLeadSurrogate| and |UnicodeTrailSurrogate| respectively is not matched by the |UnicodeIDContinue| lexical grammar production.
    UnicodePropertyValueExpression :: UnicodePropertyName `=` UnicodePropertyValue @@ -31766,7 +31769,7 @@

    Static Semantics: CharacterValue

    1. Let _lead_ be the CharacterValue of |HexLeadSurrogate|. 1. Let _trail_ be the CharacterValue of |HexTrailSurrogate|. - 1. Let _cp_ be UTF16DecodeSurrogatePair(_lead_, _trail_). + 1. Let _cp_ be UTF16SurrogatePairToCodePoint(_lead_, _trail_). 1. Return the code point value of _cp_. RegExpUnicodeEscapeSequence :: `u` Hex4Digits @@ -31817,7 +31820,7 @@

    Static Semantics: StringValue

    1. Let _idText_ be the source text matched by |RegExpIdentifierName|. 1. Let _idTextUnescaped_ be the result of replacing any occurrences of `\\` |RegExpUnicodeEscapeSequence| in _idText_ with the code point represented by the |RegExpUnicodeEscapeSequence|. - 1. Return ! UTF16Encode(_idTextUnescaped_). + 1. Return ! CodePointsToString(_idTextUnescaped_). @@ -31829,7 +31832,7 @@

    Pattern Semantics

    The syntax and semantics of |Pattern| is defined as if the source code for the |Pattern| was a List of |SourceCharacter| values where each |SourceCharacter| corresponds to a Unicode code point. If a BMP pattern contains a non-BMP |SourceCharacter| the entire pattern is encoded using UTF-16 and the individual code units of that encoding are used as the elements of the List.

    For example, consider a pattern expressed in source text as the single non-BMP character U+1D11E (MUSICAL SYMBOL G CLEF). Interpreted as a Unicode pattern, it would be a single element (character) List consisting of the single code point 0x1D11E. However, interpreted as a BMP pattern, it is first UTF-16 encoded to produce a two element List consisting of the code units 0xD834 and 0xDD1E.

    -

    Patterns are passed to the RegExp constructor as ECMAScript String values in which non-BMP characters are UTF-16 encoded. For example, the single character MUSICAL SYMBOL G CLEF pattern, expressed as a String value, is a String of length 2 whose elements were the code units 0xD834 and 0xDD1E. So no further translation of the string would be necessary to process it as a BMP pattern consisting of two pattern characters. However, to process it as a Unicode pattern UTF16DecodeSurrogatePair must be used in producing a List consisting of a single pattern character, the code point U+1D11E.

    +

    Patterns are passed to the RegExp constructor as ECMAScript String values in which non-BMP characters are UTF-16 encoded. For example, the single character MUSICAL SYMBOL G CLEF pattern, expressed as a String value, is a String of length 2 whose elements were the code units 0xD834 and 0xDD1E. So no further translation of the string would be necessary to process it as a BMP pattern consisting of two pattern characters. However, to process it as a Unicode pattern UTF16SurrogatePairToCodePoint must be used in producing a List consisting of a single pattern character, the code point U+1D11E.

    An implementation may not actually perform such translations to or from UTF-16, but the semantics of this specification requires that the result of pattern matching be as if such translations were performed.
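The G clef example can be observed from script. The sketch below is illustrative and its variable names are not from the spec. It builds the same pattern source with and without the `u` flag and attaches a `{2}` quantifier, which binds to the last pattern character: the trailing surrogate in a BMP pattern, and the whole code point in a Unicode pattern.

```javascript
const gClef = "\u{1D11E}";            // String of length 2: 0xD834, 0xDD1E
const source = "^" + gClef + "{2}$";

const bmpPattern = new RegExp(source);          // pattern characters: D834, DD1E{2}
const unicodePattern = new RegExp(source, "u"); // pattern characters: 1D11E{2}

const twoClefs = gClef + gClef;
const repeatedTrail = "\uD834\uDD1E\uDD1E";

console.log(bmpPattern.test(twoClefs), unicodePattern.test(twoClefs));
console.log(bmpPattern.test(repeatedTrail), unicodePattern.test(repeatedTrail));
```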

    @@ -31887,7 +31890,7 @@

    Pattern

    1. Return a new Abstract Closure with parameters (_str_, _index_) that captures _m_ and performs the following steps when called: 1. Assert: Type(_str_) is String. 1. Assert: ! IsNonNegativeInteger(_index_) is *true* and _index_ ≤ the length of _str_. - 1. If _Unicode_ is *true*, let _Input_ be a List consisting of the sequence of code points of ! UTF16DecodeString(_str_). Otherwise, let _Input_ be a List consisting of the sequence of code units that are the elements of _str_. _Input_ will be used throughout the algorithms in . Each element of _Input_ is considered to be a character. + 1. If _Unicode_ is *true*, let _Input_ be a List consisting of the sequence of code points of ! StringToCodePoints(_str_). Otherwise, let _Input_ be a List consisting of the sequence of code units that are the elements of _str_. _Input_ will be used throughout the algorithms in . Each element of _Input_ is considered to be a character. 1. Let _InputLength_ be the number of characters contained in _Input_. This alias will be used throughout the algorithms in . 1. Let _listIndex_ be the index into _Input_ of the character that was obtained from element _index_ of _str_. 1. Let _c_ be a new Continuation with parameters (_y_) that captures nothing and performs the following steps when called: @@ -32979,7 +32982,7 @@

    Runtime Semantics: RegExpInitialize ( _obj_, _pattern_, _flags_ )

    1. If _F_ contains any code unit other than *"g"*, *"i"*, *"m"*, *"s"*, *"u"*, or *"y"* or if it contains the same code unit more than once, throw a *SyntaxError* exception. 1. If _F_ contains *"u"*, let _u_ be *true*; else let _u_ be *false*. 1. If _u_ is *true*, then - 1. Let _patternText_ be ! UTF16DecodeString(_P_). + 1. Let _patternText_ be ! StringToCodePoints(_P_). 1. Let _patternCharacters_ be a List whose elements are the code points of _patternText_. 1. Else, 1. Let _patternText_ be the result of interpreting each of _P_'s 16-bit elements as a Unicode BMP code point. UTF-16 decoding is not applied to the elements. @@ -33157,7 +33160,7 @@

    Runtime Semantics: RegExpBuiltinExec ( _R_, _S_ )

    1. If _captureI_ is *undefined*, let _capturedValue_ be *undefined*. 1. Else if _fullUnicode_ is *true*, then 1. Assert: _captureI_ is a List of code points. - 1. Let _capturedValue_ be ! UTF16Encode(_captureI_). + 1. Let _capturedValue_ be ! CodePointsToString(_captureI_). 1. Else, 1. Assert: _fullUnicode_ is *false*. 1. Assert: _captureI_ is a List of code units. @@ -38621,9 +38624,9 @@

    JSON.parse ( _text_ [ , _reviver_ ] )

    The optional _reviver_ parameter is a function that takes two parameters, _key_ and _value_. It can filter and transform the results. It is called with each of the _key_/_value_ pairs produced by the parse, and its return value is used instead of the original value. If it returns what it received, the structure is not modified. If it returns *undefined* then the property is deleted from the result.

    1. Let _jsonString_ be ? ToString(_text_). - 1. [id="step-json-parse-validate"] Parse ! UTF16DecodeString(_jsonString_) as a JSON text as specified in ECMA-404. Throw a *SyntaxError* exception if it is not a valid JSON text as defined in that specification. + 1. [id="step-json-parse-validate"] Parse ! StringToCodePoints(_jsonString_) as a JSON text as specified in ECMA-404. Throw a *SyntaxError* exception if it is not a valid JSON text as defined in that specification. 1. Let _scriptString_ be the string-concatenation of *"("*, _jsonString_, and *");"*. - 1. [id="step-json-parse-parse"] Let _completion_ be the result of parsing and evaluating ! UTF16DecodeString(_scriptString_) as if it was the source text of an ECMAScript |Script|. The extended PropertyDefinitionEvaluation semantics defined in must not be used during the evaluation. + 1. [id="step-json-parse-parse"] Let _completion_ be the result of parsing and evaluating ! StringToCodePoints(_scriptString_) as if it was the source text of an ECMAScript |Script|. The extended PropertyDefinitionEvaluation semantics defined in must not be used during the evaluation. 1. Let _unfiltered_ be _completion_.[[Value]]. 1. [id="step-json-parse-assert-type"] Assert: _unfiltered_ is either a String, Number, Boolean, Null, or an Object that is defined by either an |ArrayLiteral| or an |ObjectLiteral|. 1. If IsCallable(_reviver_) is *true*, then @@ -38802,14 +38805,14 @@

    Runtime Semantics: QuoteJSONString ( _value_ )

    The abstract operation QuoteJSONString takes argument _value_. It wraps _value_ in 0x0022 (QUOTATION MARK) code units and escapes certain other code units within it. This operation interprets _value_ as a sequence of UTF-16 encoded code points, as described in . It performs the following steps when called:

    1. Let _product_ be the String value consisting solely of the code unit 0x0022 (QUOTATION MARK). - 1. For each code point _C_ in ! UTF16DecodeString(_value_), do + 1. For each code point _C_ in ! StringToCodePoints(_value_), do 1. If _C_ is listed in the “Code Point” column of , then 1. Set _product_ to the string-concatenation of _product_ and the escape sequence for _C_ as specified in the “Escape Sequence” column of the corresponding row. 1. Else if _C_ has a numeric value less than 0x0020 (SPACE), or if _C_ has the same numeric value as a leading surrogate or trailing surrogate, then 1. Let _unit_ be the code unit whose numeric value is that of _C_. 1. Set _product_ to the string-concatenation of _product_ and UnicodeEscape(_unit_). 1. Else, - 1. Set _product_ to the string-concatenation of _product_ and the UTF16Encoding of _C_. + 1. Set _product_ to the string-concatenation of _product_ and ! CodePointToUTF16CodeUnits(_C_). 1. Set _product_ to the string-concatenation of _product_ and the code unit 0x0022 (QUOTATION MARK). 1. Return _product_.
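The three branches of QuoteJSONString are visible through `JSON.stringify`. This is a rough check, assuming an engine that implements these steps (well-formed `JSON.stringify`, ES2019 and later); the variable names are illustrative.

```javascript
// A well-formed surrogate pair decodes to one code point and is re-encoded as is.
const paired = JSON.stringify("\u{1D11E}");

// An unpaired surrogate has the numeric value of a surrogate, so it is escaped.
const lone = JSON.stringify("\uD834");

// Control characters use the table's short escape when one exists,
// and a \uXXXX escape otherwise.
const control = JSON.stringify("\u0001\n");

console.log(paired, lone, control);
```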