Modify RegExp Modifiers static semantics #3439

Closed
78 changes: 74 additions & 4 deletions spec.html
@@ -35659,7 +35659,15 @@ <h2>Syntax</h2>
`\` AtomEscape[?UnicodeMode, ?NamedCaptureGroups]
CharacterClass[?UnicodeMode, ?UnicodeSetsMode]
`(` GroupSpecifier[?UnicodeMode]? Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] `)`
-`(?:` Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] `)`
+`(?` RegularExpressionModifiers `:` Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] `)`
+`(?` RegularExpressionModifiers `-` RegularExpressionModifiers `:` Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] `)`
+
+RegularExpressionModifiers ::
+[empty]
+RegularExpressionModifiers RegularExpressionModifier
+
+RegularExpressionModifier :: one of
+`i` `m` `s`

SyntaxCharacter :: one of
`^` `$` `\` `.` `*` `+` `?` `(` `)` `[` `]` `{` `}` `|`
@@ -35904,6 +35912,27 @@ <h1>Static Semantics: Early Errors</h1>
It is a Syntax Error if the MV of the first |DecimalDigits| is strictly greater than the MV of the second |DecimalDigits|.
</li>
</ul>
+<emu-grammar>Atom :: `(?` RegularExpressionModifiers `:` Disjunction `)`</emu-grammar>
+<ul>
+<li>
+It is a Syntax Error if the source text matched by |RegularExpressionModifiers| contains the same code point more than once.
+</li>
+</ul>
+<emu-grammar>Atom :: `(?` RegularExpressionModifiers `-` RegularExpressionModifiers `:` Disjunction `)`</emu-grammar>
+<ul>
+<li>
+It is a Syntax Error if the source text matched by the second |RegularExpressionModifiers| is empty.
+</li>
+<li>
+It is a Syntax Error if the source text matched by the first |RegularExpressionModifiers| contains the same code point more than once.
+</li>
+<li>
+It is a Syntax Error if the source text matched by the second |RegularExpressionModifiers| contains the same code point more than once.
+</li>
+<li>
+It is a Syntax Error if any code point in the source text matched by the first |RegularExpressionModifiers| is also contained in the source text matched by the second |RegularExpressionModifiers|.
+</li>
+</ul>
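The early errors added in this hunk can be sketched as a small validator. This is an illustrative Python sketch, not part of the spec change: the name `check_modifiers` and its calling convention (`remove=None` for the form without `-`) are assumptions made for the example, and the rejection of characters other than `i`, `m`, `s` stands in for what the grammar itself already enforces.

```python
from typing import Optional

# The three modifiers allowed by the RegularExpressionModifier production.
VALID_MODIFIERS = frozenset("ims")


def check_modifiers(add: str, remove: Optional[str] = None) -> None:
    """Raise SyntaxError if a modifier group violates the early errors above.

    `add` is the text between `(?` and `-` (or `:` when there is no `-`).
    `remove` is the text between `-` and `:`, or None for the form without `-`.
    Hypothetical helper for illustration only.
    """
    for part in (add, remove or ""):
        for ch in part:
            # Enforced by the grammar rather than by an early error.
            if ch not in VALID_MODIFIERS:
                raise SyntaxError(f"unknown modifier {ch!r}")
        # "contains the same code point more than once"
        if len(set(part)) != len(part):
            raise SyntaxError(f"duplicate modifier in {part!r}")

    if remove is not None:
        # "the second RegularExpressionModifiers is empty"
        if remove == "":
            raise SyntaxError("nothing after '-'")
        # A modifier may not be both added and removed.
        if set(add) & set(remove):
            raise SyntaxError("modifier appears on both sides of '-'")
```

Under these rules `(?im:...)`, `(?:...)`, `(?i-m:...)` and `(?-i:...)` pass, while `(?ii:...)`, `(?i-:...)` and `(?i-i:...)` are rejected.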
<emu-grammar>AtomEscape :: `k` GroupName</emu-grammar>
<ul>
<li>
@@ -37088,9 +37117,19 @@ <h1>
<emu-note>
<p>Parentheses of the form `(` |Disjunction| `)` serve both to group the components of the |Disjunction| pattern together and to save the result of the match. The result can be used either in a backreference (`\\` followed by a non-zero decimal number), referenced in a replace String, or returned as part of an array from the regular expression matching Abstract Closure. To inhibit the capturing behaviour of parentheses, use the form `(?:` |Disjunction| `)` instead.</p>
</emu-note>
-<emu-grammar>Atom :: `(?:` Disjunction `)`</emu-grammar>
+<emu-grammar>Atom :: `(?` RegularExpressionModifiers `:` Disjunction `)`</emu-grammar>
<emu-alg>
-1. Return CompileSubpattern of |Disjunction| with arguments _rer_ and _direction_.
+1. Let _addModifiers_ be the source text matched by |RegularExpressionModifiers|.
+1. Let _removeModifiers_ be the empty String.
+1. Let _modifiedRer_ be UpdateModifiers(_rer_, CodePointsToString(_addModifiers_), _removeModifiers_).
+1. Return CompileSubpattern of |Disjunction| with arguments _modifiedRer_ and _direction_.
</emu-alg>
+<emu-grammar>Atom :: `(?` RegularExpressionModifiers `-` RegularExpressionModifiers `:` Disjunction `)`</emu-grammar>
+<emu-alg>
+1. Let _addModifiers_ be the source text matched by the first |RegularExpressionModifiers|.
+1. Let _removeModifiers_ be the source text matched by the second |RegularExpressionModifiers|.
+1. Let _modifiedRer_ be UpdateModifiers(_rer_, CodePointsToString(_addModifiers_), CodePointsToString(_removeModifiers_)).
+1. Return CompileSubpattern of |Disjunction| with arguments _modifiedRer_ and _direction_.
+</emu-alg>
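The effect of compiling the |Disjunction| with a modified record, while everything outside the group keeps the original _rer_, can be observed in an engine that already has scoped modifiers. The sketch below uses Python's `re` module, whose `(?i:...)` and `(?-i:...)` groups are an analogue and not the ECMAScript semantics defined here; the helper name `matches` is made up for the example.

```python
import re


def matches(pattern: str, text: str, flags: int = 0) -> bool:
    """Return True if `pattern` matches all of `text` (hypothetical helper)."""
    return re.fullmatch(pattern, text, flags) is not None


# Adding a modifier: only the group is case-insensitive.
print(matches(r"(?i:a)b", "Ab"))   # True
print(matches(r"(?i:a)b", "AB"))   # False: "b" is outside the group

# Removing a modifier: the group opts out of the pattern-wide flag.
print(matches(r"a(?-i:b)", "Ab", re.IGNORECASE))  # True
print(matches(r"a(?-i:b)", "AB", re.IGNORECASE))  # False

# The `s` modifier lets `.` match a line terminator inside the group only.
print(matches(r"(?s:.)", "\n"))    # True
print(matches(r".", "\n"))         # False
```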

<!-- AtomEscape -->
@@ -37238,6 +37277,34 @@ <h1>
<p>In case-insignificant matches when HasEitherUnicodeFlag(_rer_) is *false*, the mapping is based on Unicode Default Case Conversion algorithm toUppercase rather than toCasefold, which results in some subtle differences. For example, `Ω` (U+2126 OHM SIGN) is mapped by toUppercase to itself but by toCasefold to `ω` (U+03C9 GREEK SMALL LETTER OMEGA) along with `Ω` (U+03A9 GREEK CAPITAL LETTER OMEGA), so *"\u2126"* is matched by `/[ω]/ui` and `/[\u03A9]/ui` but not by `/[ω]/i` or `/[\u03A9]/i`. Also, no code point outside the Basic Latin block is mapped to a code point within it, so strings such as *"\u017F ſ"* and *"\u212A K"* are not matched by `/[a-z]/i`.</p>
</emu-note>
</emu-clause>

+<emu-clause id="sec-updatemodifiers" type="abstract operation">
+<h1>
+UpdateModifiers (
+_rer_: a RegExp Record,
+_add_: a String,
+_remove_: a String,
+): a RegExp Record
+</h1>
+<dl class="header">
+</dl>
+<emu-alg>
+1. Assert: _add_ and _remove_ have no elements in common.
+1. Let _ignoreCase_ be _rer_.[[IgnoreCase]].
+1. Let _multiline_ be _rer_.[[Multiline]].
+1. Let _dotAll_ be _rer_.[[DotAll]].
+1. Let _unicode_ be _rer_.[[Unicode]].
+1. Let _unicodeSets_ be _rer_.[[UnicodeSets]].
+1. Let _capturingGroupsCount_ be _rer_.[[CapturingGroupsCount]].
+1. If _remove_ contains *"i"*, set _ignoreCase_ to *false*.
+1. Else if _add_ contains *"i"*, set _ignoreCase_ to *true*.
+1. If _remove_ contains *"m"*, set _multiline_ to *false*.
+1. Else if _add_ contains *"m"*, set _multiline_ to *true*.
+1. If _remove_ contains *"s"*, set _dotAll_ to *false*.
+1. Else if _add_ contains *"s"*, set _dotAll_ to *true*.
+1. Return the RegExp Record { [[IgnoreCase]]: _ignoreCase_, [[Multiline]]: _multiline_, [[DotAll]]: _dotAll_, [[Unicode]]: _unicode_, [[UnicodeSets]]: _unicodeSets_, [[CapturingGroupsCount]]: _capturingGroupsCount_ }.
+</emu-alg>
+</emu-clause>
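The UpdateModifiers steps above translate almost line for line into code. The following Python sketch models the RegExp Record as a frozen dataclass; the class and field names are assumptions chosen to mirror the record's internal slots, not anything defined by the spec or by a real engine.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RegExpRecord:
    """Illustrative stand-in for the spec's RegExp Record."""
    ignore_case: bool
    multiline: bool
    dot_all: bool
    unicode: bool
    unicode_sets: bool
    capturing_groups_count: int


def update_modifiers(rer: RegExpRecord, add: str, remove: str) -> RegExpRecord:
    """Return a new record with the i/m/s fields adjusted, as in UpdateModifiers."""
    # Step 1: the early errors guarantee the two strings are disjoint.
    assert not set(add) & set(remove)

    ignore_case = rer.ignore_case
    multiline = rer.multiline
    dot_all = rer.dot_all

    # Removal is checked first; otherwise an added modifier is switched on.
    if "i" in remove:
        ignore_case = False
    elif "i" in add:
        ignore_case = True
    if "m" in remove:
        multiline = False
    elif "m" in add:
        multiline = True
    if "s" in remove:
        dot_all = False
    elif "s" in add:
        dot_all = True

    # unicode, unicode_sets and capturing_groups_count are copied unchanged.
    return replace(rer, ignore_case=ignore_case, multiline=multiline, dot_all=dot_all)
```

Because the record is immutable, the caller's _rer_ is untouched and only the subpattern compiled with the returned record sees the changed flags.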
</emu-clause>

<emu-clause id="sec-compilecharacterclass" type="sdo" oldids="sec-characterclass">
@@ -50234,6 +50301,8 @@ <h1>Regular Expressions</h1>
<emu-prodref name="Quantifier"></emu-prodref>
<emu-prodref name="QuantifierPrefix"></emu-prodref>
<emu-prodref name="Atom"></emu-prodref>
+<emu-prodref name="RegularExpressionModifiers"></emu-prodref>
+<emu-prodref name="RegularExpressionModifier"></emu-prodref>
<emu-prodref name="SyntaxCharacter"></emu-prodref>
<emu-prodref name="PatternCharacter"></emu-prodref>
<emu-prodref name="AtomEscape"></emu-prodref>
@@ -50397,7 +50466,8 @@ <h2>Syntax</h2>
`\` [lookahead == `c`]
CharacterClass[~UnicodeMode, ~UnicodeSetsMode]
`(` GroupSpecifier[~UnicodeMode]? Disjunction[~UnicodeMode, ~UnicodeSetsMode, ?NamedCaptureGroups] `)`
-`(?:` Disjunction[~UnicodeMode, ~UnicodeSetsMode, ?NamedCaptureGroups] `)`
+`(?` RegularExpressionModifiers `:` Disjunction[~UnicodeMode, ~UnicodeSetsMode, ?NamedCaptureGroups] `)`
+`(?` RegularExpressionModifiers `-` RegularExpressionModifiers `:` Disjunction[~UnicodeMode, ~UnicodeSetsMode, ?NamedCaptureGroups] `)`
InvalidBracedQuantifier
ExtendedPatternCharacter
