Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normative: Add RegExp Modifiers #3221

Merged
merged 1 commit into from
Jan 9, 2025
Merged

Conversation

rbuckton
Copy link
Contributor

This adds the specification text for the Stage 3 RegExp Modifiers proposal.

Test262 tests can be found at tc39/test262#3960.

@ljharb ljharb added normative change Affects behavior required to correctly evaluate some ECMAScript source text pending stage 4 This proposal has not yet achieved stage 4, but may otherwise be ready to merge. has test262 tests proposal This is related to a specific proposal, and will be closed/merged when the proposal reaches stage 4. labels Nov 16, 2023
@ljharb ljharb marked this pull request as draft November 16, 2023 00:28
spec.html Outdated Show resolved Hide resolved
spec.html Outdated Show resolved Hide resolved
spec.html Outdated Show resolved Hide resolved
Copy link
Member

@michaelficarra michaelficarra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we use the RegularExpressionFlags production and restrict it with an early error instead of introducing a new, more restricted production?

@rbuckton
Copy link
Contributor Author

Is there a reason we use the RegularExpressionFlags production and restrict it with an early error instead of introducing a new, more restricted production?

Is there a reason not to? Modifiers are a strict subset of RegularExpressionFlags, and RegularExpressionFlags itself is not restricted in the grammar, only via semantics. If I were to create a RegularExpressionModifiers production, it would have the same definition as RegularExpressionFlags.

@michaelficarra
Copy link
Member

I believe RegularExpressionFlags is not restricted in the grammar because RegularExpressionLiteral needs to not change when we add new flags. This is due to the overlap with division. We don't have that same constraint with the modifiers grammar though, so it should be able to be done with grammar restrictions and not early errors.

@rbuckton
Copy link
Contributor Author

I believe RegularExpressionFlags is not restricted in the grammar because RegularExpressionLiteral needs to not change when we add new flags. This is due to the overlap with division. We don't have that same constraint with the modifiers grammar though, so it should be able to be done with grammar restrictions and not early errors.

How would you propose it be written so as not to require early errors? Modifiers can appear in any order, cannot be duplicated within one or both of the modifier locations, and I do plan to extend the set of allowed modifiers over time with things like x-mode, so it needs to be fairly flexible. Even if we limit it to a subset of characters like i, m, and s for now, we either need an early error for duplicates, or a complex grammar like:

RegularExpressionModifiers:
  RegularExpressionModifierChars[+IgnoreCase, +Multiline, +DotAll]

RegularExpressionModifierChars[IgnoreCase, Multiline, DotAll]:
  [empty]
  [+IgnoreCase] `i` RegularExpressionModifierChars[~IgnoreCase, ?Multiline, ?DotAll]?
  [+Multiline] `m` RegularExpressionModifierChars[?IgnoreCase, ~Multiline, ?DotAll]?
  [+DotAll] `s` RegularExpressionModifierChars[?IgnoreCase, ?Multiline, ~DotAll]?

The more modifiers we add, the more production parameters we need, and the more complex the production becomes. Plus, this doesn't avoid the need for an early error to forbid (?i-i:).

@michaelficarra
Copy link
Member

@rbuckton I'm not saying we can't use any early errors at all on the production, just that we don't need to use early errors to enforce the allowed flag characters when an alternative grammar could do the job. I'm happy to continue using early errors to check for duplicates.

@rbuckton
Copy link
Contributor Author

Now that both V8 and SpiderMonkey are shipping regex modifiers, I'm hoping to request Stage 4 at the October plenary. As such I'm working on updating this PR so that it is in a mergeable state.

@michaelficarra do you still want me to refactor modifiers to something other than RegularExpressionFlags?

@rbuckton rbuckton marked this pull request as ready for review September 26, 2024 21:42
@rbuckton
Copy link
Contributor Author

@michaelficarra I went ahead and added a RegularExpressionModifiers production that just restricts the allowed characters to ims, and continues to use an early error to check for duplicates.

spec.html Outdated
<emu-grammar>Atom :: `(?` RegularExpressionModifiers `-` RegularExpressionModifiers `:` Disjunction `)`</emu-grammar>
<ul>
<li>
It is a Syntax Error if the source text matched by the first |RegularExpressionModifiers| and the source text matched by the second |RegularExpressionModifiers| are both empty.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just throw the early error when the second |RegularExpressionModifiers| is empty regardless of whether the first modifiers are empty? For example, currently the spec allows /(?i-:foo)/, but the implementation differs: V8 accepts this production while Babel rejects it.

Copy link
Contributor Author

@rbuckton rbuckton Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly? We only error here because we decided to be more pedantic about modifiers than other languages. C#, for example, seems to parse (?-:) just fine. Any change here would need consensus.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remembered that I was wondering the same (why the start constant is placed here) when I was reviewing my RegExp check implementation for TypeScript.
I believe that Huáng’s expectation is actually more ergonomic, otherwise it would not be so easy for the current behavior to be misimplemented.

@bakkot bakkot added has stage 4 This PR represents a proposal that has achieved stage 4, and is ready to merge. and removed pending stage 4 This proposal has not yet achieved stage 4, but may otherwise be ready to merge. labels Oct 16, 2024
spec.html Outdated
<emu-grammar>Atom :: `(?` RegularExpressionModifiers `:` Disjunction `)`</emu-grammar>
<ul>
<li>
It is a Syntax Error if the source text matched by |RegularExpressionModifiers| contains the same code point more than once.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess technically we could accomplish this entirely within the grammar, but this is a fine use of early errors.

Copy link
Member

@michaelficarra michaelficarra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rbuckton
Copy link
Contributor Author

rbuckton commented Jan 8, 2025

Outside of updating from main, is there anything preventing this from being merged now that it has Stage 4?

@michaelficarra michaelficarra requested a review from syg January 8, 2025 19:45
@michaelficarra
Copy link
Member

@rbuckton It's waiting on editorial review from @syg.

Copy link
Contributor

@syg syg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % question about strings

spec.html Outdated Show resolved Hide resolved
@ljharb ljharb changed the title Add spec text for RegExp Modifiers Normative: Add RegExp Modifiers Jan 8, 2025
@bakkot bakkot added the ready to merge Editors believe this PR needs no further reviews, and is ready to land. label Jan 8, 2025
@ljharb ljharb merged commit f55b180 into tc39:main Jan 9, 2025
8 checks passed
@rbuckton rbuckton deleted the regexp-modifiers branch January 9, 2025 14:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
has stage 4 This PR represents a proposal that has achieved stage 4, and is ready to merge. has test262 tests normative change Affects behavior required to correctly evaluate some ECMAScript source text proposal This is related to a specific proposal, and will be closed/merged when the proposal reaches stage 4. ready to merge Editors believe this PR needs no further reviews, and is ready to land.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants