Add `regexp/unicode-property` rule #722
Merged, +1,639 −0
Fixes #720
This PR adds a new rule that lets users enforce the naming of Unicode properties. It has 3 main features:

1. Remove unnecessary `gc=`/`General_Category=` keys, e.g. `\p{gc=L}` -> `\p{L}`. These prefixes are unnecessary because the values of the `General_Category` property can be accessed without the key.
2. Enforce a consistent naming style for the keys `General_Category`/`gc`, `Script`/`sc`, and `Script_Extensions`/`scx`.
3. Enforce a consistent naming style for property names and values, e.g. `\p{L}` -> `\p{Letter}` and `\p{Hex}` -> `\p{Hex_Digit}`.

All of these features can be individually configured and turned off by the user. The `regexp/unicode-property` rule is not included in our `recommended` config, because it only enforces a specific style.

Default configuration
The default configuration is the following:
This means that, by default, the rule will (1) remove `General_Category`/`gc` keys (e.g. `\p{gc=L}` -> `\p{L}`) and (2) enforce long names for values of the `Script` and `Script_Extensions` properties (e.g. `\p{sc=Kana}` -> `\p{sc=Katakana}`).

I chose a minimal configuration because I didn't want the rule to generate a lot of errors for people trying to adopt it. I think these 2 effects work well in any code base, no matter what style it usually prefers. (1) simply removes an unnecessary prefix to "simplify" the regex, and (2) prevents the use of the (IMO) horrible aliases for scripts.
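To make the two default effects concrete, here is a minimal sketch of the rewriting they imply. It is not the rule's implementation: `normalizeProperty`, `normalizePattern`, and the tiny `SCRIPT_LONG_NAMES` table are hypothetical names introduced for illustration, and the sketch works on raw pattern text with a simple regex instead of a real regex AST (so it ignores edge cases such as escaped backslashes).

```typescript
// Hypothetical, deliberately tiny alias table. The real rule relies on the
// full Unicode alias data; only a few well-known script aliases are listed.
const SCRIPT_LONG_NAMES: Record<string, string> = {
  Kana: "Katakana",
  Latn: "Latin",
  Grek: "Greek",
};

const GC_KEYS = new Set(["gc", "General_Category"]);
const SCRIPT_KEYS = new Set(["sc", "Script", "scx", "Script_Extensions"]);

/** Rewrites the text between the braces of a single \p{...} escape. */
function normalizeProperty(body: string): string {
  const eq = body.indexOf("=");
  if (eq === -1) return body; // lone name such as "L": left as-is here

  const key = body.slice(0, eq);
  const value = body.slice(eq + 1);

  // (1) Drop the gc/General_Category key and keep only the value.
  if (GC_KEYS.has(key)) return value;

  // (2) Prefer the long script name; the key itself is not touched.
  if (SCRIPT_KEYS.has(key)) {
    return `${key}=${SCRIPT_LONG_NAMES[value] ?? value}`;
  }
  return body;
}

/** Applies normalizeProperty to every \p{...} / \P{...} in a pattern source. */
function normalizePattern(source: string): string {
  return source.replace(
    /\\([pP])\{([^}]*)\}/g,
    (_match: string, p: string, body: string) =>
      `\\${p}{${normalizeProperty(body)}}`,
  );
}
```

Under these assumptions, `normalizePattern("\\p{gc=L}")` yields `\p{L}` and `normalizePattern("\\p{sc=Kana}")` yields `\p{sc=Katakana}`, matching the two examples above. Unknown script values pass through unchanged.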
Unicode data
Since I needed the data for the mapping between aliases to implement this rule, I had to choose between taking a dependency (e.g. `@unicode/unicode-15.0.0`) or including the relevant data in the source files of this project.

I chose against adding a dependency, because it was easy enough to get the data I needed and because most of `@unicode/unicode-15.0.0` would be dead weight to us.

However, the data I included is used through an API (the `AliasMap` class), so we can easily switch to using a dependency without needing to change the `regexp/unicode-property` rule.
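For readers unfamiliar with this kind of indirection, the following is a rough sketch of what an alias-map API could look like. Only the class name `AliasMap` comes from this PR; the `AliasEntry` shape, the `toLong`/`toShort` methods, and the sample data are assumptions made for illustration and may not match the actual class.

```typescript
// Hypothetical entry shape: one short alias and one long alias per value.
interface AliasEntry {
  short: string;
  long: string;
}

// Sketch only: the AliasMap in this PR may expose a different interface.
class AliasMap {
  private readonly longByName = new Map<string, string>();
  private readonly shortByName = new Map<string, string>();

  constructor(entries: readonly AliasEntry[]) {
    for (const { short, long } of entries) {
      // Index both spellings so lookups work from either form.
      this.longByName.set(short, long);
      this.longByName.set(long, long);
      this.shortByName.set(short, short);
      this.shortByName.set(long, short);
    }
  }

  /** Returns the long alias, or the input unchanged if it is unknown. */
  toLong(name: string): string {
    return this.longByName.get(name) ?? name;
  }

  /** Returns the short alias, or the input unchanged if it is unknown. */
  toShort(name: string): string {
    return this.shortByName.get(name) ?? name;
  }
}

// Sample data: a few General_Category value aliases.
const GENERAL_CATEGORY_VALUES = new AliasMap([
  { short: "L", long: "Letter" },
  { short: "Lu", long: "Uppercase_Letter" },
  { short: "Nd", long: "Decimal_Number" },
]);
```

Because callers only see `toLong` and `toShort` in this sketch, the entries passed to the constructor could later come from a package instead of inlined data without the calling rule changing.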