Add regexp/unicode-property rule #722

Merged: 4 commits, Apr 8, 2024
5 changes: 5 additions & 0 deletions .changeset/nervous-lies-yawn.md
@@ -0,0 +1,5 @@
---
"eslint-plugin-regexp": minor
---

Add `regexp/unicode-property` rule to enforce consistent naming of unicode properties
1 change: 1 addition & 0 deletions README.md
@@ -231,6 +231,7 @@ The `plugin.configs["flat/all"]` / `plugin:regexp/all` config enables all rules.
| [sort-character-class-elements](https://ota-meshi.github.io/eslint-plugin-regexp/rules/sort-character-class-elements.html) | enforces elements order in character class | | | 🔧 | |
| [sort-flags](https://ota-meshi.github.io/eslint-plugin-regexp/rules/sort-flags.html) | require regex flags to be sorted | 🟢 🔵 | | 🔧 | |
| [unicode-escape](https://ota-meshi.github.io/eslint-plugin-regexp/rules/unicode-escape.html) | enforce consistent usage of unicode escape or unicode codepoint escape | | | 🔧 | |
| [unicode-property](https://ota-meshi.github.io/eslint-plugin-regexp/rules/unicode-property.html) | enforce consistent naming of unicode properties | | | 🔧 | |

<!-- end auto-generated rules list -->

1 change: 1 addition & 0 deletions docs/rules/index.md
@@ -108,6 +108,7 @@ sidebarDepth: 0
| [sort-character-class-elements](sort-character-class-elements.md) | enforces elements order in character class | | | 🔧 | |
| [sort-flags](sort-flags.md) | require regex flags to be sorted | 🟢 🔵 | | 🔧 | |
| [unicode-escape](unicode-escape.md) | enforce consistent usage of unicode escape or unicode codepoint escape | | | 🔧 | |
| [unicode-property](unicode-property.md) | enforce consistent naming of unicode properties | | | 🔧 | |

<!-- end auto-generated rules list -->

245 changes: 245 additions & 0 deletions docs/rules/unicode-property.md
@@ -0,0 +1,245 @@
---
pageClass: "rule-details"
sidebarDepth: 0
title: "regexp/unicode-property"
description: "enforce consistent naming of unicode properties"
---
# regexp/unicode-property

🔧 This rule is automatically fixable by the [`--fix` CLI option](https://eslint.org/docs/latest/user-guide/command-line-interface#--fix).

<!-- end auto-generated rule header -->

> enforce consistent naming of unicode properties

## :book: Rule Details

This rule enforces a consistent style for naming Unicode properties.

There are many ways to express a single Unicode property. For example, `\p{L}`, `\p{Letter}`, `\p{gc=L}`, `\p{gc=Letter}`, `\p{General_Category=L}`, and `\p{General_Category=Letter}` are all equivalent. The rule's options control exactly which of these variants are allowed. The default configuration is intended to be a good starting point for most users.

<eslint-code-block fix>

```js
/* eslint regexp/unicode-property: "error" */

/* ✓ GOOD */
var re = /\p{L}/u;
var re = /\p{Letter}/u;
var re = /\p{Script=Greek}/u;
var re = /\p{scx=Greek}/u;
var re = /\p{Hex}/u;
var re = /\p{Hex_Digit}/u;

/* ✗ BAD */
var re = /\p{gc=L}/u;
var re = /\p{General_Category=Letter}/u;
var re = /\p{Script=Grek}/u;
```

</eslint-code-block>
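
To see that these spellings really are interchangeable, they can be compiled and compared in any ES2018+ engine. The TypeScript sketch below is illustrative only: `LETTER_SPELLINGS`, `SAMPLES`, and `matchesSame` are hypothetical names, and a handful of sample characters is a spot check, not a proof of equivalence.

```ts
// The six equivalent spellings of the General_Category value "Letter".
const LETTER_SPELLINGS = [
  "L",
  "Letter",
  "gc=L",
  "gc=Letter",
  "General_Category=L",
  "General_Category=Letter",
]

// Sample characters: Latin, Greek, CJK, a digit, punctuation, and a space.
const SAMPLES = ["a", "Z", "λ", "漢", "7", "!", " "]

// Returns true if every spelling accepts or rejects each sample exactly
// like the other spellings do.
function matchesSame(spellings: string[], samples: string[]): boolean {
  const regexes = spellings.map((body) => new RegExp(`^\\p{${body}}$`, "u"))
  return samples.every((ch) => {
    const results = regexes.map((re) => re.test(ch))
    return results.every((r) => r === results[0])
  })
}

console.log(matchesSame(LETTER_SPELLINGS, SAMPLES)) // true
```

Because the engine treats all six forms identically, the choice between them is purely stylistic, which is what this rule lets you standardize.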

## :wrench: Options

```json
{
    "regexp/unicode-property": ["error", {
        "generalCategory": "never",
        "key": "ignore",
        "property": {
            "binary": "ignore",
            "generalCategory": "ignore",
            "script": "long"
        }
    }]
}
```

### `generalCategory: "never" | "always" | "ignore"`

Values from the `General_Category` property can be expressed in two ways: either without or with the `gc=` (or `General_Category=`) prefix. E.g. `\p{Letter}` or `\p{gc=Letter}`.

This option controls whether the `gc=` prefix is required or forbidden.

- `"never"` (default): The `gc=` (or `General_Category=`) prefix is forbidden.
<eslint-code-block fix>

```js
/* eslint regexp/unicode-property: ["error", { generalCategory: "never" }] */

var re = /\p{Letter}/u;
var re = /\p{gc=Letter}/u;
var re = /\p{General_Category=Letter}/u;
```

</eslint-code-block>

- `"always"`: The `gc=` (or `General_Category=`) prefix is required.
<eslint-code-block fix>

```js
/* eslint regexp/unicode-property: ["error", { generalCategory: "always" }] */

var re = /\p{Letter}/u;
var re = /\p{gc=Letter}/u;
var re = /\p{General_Category=Letter}/u;
```

</eslint-code-block>

- `"ignore"`: Both forms, with and without the prefix, are allowed.
<eslint-code-block fix>

```js
/* eslint regexp/unicode-property: ["error", { generalCategory: "ignore" }] */

var re = /\p{Letter}/u;
var re = /\p{gc=Letter}/u;
var re = /\p{General_Category=Letter}/u;
```

</eslint-code-block>
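
A rough sketch of what the `generalCategory` option amounts to, operating on the text between the braces of `\p{...}`. The helper names are hypothetical and this is not the plugin's implementation: it detects `General_Category` values by asking the engine whether `\p{gc=<name>}` compiles (instead of consulting a property table), and it assumes that `"always"` inserts the short `gc=` key.

```ts
type GeneralCategoryStyle = "never" | "always" | "ignore"

// True if this engine accepts `value` as a General_Category value
// (e.g. "L", "Letter", "Lu"); false for binary properties such as "Hex".
function isGeneralCategoryValue(value: string): boolean {
  try {
    new RegExp(`\\p{gc=${value}}`, "u")
    return true
  } catch {
    return false
  }
}

// Rewrites the body of \p{...} / \P{...} according to the style.
function applyGeneralCategoryStyle(body: string, style: GeneralCategoryStyle): string {
  const eq = body.indexOf("=")
  if (eq >= 0) {
    const key = body.slice(0, eq)
    // "never": drop an explicit gc= / General_Category= key, keep the value.
    if (style === "never" && (key === "gc" || key === "General_Category")) {
      return body.slice(eq + 1)
    }
    // Other keys (sc, scx, ...) are not affected by this option.
    return body
  }
  // "always": add the key, but only if the lone name is a General_Category value.
  if (style === "always" && isGeneralCategoryValue(body)) {
    return `gc=${body}`
  }
  return body
}

console.log(applyGeneralCategoryStyle("gc=Letter", "never")) // Letter
```

Lone binary properties such as `Hex` are left alone under `"always"`, since `gc=Hex` would not be a valid escape.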

### `key: "short" | "long" | "ignore"`

The key of a Unicode property in key-value form (e.g. `\p{gc=Letter}`, `\P{scx=Greek}`) has a short and a long form. E.g. `\p{gc=Letter}` and `\p{General_Category=Letter}` differ only in the form of the key.

This option controls whether the short or long form is required.

- `"short"`: The key must be in short form.
<eslint-code-block fix>

```js
/* eslint regexp/unicode-property: ["error", { key: "short", generalCategory: "ignore" }] */

var re = /\p{gc=Letter}/u;
var re = /\p{General_Category=Letter}/u;
var re = /\p{sc=Greek}/u;
var re = /\p{Script=Greek}/u;
var re = /\p{scx=Greek}/u;
var re = /\p{Script_Extensions=Greek}/u;
```

</eslint-code-block>

- `"long"`: The key must be in long form.
<eslint-code-block fix>

```js
/* eslint regexp/unicode-property: ["error", { key: "long", generalCategory: "ignore" }] */

var re = /\p{gc=Letter}/u;
var re = /\p{General_Category=Letter}/u;
var re = /\p{sc=Greek}/u;
var re = /\p{Script=Greek}/u;
var re = /\p{scx=Greek}/u;
var re = /\p{Script_Extensions=Greek}/u;
```

</eslint-code-block>

- `"ignore"` (default): The key can be in either form.
<eslint-code-block fix>

```js
/* eslint regexp/unicode-property: ["error", { key: "ignore", generalCategory: "ignore" }] */

var re = /\p{gc=Letter}/u;
var re = /\p{General_Category=Letter}/u;
var re = /\p{sc=Greek}/u;
var re = /\p{Script=Greek}/u;
var re = /\p{scx=Greek}/u;
var re = /\p{Script_Extensions=Greek}/u;
```

</eslint-code-block>
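
ECMAScript accepts only three keys in `\p{key=value}` (`gc`, `sc`, `scx` and their long forms), so the `key` option boils down to a small rename table. A minimal sketch, assuming the input is the text inside the braces; `applyKeyStyle` is a hypothetical helper, not part of the plugin's API.

```ts
type KeyStyle = "short" | "long" | "ignore"

// The only non-binary property keys ECMAScript allows.
const TO_LONG = new Map<string, string>([
  ["gc", "General_Category"],
  ["sc", "Script"],
  ["scx", "Script_Extensions"],
])
const TO_SHORT = new Map<string, string>([
  ["General_Category", "gc"],
  ["Script", "sc"],
  ["Script_Extensions", "scx"],
])

function applyKeyStyle(body: string, style: KeyStyle): string {
  const eq = body.indexOf("=")
  // Lone names (e.g. "Letter", "Hex") have no key to rewrite.
  if (eq < 0 || style === "ignore") return body
  const key = body.slice(0, eq)
  const value = body.slice(eq + 1)
  const table = style === "short" ? TO_SHORT : TO_LONG
  // Keys already in the requested form are not in the table and stay as-is.
  const newKey = table.get(key) ?? key
  return `${newKey}=${value}`
}

console.log(applyKeyStyle("General_Category=Letter", "short")) // gc=Letter
```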

### `property: "short" | "long" | "ignore" | object`

Similar to `key`, most property names also have long and short forms. E.g. `\p{Letter}` and `\p{L}`.

This option controls whether the short or long form is required. The required form can be configured separately for each property type via an object of the following type:

```ts
{
    binary?: "short" | "long" | "ignore",
    generalCategory?: "short" | "long" | "ignore",
    script?: "short" | "long" | "ignore",
}
```

- `binary` controls the form of binary Unicode properties. E.g. `ASCII`, `Any`, `Hex`.
- `generalCategory` controls the form of values from the `General_Category` property. E.g. `Letter`, `Ll`, `P`.
- `script` controls the form of values from the `Script` and `Script_Extensions` properties. E.g. `Greek`.

If the option is set to a string instead of an object, it will be used for all property types.
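
One way to normalize the string-or-object form into a per-type setting is sketched below. `resolvePropertyOption` is a hypothetical helper, and the fallback for fields missing from the object is an assumption (here, the defaults from the options block), since this page does not say what an omitted field means.

```ts
type Form = "short" | "long" | "ignore"

interface PropertyForms {
  binary: Form
  generalCategory: Form
  script: Form
}

type PropertyOption = Form | Partial<PropertyForms>

// Defaults as shown in the options block.
// Assumption: fields omitted from an object fall back to these values.
const DEFAULT_FORMS: PropertyForms = {
  binary: "ignore",
  generalCategory: "ignore",
  script: "long",
}

function resolvePropertyOption(option?: PropertyOption): PropertyForms {
  if (option === undefined) return { ...DEFAULT_FORMS }
  // A single string applies to every property type.
  if (typeof option === "string") {
    return { binary: option, generalCategory: option, script: option }
  }
  return {
    binary: option.binary ?? DEFAULT_FORMS.binary,
    generalCategory: option.generalCategory ?? DEFAULT_FORMS.generalCategory,
    script: option.script ?? DEFAULT_FORMS.script,
  }
}
```

Resolving the option once up front means the per-escape logic only ever deals with one concrete setting per property type.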

> NOTE: The `"short"` and `"long"` options follow the [Unicode standard](https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt) for short and long names. However, short names aren't always shorter than long names. E.g. the short name for `\p{sc=Han}` is `\p{sc=Hani}`.
>
> There are also some properties that don't have a short name, such as `\p{sc=Thai}`, and some that have additional aliases that can be longer than the long name, such as `\p{Mark}` (long) with its short name `\p{M}` and alias `\p{Combining_Mark}`.
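
The short/long mapping itself is a lookup in the Unicode alias data. The sketch below hard-codes only the aliases mentioned on this page (a real implementation would need the full `PropertyAliases.txt` / `PropertyValueAliases.txt` data), treats the first entry of each row as the short name and the second as the long name, and leaves unknown names untouched. `ALIASES` and `applyValueStyle` are hypothetical.

```ts
type Form = "short" | "long" | "ignore"

// A few rows of Unicode alias data: [short, long, ...other aliases].
// Deliberately incomplete; mixes binary properties and property values.
const ALIASES: ReadonlyArray<readonly [string, string, ...string[]]> = [
  ["Hex", "Hex_Digit"],
  ["L", "Letter"],
  ["M", "Mark", "Combining_Mark"],
  ["Grek", "Greek"],
  ["Hani", "Han"], // the "short" name is the longer one here
  ["Thai", "Thai"], // short and long names coincide
]

function applyValueStyle(name: string, form: Form): string {
  if (form === "ignore") return name
  const row = ALIASES.find((r) => r.includes(name))
  // Names unknown to this partial table are left untouched.
  if (!row) return name
  return form === "short" ? row[0] : row[1]
}

console.log(applyValueStyle("Han", "short")) // Hani
```

Note that an extra alias such as `Combining_Mark` is rewritten under both `"short"` (to `M`) and `"long"` (to `Mark`), since it is neither.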

#### Examples

All set to `"long"`:

<eslint-code-block fix>

```js
/* eslint regexp/unicode-property: ["error", { property: "long" }] */

var re = /\p{Hex}/u;
var re = /\p{Hex_Digit}/u;
var re = /\p{L}/u;
var re = /\p{Letter}/u;
var re = /\p{sc=Grek}/u;
var re = /\p{sc=Greek}/u;
```

</eslint-code-block>

All set to `"short"`:

<eslint-code-block fix>

```js
/* eslint regexp/unicode-property: ["error", { property: "short" }] */

var re = /\p{Hex}/u;
var re = /\p{Hex_Digit}/u;
var re = /\p{L}/u;
var re = /\p{Letter}/u;
var re = /\p{sc=Grek}/u;
var re = /\p{sc=Greek}/u;
```

</eslint-code-block>

Binary properties and values of the `General_Category` property set to `"short"` and values of the `Script` property set to `"long"`:

<eslint-code-block fix>

```js
/* eslint regexp/unicode-property: ["error", { property: { binary: "short", generalCategory: "short", script: "long" } }] */

var re = /\p{Hex}/u;
var re = /\p{Hex_Digit}/u;
var re = /\p{L}/u;
var re = /\p{Letter}/u;
var re = /\p{sc=Grek}/u;
var re = /\p{sc=Greek}/u;
```

</eslint-code-block>
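
In the mixed configuration above, each escape must first be sorted into one of the three property types before the matching setting can apply: `Hex`/`Hex_Digit` fall under `binary`, `L`/`Letter` under `generalCategory`, and `sc=Grek`/`sc=Greek` under `script`. A minimal classification sketch, assuming a valid escape body; `classifyProperty` is hypothetical, and it detects General_Category values by checking whether `\p{gc=<name>}` compiles in the current engine rather than using a property table.

```ts
type PropertyType = "binary" | "generalCategory" | "script"

// True if the engine accepts `value` as a General_Category value.
function isGeneralCategoryValue(value: string): boolean {
  try {
    new RegExp(`\\p{gc=${value}}`, "u")
    return true
  } catch {
    return false
  }
}

// Classifies the text inside \p{...}; assumes it is already a valid escape body.
function classifyProperty(body: string): PropertyType {
  const eq = body.indexOf("=")
  if (eq >= 0) {
    const key = body.slice(0, eq)
    // The only other allowed keys are sc/Script and scx/Script_Extensions.
    return key === "gc" || key === "General_Category" ? "generalCategory" : "script"
  }
  // A lone name is either a General_Category value or a binary property.
  return isGeneralCategoryValue(body) ? "generalCategory" : "binary"
}

console.log(classifyProperty("sc=Grek")) // script
```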

## :books: Further reading

- [MDN docs on Unicode property escapes](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape)

## :rocket: Version

:exclamation: <badge text="This rule has not been released yet." vertical="middle" type="error"> ***This rule has not been released yet.*** </badge>

## :mag: Implementation

- [Rule source](https://github.com/ota-meshi/eslint-plugin-regexp/blob/master/lib/rules/unicode-property.ts)
- [Test source](https://github.com/ota-meshi/eslint-plugin-regexp/blob/master/tests/lib/rules/unicode-property.ts)
2 changes: 2 additions & 0 deletions lib/all-rules.ts
@@ -78,6 +78,7 @@ import sortCharacterClassElements from "./rules/sort-character-class-elements"
import sortFlags from "./rules/sort-flags"
import strict from "./rules/strict"
import unicodeEscape from "./rules/unicode-escape"
import unicodeProperty from "./rules/unicode-property"
import useIgnoreCase from "./rules/use-ignore-case"
import type { RuleModule } from "./types"

@@ -162,5 +163,6 @@ export const rules: RuleModule[] = [
sortFlags,
strict,
unicodeEscape,
unicodeProperty,
useIgnoreCase,
]