Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLDR-16836 kbd: add EBNF to spec for transforms #4261

Merged
merged 7 commits into from
Jan 9, 2025

Conversation

srl295
Copy link
Member

@srl295 srl295 commented Jan 1, 2025

CLDR-16836

  • This PR completes the ticket.

ALLOW_MANY_COMMITS=true

@srl295 srl295 self-assigned this Jan 1, 2025
@srl295 srl295 force-pushed the srl295/kbd/cldr-18197/bnf branch from 121aab0 to afa4af5 Compare January 2, 2025 20:04
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • docs/ldml/tr35-keyboards.md is different
  • keyboards/abnf/transform-from-required.abnf is different
  • tools/scripts/keyboard-abnf-tests/check-keyboard-abnf.sh is different
  • tools/scripts/keyboard-abnf-tests/transform-from-required.d/from-match.fail.txt is now changed in the branch
  • tools/scripts/keyboard-abnf-tests/transform-from-required.d/from-match.pass.txt is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@srl295 srl295 changed the title CLDR-18197 kbd: update spec to mention abnf CLDR-18197 kbd: update spec with ABNF Jan 2, 2025
@srl295
Copy link
Member Author

srl295 commented Jan 2, 2025

fyi @stasm and @aphillips

@srl295
Copy link
Member Author

srl295 commented Jan 2, 2025

  • NMTOKEN is a little incomplete - had a tool problem with high values. (worked around it)
  • Need to do the to= side (done)

@srl295 srl295 force-pushed the srl295/kbd/cldr-18197/bnf branch from a461f03 to 308448c Compare January 3, 2025 18:14
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • docs/ldml/tr35-keyboards.md is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@srl295 srl295 changed the title CLDR-18197 kbd: update spec with ABNF CLDR-18197 kbd: add EBNF to spec for transforms Jan 3, 2025
- add keyboard abnf and sample files and automated tests

Temporarily skip non-BMP chars,
see hildjj/node-abnf#25 which is being fixed.
@srl295 srl295 force-pushed the srl295/kbd/cldr-18197/bnf branch from 308448c to d38f212 Compare January 3, 2025 21:31
@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@srl295 srl295 changed the title CLDR-18197 kbd: add EBNF to spec for transforms CLDR-16836 kbd: add EBNF to spec for transforms Jan 3, 2025
Copy link
Member

@macchiati macchiati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

  1. The EBNF links should point to https://unicode.org/reports/tr35/#ebnf. (Since the LDML EBNF is a superset of the vanilla EBNF that should work.
  2. The validity / well-formedness constraints (eg no more than 9 capture groups), should use the w3c syntax.

@srl295
Copy link
Member Author

srl295 commented Jan 3, 2025

Looks good!

Thanks!

  1. The EBNF links should point to https://unicode.org/reports/tr35/#ebnf. (Since the LDML EBNF is a superset of the vanilla EBNF that should work.

I will add this:

The following is the [LDML EBNF](./tr35.md#ebnf) format for the grammar:
  1. The validity / well-formedness constraints (eg no more than 9 capture groups), should use the w3c syntax.

The 9 capture groups is 9 inner capture groups. So for example, valid:

/(first)(second)(third)(fourth)(fifth)(sixth)(seventh)(eighth)(?:And possibly, (ninth))?/

but invalid:

/(first)(second)(third)(fourth)(fifth)(sixth)(seventh)(eighth)(?:And possibly, (ninth))?(?:But not, (tenth)!)?/

because it's a nested group it may be challenging to express in the grammar, it's more like a resource (slot) limit.

@srl295 srl295 force-pushed the srl295/kbd/cldr-18197/bnf branch from 199f93e to c0e32b4 Compare January 3, 2025 21:51
@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@srl295 srl295 requested a review from macchiati January 3, 2025 21:53
@srl295 srl295 marked this pull request as ready for review January 3, 2025 21:54
@srl295
Copy link
Member Author

srl295 commented Jan 3, 2025

I'd like to get some Java based tooling, but that will be in a separate PR.
Edit: ran into some toolchain snags, so java tooling not ready yet. I'd like to land this current PR first anyways.

docs/ldml/tr35-keyboards.md Outdated Show resolved Hide resolved
docs/ldml/tr35-keyboards.md Outdated Show resolved Hide resolved
| DIGIT
| '_'
ASCII-CTRLS
::= [#x1-#x8#xB-#xC#xE-#x1F]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are not allowed in XML 1.0

docs/ldml/tr35-keyboards.md Outdated Show resolved Hide resolved
@srl295
Copy link
Member Author

srl295 commented Jan 4, 2025 via email

Copy link
Member

@macchiati macchiati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I requested changes for the constraints.

@srl295 srl295 requested review from macchiati and miloush January 8, 2025 22:46
```ebnf
[ wfc: No more than 9 capture groups may be present. ]
[ vc: all variables referenced must be defined in the <variables> element ]
[ vc: The CLDR repository may define additional constraints on the repertoire, such as requiring all characters to be in a published Unicode version and disallowing private-use characters. ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[ vc: The CLDR repository may define additional constraints on the repertoire, such as requiring all characters to be in a published Unicode version and disallowing private-use characters. ]
[ vc: If a keyboard definitions is submitted to the the CLDR repository, it must satisfy additional constraints on the character repertoire. For more information, see [CLDR keyboard repertoire constraints](#repertoire-constraints). ]

We have to point to where those constraints are documented. So add a little section header for that and point to it. Also make a change to a similar line below.

I suggest that the contents of that section be:

  • No characters can have any of the following General_Category values in the latest version of the Unicode Standard:
    • Private Use (Co)
    • Surrogate (Cs)
    • Unassigned (Cn)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we can link from ebnf to spec.

Also the repository requirements are a separate activity - I don't see a reason to specify them here in detail (though I'll take the list back to kbd-wg). Perhaps it's better to just say- see a future version of the spec.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this

[ vc: If a keyboard definitions is submitted to the the CLDR repository, it must satisfy additional constraints on the character repertoire. ]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with that is there is no way for the reader to find out what those requirements are — we can't have undefined vc requirements.

Copy link
Member Author

@srl295 srl295 Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I'd propose to drop it from vc and just note. Or not even note. We don't have the requirements yet.

Yes there's no way to find out the requirements because they are future.

docs/ldml/tr35-keyboards.md Outdated Show resolved Hide resolved
docs/ldml/tr35-keyboards.md Outdated Show resolved Hide resolved
@srl295
Copy link
Member Author

srl295 commented Jan 9, 2025

@miloush how does it look?

@srl295 srl295 merged commit 2b78422 into unicode-org:main Jan 9, 2025
10 checks passed
@srl295 srl295 deleted the srl295/kbd/cldr-18197/bnf branch January 9, 2025 17:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants