Update Unicode to version 16.0.0, bump to 0.1.24 #103

Merged: 5 commits, Sep 17, 2024


Marcondiro (Contributor)

Yet another update of the Unicode tables, this time to 16.0.0.
Thanks!


Marcondiro force-pushed the master branch 5 times, most recently from ea010a0 to 4d86994, September 13, 2024 15:25
Marcondiro marked this pull request as draft September 13, 2024 15:39
Manishearth (Member) left a comment


It's possible there are algorithm changes in 16.0 as well.

Review thread on Cargo.toml (outdated, resolved)
Marcondiro (Contributor, Author)

There is an issue in the NFC computation for two new characters: U+105C9 and U+105E4.

Marcondiro (Contributor, Author) commented Sep 16, 2024

This is likely where the bug is:

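```python
# minimal_perfect_hash, gen_mph_data and hexify are helpers defined elsewhere in the generator script.
def gen_composition_table(canon_comp, out):
    table = {}
    for (c1, c2), c3 in canon_comp.items():
        # Only pairs where *both* code points are in the BMP go into the hashed table.
        if c1 < 0x10000 and c2 < 0x10000:
            table[(c1 << 16) | c2] = c3
    (salt, keys) = minimal_perfect_hash(table)
    gen_mph_data('COMPOSITION_TABLE', table, '(u32, char)',
        lambda k: f"(0x{k:08X}, '\\u{{{table[k]:06X}}}')", 1)
    out.write("pub(crate) fn composition_table_astral(c1: char, c2: char) -> Option<char> {\n")
    out.write("    match (c1, c2) {\n")
    for (c1, c2), c3 in sorted(canon_comp.items()):
        # Only pairs where *both* code points are astral get a match arm; a pair with
        # one BMP and one astral code point satisfies neither this check nor the one above.
        if c1 >= 0x10000 and c2 >= 0x10000:
            out.write("        ('\\u{%s}', '\\u{%s}') => Some('\\u{%s}'),\n" % (hexify(c1), hexify(c2), hexify(c3)))
    out.write("        _ => None,\n")
    out.write("    }\n")
    out.write("}\n")
```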
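The BMP-only guard in the first loop is tied to the key format: each pair is packed into a single 32-bit key as `(c1 << 16) | c2`, which is only unambiguous when both code points fit in 16 bits. A minimal sketch of that packing, using hypothetical helpers `pack_pair` and `unpack_pair` that are not part of the generator script:

```python
def pack_pair(c1: int, c2: int) -> int:
    """Pack two code points into one key, mirroring `(c1 << 16) | c2`.

    Hypothetical helper for illustration only; valid for BMP code points.
    """
    if c1 >= 0x10000 or c2 >= 0x10000:
        raise ValueError("both code points must be below 0x10000 to fit in 16 bits")
    return (c1 << 16) | c2


def unpack_pair(key: int) -> tuple[int, int]:
    """Recover (c1, c2) from a packed key."""
    return key >> 16, key & 0xFFFF


# A BMP pair round-trips: U+0041 + U+0300 (A + combining grave accent).
assert unpack_pair(pack_pair(0x0041, 0x0300)) == (0x0041, 0x0300)

# Without the guard, an astral c2 spills into c1's half of the key, so two
# different pairs collide: (U+0042, U+10000) and (U+0043, U+0000).
assert (0x42 << 16) | 0x10000 == (0x43 << 16) | 0x0000

# An astral c1 does not fit in a u32 key at all.
assert (0x10000 << 16) > 0xFFFFFFFF
```

Any pair involving an astral code point therefore needs a separate path, which is the role of `composition_table_astral`.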

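My hypothesis is that an entry is silently ignored when exactly one of the two code points lies outside the BMP, i.e. when `(c1 < 0x10000 && c2 >= 0x10000) || (c1 >= 0x10000 && c2 < 0x10000)`: such a pair fails both the `c1 < 0x10000 and c2 < 0x10000` check and the `c1 >= 0x10000 and c2 >= 0x10000` check, so it ends up in neither table.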
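The hypothesis can be checked mechanically. The sketch below mirrors the two conditions in `gen_composition_table` to list pairs that reach neither the hashed BMP table nor the astral `match`, and shows one possible exhaustive split in which every pair that cannot be packed goes to the fallback. The names `dropped_by_original` and `partition_exhaustive` are hypothetical, and the sample mixes one real BMP composition with made-up pairs. It is not the Unicode 16.0.0 data.

```python
BMP_LIMIT = 0x10000  # first code point outside the Basic Multilingual Plane

Pair = tuple[int, int]


def dropped_by_original(canon_comp: dict[Pair, int]) -> list[Pair]:
    """Pairs that satisfy neither condition used in gen_composition_table."""
    dropped = []
    for c1, c2 in sorted(canon_comp):
        in_bmp_table = c1 < BMP_LIMIT and c2 < BMP_LIMIT       # first loop
        in_astral_match = c1 >= BMP_LIMIT and c2 >= BMP_LIMIT  # second loop
        if not in_bmp_table and not in_astral_match:
            dropped.append((c1, c2))
    return dropped


def partition_exhaustive(canon_comp: dict[Pair, int]) -> tuple[dict[int, int], dict[Pair, int]]:
    """One possible fix: every pair that cannot be packed goes to the fallback."""
    bmp_table: dict[int, int] = {}
    fallback: dict[Pair, int] = {}
    for (c1, c2), c3 in canon_comp.items():
        if c1 < BMP_LIMIT and c2 < BMP_LIMIT:
            bmp_table[(c1 << 16) | c2] = c3
        else:  # at least one astral code point, mixed pairs included
            fallback[(c1, c2)] = c3
    return bmp_table, fallback


# Illustrative sample only, not the real Unicode 16.0.0 data.
sample = {
    (0x0041, 0x0300): 0x00C0,     # both BMP: A + combining grave
    (0x11099, 0x110BA): 0x1109A,  # both astral
    (0x10000, 0x0301): 0x10001,   # made-up mixed pair: astral + BMP
    (0x0061, 0x10000): 0x0062,    # made-up mixed pair: BMP + astral
}

for c1, c2 in dropped_by_original(sample):
    print(f"U+{c1:04X} + U+{c2:04X} is dropped")
# U+0061 + U+10000 is dropped
# U+10000 + U+0301 is dropped

bmp_table, fallback = partition_exhaustive(sample)
assert len(bmp_table) + len(fallback) == len(sample)  # nothing is lost
```

Writing every `fallback` entry (not only the all-astral ones) as a `match` arm would cover mixed pairs too, assuming the Rust-side lookup calls `composition_table_astral` for every pair that is not entirely in the BMP. This is a sketch of one approach, not necessarily the change that was merged.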

Marcondiro marked this pull request as ready for review September 17, 2024 12:09
Manishearth merged commit c992130 into unicode-rs:master Sep 17, 2024
5 checks passed