Provides a fast, space-efficient, ESM-based Unicode grapheme parser, including an iterable parser.
There are two resources available that work well in the browser via the fflate
compression library:
- A modern fork of unicode-trie; builds an optimized binary trie data structure for quick Unicode lookup.
- A modern fork of graphemesplit that supports UAX #29 (Unicode Text Segmentation).
The main use case presently supported is parsing strings for Unicode grapheme clusters.
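As a small illustration of why grapheme-aware parsing is needed, the sketch below uses only built-in JavaScript string operations (nothing from this package). It shows that neither UTF-16 code units nor code points line up with what a reader perceives as a single character.

```javascript
// A thumbs-up emoji followed by a skin tone modifier: one visible symbol.
const thumbsUp = '\u{1F44D}\u{1F3FD}';

// The letter "e" followed by a combining acute accent: also one visible symbol.
const accentedE = 'e\u0301';

// String length counts UTF-16 code units.
const thumbsUpUnits = thumbsUp.length;      // 4
const accentedEUnits = accentedE.length;    // 2

// Spreading a string iterates Unicode code points, which still splits both clusters.
const thumbsUpPoints = [...thumbsUp].length;    // 2
const accentedEPoints = [...accentedE].length;  // 2

console.log({ thumbsUpUnits, thumbsUpPoints, accentedEUnits, accentedEPoints });
```

A grapheme parser is expected to report each of these two strings as a single cluster.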
The following functions are exported from @typhonjs-utils/unicode:
- graphemeSplit(string): string[]
- graphemeIterator(string): IterableIterator<string>
For instance, you can use graphemeIterator as a tokenizer for @typhonjs-svelte/trie-search, allowing the trie to be made up of Unicode graphemes.

There is more work to be done on this package, especially toward a complete implementation of graphemeIterator. Right now there is a trivial, eager implementation built on graphemeSplit. The goal is a graphemeIterator implementation with full Unicode support that is, more importantly, the most compact browser-capable implementation possible.
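The difference between an eager and a non-eager iterator can be sketched as follows. This is not the package's implementation: the function names splitEager, eagerIterator, and lazyIterator are hypothetical, and the built-in Intl.Segmenter is used as a stand-in for the package's own parser purely so the sketch runs without dependencies.

```javascript
// Stand-in for graphemeSplit (an assumption, NOT the package's code):
// segments the whole string and returns every cluster in an array.
function splitEager(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return Array.from(segmenter.segment(str), (entry) => entry.segment);
}

// Eager iterator: the entire string is split into an array
// before the first cluster is yielded.
function* eagerIterator(str) {
  yield* splitEager(str);
}

// Non-eager iterator: each cluster is located only when the consumer
// asks for the next value, so no intermediate array is built.
function* lazyIterator(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  for (const { segment } of segmenter.segment(str)) {
    yield segment;
  }
}

// "e" + combining acute accent, thumbs-up + skin tone modifier, then "z".
const text = 'e\u0301\u{1F44D}\u{1F3FD}z';

const eagerResult = [...eagerIterator(text)];
const lazyResult = [...lazyIterator(text)];

// Taking only the first cluster from the lazy iterator does not
// require segmenting the rest of the string.
const firstCluster = lazyIterator(text).next().value;

console.log(eagerResult.length, lazyResult.length, firstCluster === 'e\u0301');
```

Both iterators yield the same clusters; they differ only in when the work happens, which matters for long strings or for consumers that stop early.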
When you bundle this package for the browser, presumably with Rollup or another bundler, remember to configure your bundle for browser support. For instance, when using Rollup and @rollup/plugin-node-resolve, pass { browser: true } to the Node resolve plugin.
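A minimal Rollup configuration along these lines might look like the following. The input and output paths are hypothetical placeholders; only the browser option passed to the Node resolve plugin comes from the guidance above.

```javascript
// rollup.config.mjs
import resolve from '@rollup/plugin-node-resolve';

export default {
  input: 'src/index.js',       // hypothetical entry point
  output: {
    file: 'dist/bundle.js',    // hypothetical output path
    format: 'es'
  },
  plugins: [
    // Prefer browser-specific entry points when resolving dependencies.
    resolve({ browser: true })
  ]
};
```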
- Complete a non-eager implementation of graphemeIterator.