make_unicode_tables.awk is now UnicodeTablesGenerator
UnicodeTablesGenerator uses Unicode data from ICU4J to generate Unicode tables for consumption by RE2/J. Output is google-java-formatted before it is written. No new runtime dependencies are added to RE2/J.

The generator uses ICU4J 4.8.2, which bundles Unicode 6.0.0. This keeps it compatible with Java 8, which RE2/J targets. Consideration should be given to how we might upgrade to later Unicode versions without introducing inconsistencies (e.g. RE2/J matching something that shouldn't match according to java.lang.Character data).

There are some differences in the generated tables:

* The new tables do not contain binary property character ranges (e.g. ASCII_Hex_Digit), as those tables are currently unused in RE2/J.
* The Cc (control) character class now contains NUL (U+0000). This is correct and was also the subject of #26.

See https://github.com/google/re2j/files/4725343/diff.txt for a full list of differences between the old tables and the new.
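To make the consistency concern with java.lang.Character concrete, here is a minimal sketch that derives the Cc ranges from the JDK's own character data, so they can be compared against a generated table. It is not the commit's generator, which reads ICU4J data: the class name `CategoryRanges` and the method `rangesForType` are hypothetical, and the inclusive `[lo, hi]` pair layout is an assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of RE2/J or of the commit's generator. */
final class CategoryRanges {

  /** Collects inclusive [lo, hi] code point ranges whose general category equals type. */
  static List<int[]> rangesForType(int type) {
    List<int[]> ranges = new ArrayList<>();
    int start = -1; // first code point of the currently open range, or -1 if none
    for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
      boolean matches = Character.getType(cp) == type;
      if (matches && start < 0) {
        start = cp; // open a new range
      } else if (!matches && start >= 0) {
        ranges.add(new int[] {start, cp - 1}); // close the open range
        start = -1;
      }
    }
    if (start >= 0) {
      // A range that runs to the last code point is still open after the loop.
      ranges.add(new int[] {start, Character.MAX_CODE_POINT});
    }
    return ranges;
  }

  public static void main(String[] args) {
    // Cc (control) as seen by java.lang.Character; the first range starts at NUL.
    for (int[] r : rangesForType(Character.CONTROL)) {
      System.out.printf("U+%04X..U+%04X%n", r[0], r[1]);
    }
  }
}
```

Running `main` prints `U+0000..U+001F` and `U+007F..U+009F`, which agrees with the note above that Cc includes NUL. The same loop could be run for other `Character` type constants to spot categories where a newer Unicode version in the generator would diverge from the JDK.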
Showing 7 changed files with 476 additions and 202 deletions.
This file was deleted.
@@ -1,3 +1,4 @@
rootProject.name = 're2j'

include ':benchmarks'
include ':unicode'
@@ -0,0 +1,9 @@
Utilities for emitting Unicode tables used by RE2J.

To rebuild the Unicode tables, run:

```
./gradlew :unicode:run -q > java/com/google/re2j/UnicodeTables.java
```

from the project root directory.
@@ -0,0 +1,17 @@
plugins {
  id 'java'
  id 'application'
}

mainClassName = 'com.google.re2j.UnicodeTablesGenerator'

repositories {
  mavenCentral()
}

dependencies {
  compile 'com.google.googlejavaformat:google-java-format:1.0'
  compile 'com.squareup:javapoet:1.12.1'
  compile 'com.ibm.icu:icu4j:4.8.2'
  compile 'com.google.guava:guava:29.0-jre'
}