ICU-22979 Support inverse rule for [] span in RBNF #3326
"20: twent[y->>|ieth];\n" +
"30: thirt[y->>|ieth];\n" +
"40: fort[y->>|ieth];\n" +
"50: fift[y->>|ieth];\n" +
"60: sixt[y->>|ieth];\n" +
"70: sevent[y->>|ieth];\n" +
"80: eight[y->>|ieth];\n" +
"90: ninet[y->>|ieth];\n" +
Languages that define a rule for 31 or private rules for 30 are candidates to use this rule syntax.
"100: <%cardinal< [$(cardinal,one{hundred}other{hundreds})$ >>|$(cardinal,one{hundredth}other{hundredths})$];\n" +
"1000: <%cardinal< [$(cardinal,one{thousand}other{thousands})$ >>|$(cardinal,one{thousandth}other{thousandths})$];\n" +
"1000000: <%cardinal< [$(cardinal,one{million}other{millions})$ >>|$(cardinal,one{millionth}other{millionths})$];\n" +
"1000000000: <%cardinal< [$(cardinal,one{billion}other{billions})$ >>|$(cardinal,one{billionth}other{billionths})$];\n" +
"1000000000000: <%cardinal< [$(cardinal,one{trillion}other{trillions})$ >>|$(cardinal,one{trillionth}other{trillionths})$];\n" +
This style of large ordinals is currently impossible to support when you have more than 2 cardinal states to consider. It's excessively tedious to repeatedly split and copy the rules. It's worse for languages that are highly inflectional with many grammatical cases.
Languages that use private rules like English's %%tieth and %%th ordinal rules are candidates to use this syntax. Those private rules only define 0 and 1.
+ char ch;
  while (start < descriptionLength) {
-     // seek to the first non-whitespace character...
+     // Seek to the first non-whitespace character...
+     // If the first non-whitespace character is semicolon, skip it and continue
      while (start < descriptionLength
-         && PatternProps.isWhiteSpace(description.charAt(start)))
+         && (PatternProps.isWhiteSpace(ch = description.charAt(start)) || ch == ';'))
      {
          ++start;
      }
-
-     //if the first non-whitespace character is semicolon, skip it and continue
-     if (start < descriptionLength && description.charAt(start) == ';') {
-         start += 1;
-         continue;
-     }
Some of this logic in stripWhitespace was out of sync between C++ and Java. Both sides have been brought in sync in this pull request.
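As a rough illustration of the consolidated loop in the diff above, here is a self-contained sketch. It is not the ICU source: `SkipSketch` and `skipSeparators` are hypothetical names, and `Character.isWhitespace` stands in for ICU's `PatternProps.isWhiteSpace`, which is an assumption (the two do not classify every code point identically).

```java
final class SkipSketch {
    // Hypothetical helper: returns the index of the first character at or
    // after 'start' that is neither whitespace nor a semicolon.
    // Assumption: Character.isWhitespace approximates PatternProps.isWhiteSpace.
    static int skipSeparators(String description, int start) {
        int length = description.length();
        char ch;
        // One loop handles both cases, so no separate "if semicolon, continue"
        // branch is needed afterwards.
        while (start < length
                && (Character.isWhitespace(ch = description.charAt(start)) || ch == ';')) {
            ++start;
        }
        return start;
    }

    public static void main(String[] args) {
        String description = "  ;; \n%main: foo;";
        int first = skipSeparators(description, 0);
        System.out.println(description.substring(first)); // prints "%main: foo;"
    }
}
```

Folding the semicolon test into the whitespace loop skips the same characters as the older two-step version, since the old code simply restarted the outer loop after each stray semicolon.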
I think this looks great. A couple of things you included as comments on this PR probably ought to be comments in the actual tests or code, and I had a couple of small suggestions on your documentation, but the code looks great.
"1000000: <%cardinal< [$(cardinal,one{million}other{millions})$ >>|$(cardinal,one{millionth}other{millionths})$];\n" +
"1000000000: <%cardinal< [$(cardinal,one{billion}other{billions})$ >>|$(cardinal,one{billionth}other{billionths})$];\n" +
"1000000000000: <%cardinal< [$(cardinal,one{trillion}other{trillions})$ >>|$(cardinal,one{trillionth}other{trillionths})$];\n" +
"1000000000000000: =#,##0=$(ordinal,one{st}two{nd}few{rd}other{th})$;";
I'm a little unclear on the $ syntax. It looks like you're just using normal plural rules. Do the English plural rules define few as just 3, or are you doing that somewhere else in here just for the purposes of the test?
The other languages need to use the cardinal form for large ordinals. English doesn’t define few. I just wanted to verify that the syntax is usable, and not add additional plural rule tests. I think that these new tests demonstrate that.
I could try to pick a difficult language to test, like Lithuanian. I didn’t have the patience to write such rules, and I wanted something readable, reviewable, and quick to write.
Do you really want the masculine singular nominative Lithuanian ordinals written in a test? I was hoping to defer such work and use the Number Format Tester rather than hard-coding them in tests. As an alternative, I can give you just the thousands line for documentation purposes. That’s simpler.
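For the `$(ordinal,one{st}two{nd}few{rd}other{th})$` line in the test, the branch is picked by the locale's ordinal plural rules rather than the cardinal ones. The sketch below hard-codes the English ordinal categories as I understand them from CLDR (keyed on the last one and two digits); that rule set is an assumption here, and `EnglishOrdinalSketch`, `category`, and `suffix` are hypothetical names, not ICU API.

```java
final class EnglishOrdinalSketch {
    // Assumed English ordinal plural categories (non-negative n only):
    //   one:   n % 10 == 1 and n % 100 != 11
    //   two:   n % 10 == 2 and n % 100 != 12
    //   few:   n % 10 == 3 and n % 100 != 13
    //   other: everything else
    static String category(long n) {
        long mod10 = n % 10;
        long mod100 = n % 100;
        if (mod10 == 1 && mod100 != 11) return "one";
        if (mod10 == 2 && mod100 != 12) return "two";
        if (mod10 == 3 && mod100 != 13) return "few";
        return "other";
    }

    // Mirrors the branches of $(ordinal,one{st}two{nd}few{rd}other{th})$.
    static String suffix(long n) {
        switch (category(n)) {
            case "one": return "st";
            case "two": return "nd";
            case "few": return "rd";
            default:    return "th";
        }
    }

    public static void main(String[] args) {
        long[] samples = {1, 2, 3, 4, 11, 13, 23, 1_000_000_000_000_003L};
        for (long n : samples) {
            System.out.println(n + suffix(n));
        }
    }
}
```

Under these assumed rules, "few" covers 3, 23, 33 and so on rather than just 3, so the `few{rd}` branch needs no special handling in the test itself.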
I think a small sample of Lithuanian thousands of the positive degree nominative masculine non-pronominal ordinal would be the following:
1000: [tūkstantis >>|tūkstantas];
2000: <%spellout-cardinal-masculine< [$(cardinal,one{tūkstantis}few{tūkstančiai}other{tūkstančių})$ >>|$(cardinal,one{tūkstantas}few{tūkstanti}other{tūkstantų})$];
Of course, this sample is unvetted, but the structure is what is needed. I got the spellings from Wiktionary. See tūkstantis, tūkstantas, and pirmas for the inflection tables. This pull request is needed to support ordinals larger than 9,999 in Lithuanian. To fully support Lithuanian ordinals, I'd likely need to copy the structure of these 2 rules less than 162 times for only the ordinals. The current Lithuanian ordinal rules are a little clunky.
The structure is pretty close to the English example in the test.
What you had is okay with me; I was just trying to understand it.
I have added a couple of sentences before these 2 tests to clarify what's going on. Please check it out.
icu4j/main/common_tests/src/test/java/com/ibm/icu/dev/test/format/RbnfTest.java
icu4j/main/core/src/main/java/com/ibm/icu/text/RuleBasedNumberFormat.java
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot
I took another look. I like your updates to the documentation. No notes.
This feature enhancement only affects the documentation and RBNF syntax. Some changes were made to keep both the ICU4J and ICU4C implementations in sync. Some compiler warnings were also fixed.
I’d like to extend the RBNF syntax to support more complex grammar. I’d like to change the omission rule with square brackets. By default, everything between the square brackets is omitted when the remainder is 0. My proposal will not change this behavior by default, unless a “|” (pipe symbol) is present between the square brackets. You can think of it as performing like an else statement. Everything between the beginning square bracket and the pipe acts as it currently does. Everything between the pipe symbol and the end square bracket will be used instead of omitting the text.
This behavior is important for supporting large ordinals in Slavic languages. It’s convenient for other languages, like English.
The test case in the prototype and the ticket provides more examples of the change. Below is a simplified example of the new syntax. Right now, we have the following ordinals in English.
That could be simplified to the following rules instead.
The cardinal and ordinal rules will work on either side of the pipe symbol.
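The else-like behavior of the pipe described above can be modeled roughly as follows. This is only a sketch of the described semantics for a single bracket span in a normal rule, not the ICU implementation: `BracketSpanSketch` and `resolveBracketSpan` are hypothetical names, and the helper merely picks which text survives; it does not evaluate `>>` or any other substitution.

```java
final class BracketSpanSketch {
    // Hypothetical helper: resolves the first [...] span of a rule body.
    //   remainder != 0 : keep the text before the pipe (or the whole span if no pipe)
    //   remainder == 0 : keep the text after the pipe (or nothing if no pipe)
    static String resolveBracketSpan(String ruleBody, boolean remainderIsZero) {
        int open = ruleBody.indexOf('[');
        int close = open < 0 ? -1 : ruleBody.indexOf(']', open + 1);
        if (open < 0 || close < 0) {
            return ruleBody; // no bracket span to resolve
        }
        String inside = ruleBody.substring(open + 1, close);
        int pipe = inside.indexOf('|');
        String whenNonZero = pipe < 0 ? inside : inside.substring(0, pipe);
        String whenZero = pipe < 0 ? "" : inside.substring(pipe + 1);
        return ruleBody.substring(0, open)
                + (remainderIsZero ? whenZero : whenNonZero)
                + ruleBody.substring(close + 1);
    }

    public static void main(String[] args) {
        // Existing behavior: the span is dropped when the remainder is 0.
        System.out.println(resolveBracketSpan("twenty[->>]", true));       // twenty
        System.out.println(resolveBracketSpan("twenty[->>]", false));      // twenty->>
        // Proposed behavior: the text after the pipe is the "else" branch.
        System.out.println(resolveBracketSpan("twent[y->>|ieth]", true));  // twentieth
        System.out.println(resolveBracketSpan("twent[y->>|ieth]", false)); // twenty->>
    }
}
```

A span without a pipe resolves exactly as before, which matches the claim that existing rules are unaffected.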