[hl] Unicode char swallows next character #8245

Aurel300 · 2019-05-02T16:29:23Z

class Main {
  public static function main():Void {
    var str = "\u{10000}a";
    for (codepoint in haxe.iterators.StringIteratorUnicode.unicodeIterator(str)) trace(codepoint);
  }
}

Expected output:

$ haxe --main Main --interp
Main.hx:4: 65536
Main.hx:4: 97

On HL:

$ haxe --main Main --hl out.hl
$ hl out.hl
Main.hx:4: 65536

The character (or byte?) immediately after U+10000 is always dropped. This leads to more problems with subsequent multi-byte codepoints, but I'm sure the first dropped character is the actual cause. It does not happen with U+FFFF or U+10001. It looks like an off-by-one error, since U+10000 is the first codepoint that takes 4 bytes in UTF-8.
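To make the boundary concrete, here is a small sketch that computes how many code units the three codepoints mentioned above need. The class `Boundary` and its helpers are hypothetical names introduced for illustration. The UTF-16 column is included on the assumption that the target stores strings as 16-bit units, which the report does not state. It is plain arithmetic, not the HashLink implementation.

```haxe
class Boundary {
  // Bytes needed to encode a codepoint in UTF-8.
  public static function utf8Length(cp:Int):Int {
    return if (cp < 0x80) 1 else if (cp < 0x800) 2 else if (cp < 0x10000) 3 else 4;
  }

  // 16-bit units needed in UTF-16 (2 means a surrogate pair).
  // Assumption: relevant only if the target uses UTF-16 strings.
  public static function utf16Length(cp:Int):Int {
    return cp < 0x10000 ? 1 : 2;
  }

  public static function main():Void {
    for (cp in [0xFFFF, 0x10000, 0x10001]) {
      trace("U+" + StringTools.hex(cp, 4) + ": utf8=" + utf8Length(cp) + " utf16=" + utf16Length(cp));
    }
  }
}
```

U+FFFF needs 3 UTF-8 bytes and one 16-bit unit, while U+10000 and U+10001 both need 4 bytes and two units. Since U+10001 behaves correctly, the length of the encoding alone does not explain the bug, which fits the suspicion of a comparison that mishandles exactly U+10000.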

Tracing the string directly seems to produce the correct output (U+10000 as F0 90 80 80, followed by 61 for "a"):

$ hl out.hl | xxd
00000000: 4d61 696e 2e68 783a 343a 20f0 9080 8061  Main.hx:4: ....a
...
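A possible regression check for this report, as a minimal sketch: it collects the codepoints with the same `unicodeIterator` call used in the reproduction and compares them with the expected values. The class name `Check` and the helper `codepoints` are hypothetical and exist only for this example.

```haxe
class Check {
  // Collect all codepoints yielded by the Unicode iterator into an array.
  public static function codepoints(s:String):Array<Int> {
    return [for (c in haxe.iterators.StringIteratorUnicode.unicodeIterator(s)) c];
  }

  public static function main():Void {
    var got = codepoints("\u{10000}a");
    // Expected: U+10000 (65536) followed by 'a' (97).
    var ok = got.length == 2 && got[0] == 0x10000 && got[1] == 0x61;
    trace(ok ? "ok" : "mismatch: " + got);
  }
}
```

Running this on both `--interp` and HL should make any difference between the two targets visible in a single line of output.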


Aurel300 · 2019-05-02T17:28:04Z

(Closed in 09a0d1e)

Aurel300 added the platform-hl and unicode labels May 2, 2019

Aurel300 self-assigned this May 2, 2019

Aurel300 mentioned this issue May 2, 2019 in "Unicode sys tests and fixes" #8135 (merged, 73 tasks)

Aurel300 closed this as completed May 2, 2019
