Better accommodations for reading non-UTF-8 PO files #538

Fat-Zer · 2024-10-29T00:22:38Z

Instead of reading the whole file and only then checking the charset, read only up to the first msgid/msgstr and check right away whether it specifies a charset. If it does, imbue the file with the correct charset and read the rest.

In particular, this:

- Avoids unnecessary warnings
- Avoids reading the file repeatedly

The idea was previously mentioned here
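
For readers unfamiliar with the technique, the idea described above can be sketched roughly as follows. This is only an illustration in Python, not the Perl code from this PR: the names `read_po`, `CHARSET_RE` and `fallback` are hypothetical, and the sketch assumes an ASCII-compatible encoding whose name appears as `charset=...` in the header entry (the one with the empty msgid).

```python
import re

# Hypothetical pattern: the charset name as it appears in the PO header,
# e.g. "Content-Type: text/plain; charset=ISO-8859-1\n"
CHARSET_RE = re.compile(rb"charset=([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def read_po(path, fallback="utf-8"):
    """Return (charset, text), reading the file from disk only once."""
    charset = fallback
    seen = []    # raw lines consumed while looking for the charset
    msgids = 0   # number of msgid lines met so far
    with open(path, "rb") as handle:
        for line in handle:
            seen.append(line)
            if line.startswith(b"#"):
                continue  # comments cannot declare the charset
            if line.startswith(b"msgid "):
                msgids += 1
                if msgids > 1:
                    break  # header entry is over: no charset declared
            match = CHARSET_RE.search(line)
            if match:
                charset = match.group(1).decode("ascii")
                break
        # Continue from the current position instead of reopening the file.
        data = b"".join(seen) + handle.read()
    return charset, data.decode(charset)
```

A caller would use it as `charset, text = read_po("fr.po")`, where `fr.po` is a placeholder path. Because the bytes are decoded only after the charset is known, no speculative UTF-8 decoding (and hence no spurious warning) takes place, and the file is opened a single time.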

Fat-Zer · 2024-10-29T01:17:58Z

I'm not sure why CI fails. Is it possible to somehow make it more verbose? e.g. to yank the po4a output from it?
PS: the tests run fine locally.

mquinson · 2024-11-04T16:40:48Z

The logs read:

Malformed test charset/po-iso8859 (PO file encoding: iso8859-1): no expected output. Please touch charset/po-iso8859/_output
# Looks like your test exited with 2 just after 17.
t/charset.t ........... 
# Subtest: master encoding: ascii (dstdir)
[...]
Dubious, test returned 2 (wstat 512, 0x200)
All 17 subtests passed

Could you please add this file to your PR?

Thanks,
Mt

Instead of reading the whole file and only then checking the charset, read just up to the first msgid/msgstr, check if it specifies a charset right away. If it does, imbue the file with the correct charset and read the rest.

In particular this:
- Avoid unnecessary warnings
- Avoid reading the file repeatedly

Signed-off-by: Alexander Golubev <fatzer2@gmail.com>

Fat-Zer · 2024-11-05T05:23:13Z

oh... that was a bit embarrassing >_<... I completely overlooked the missing file and was too quick to jump to the conclusion that it was some weird perl-encoding-handling issue.

fixed.

mquinson · 2024-11-05T07:21:05Z

Don't be embarrassed. This test module isn't especially user-friendly, it's just that I'm used to it.

Beyond that, your code is much better than my previous attempt at reading non-UTF-8 PO files. Many thanks for that.

Fat-Zer force-pushed the non-utf-po branch 3 times, most recently from 180aa20 to 361ffc7 Compare October 29, 2024 01:12

Fat-Zer force-pushed the non-utf-po branch from 361ffc7 to 22cfa9d Compare November 5, 2024 05:16

mquinson merged commit ecb2c08 into mquinson:master Nov 5, 2024
1 check passed
