Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better accommodations for reading non-UTF-8 PO files #538

Merged
merged 1 commit into from
Nov 5, 2024

Conversation

Fat-Zer
Copy link
Contributor

@Fat-Zer Fat-Zer commented Oct 29, 2024

Instead of reading the whole file and only then checking the charset, read just up to the first msgid/msgstr, check if it specifies a charset right away. If it does, imbue the file with the correct charset and read the rest.

In particular this:

  • Avoid unnecessary warnings
  • Avoid reading the file repeatedly

The idea was previously mentioned here

@Fat-Zer Fat-Zer force-pushed the non-utf-po branch 3 times, most recently from 180aa20 to 361ffc7 Compare October 29, 2024 01:12
@Fat-Zer
Copy link
Contributor Author

Fat-Zer commented Oct 29, 2024

I'm not sure why CI fails. Is it possible to somehow get it more verbose? e.g. to yank the po4a output from it?
PS: locally tests run fine.

@mquinson
Copy link
Owner

mquinson commented Nov 4, 2024

The logs read:

Malformed test charset/po-iso8859 (PO file encoding: iso8859-1): no expected output. Please touch charset/po-iso8859/_output
# Looks like your test exited with 2 just after 17.
t/charset.t ........... 
# Subtest: master encoding: ascii (dstdir)
[...]
Dubious, test returned 2 (wstat 512, 0x200)
All 17 subtests passed 

Could you please add this file to your PR?

Thanks,
Mt

Instead of reading the whole file and only then checking the charset,
read just up to the first msgid/msgstr, check if it specifies a
charset right away. If it does, imbue the file with the correct charset
and read the rest.

In particular this:
- Avoid unnecessary warnings
- Avoid reading the file repeatedly

Signed-off-by: Alexander Golubev <fatzer2@gmail.com>
@Fat-Zer
Copy link
Contributor Author

Fat-Zer commented Nov 5, 2024

oh... that was a bit embracing >_<... I have completely overlooked the missing file and were too quickly jumping to confusion that it was some weird perl-encoding-handling issue.

fixed.

@mquinson
Copy link
Owner

mquinson commented Nov 5, 2024

Don't be embarrassed. This test module isn't specially user-friendly, it's just that I'm used to it.

Beyond, your code is much better than my previous attempt at reading non-UTF-8 PO files. Many thanks for that.

@mquinson mquinson merged commit ecb2c08 into mquinson:master Nov 5, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants