Set forgiving encoding fallback when parsing Po file #515

Closed
wants to merge 2 commits into from
14 changes: 13 additions & 1 deletion lib/Locale/Po4a/Po.pm
@@ -318,7 +318,9 @@ sub read {
 my $filename = shift
 or croak wrap_mod( "po4a::po", dgettext( "po4a", "Please provide a non-null filename" ) );

-my $charset = shift // 'UTF-8';
+my $charset = shift;
+my $did_request_charset = defined($charset);
+$charset //= 'UTF-8';
 $charset = 'UTF-8' if $charset eq "CHARSET";
 warn "Read $filename with encoding: $charset" if $debug{'encoding'};

@@ -340,6 +342,12 @@ sub read {
 unless ( $? == 0 );
 }

+# Allow reading with the wrong encoding.
+# We will read the expected encoding from the file itself later.
+use Encode qw(:fallback_all);
+use PerlIO::encoding;
+$PerlIO::encoding::fallback = $did_request_charset ? WARN_ON_ERR : FB_DEFAULT;
+
 my $fh;
 if ( $filename eq '-' ) {
 $fh = *STDIN;
@@ -366,6 +374,10 @@ sub read {
 $self->read( $filename, $detected_charset, $checkvalidity );
 return;
 }
+} elsif ( !$did_request_charset ) {
+# Read again, this time explicitly requesting UTF-8, so that warnings are printed if the charset does not match
+$self->read( $filename, 'UTF-8', $checkvalidity );
+return;
 }

if ( $pofile =~ m/^\N{BOM}/ ) { # UTF-8 BOM detected
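
The two-pass idea in the hunks above (decode leniently first, find the charset declared in the PO header, then decode again with that charset) can be sketched in isolation. This is a minimal illustration under stated assumptions, not po4a's actual code: it calls `Encode::decode` directly instead of setting `$PerlIO::encoding::fallback` on a file handle, it uses `FB_CROAK` in the second pass where the PR uses `WARN_ON_ERR`, and `sniff_charset` and `decode_po_bytes` are hypothetical helper names.

```perl
use strict;
use warnings;
use Encode qw(decode :fallbacks);

# Hypothetical helper: return the charset declared in a PO header, or undef.
sub sniff_charset {
    my ($text) = @_;
    return $text =~ /Content-Type:[^\n]*charset=([\w.-]+)/i ? $1 : undef;
}

# Hypothetical helper: lenient first pass, strict second pass.
sub decode_po_bytes {
    my ($bytes) = @_;

    # Pass 1: FB_DEFAULT substitutes U+FFFD for malformed input instead of dying,
    # so the ASCII header stays readable even if the body is not valid UTF-8.
    my $first_copy = $bytes;
    my $lenient    = decode( 'UTF-8', $first_copy, FB_DEFAULT );

    my $charset = sniff_charset($lenient) // 'UTF-8';
    $charset = 'UTF-8' if $charset eq 'CHARSET';    # unfilled template placeholder

    # Pass 2: FB_CROAK dies on malformed input, so a wrong declaration is noticed.
    # decode() may modify its source argument when a check flag is set, hence the copy.
    my $second_copy = $bytes;
    return decode( $charset, $second_copy, FB_CROAK );
}

# An ISO-8859-1 encoded header line plus one translated string (0xE9 is "é").
my $bytes = qq{"Content-Type: text/plain; charset=iso-8859-1\\n"\nmsgstr "T\xE9st"\n};
my $text  = decode_po_bytes($bytes);
```

In this sketch the first pass only needs the ASCII header to survive decoding. The PR gets the same leniency by setting `$PerlIO::encoding::fallback` before the file is opened, and only when the caller did not request a charset.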
7 changes: 7 additions & 0 deletions t/charset.t
@@ -53,6 +53,13 @@ push @tests,
 'format' => 'yaml',
 'options' => "-M UTF-8",
 'input' => "charset/yaml/utf8.yaml",
+},
+{
+'doc' => 'implicit encoding: iso8859',
+'po4a.conf' => 'charset/implicit-iso8859/po4a.conf',
+'closed_path' => 'charset/*/',
+'options' => '--keep 0',
+'expected_files' => 'iso8859.pot iso8859.en.po iso8859.up.pod ',
 };

run_all_tests(@tests);
18 changes: 18 additions & 0 deletions t/charset/implicit-iso8859/_iso8859.en.pod
@@ -0,0 +1,18 @@

*****************************************************
* GENERATED FILE, DO NOT EDIT *
* THIS IS NO SOURCE FILE, BUT RESULT OF COMPILATION *
*****************************************************

This file was generated by po4a(7). Do not store it (in VCS, for example),
but store the PO file used as source file by po4a-translate.

In fact, consider this as a binary, and the PO file as a regular .c file:
If the PO get lost, keeping this translation up-to-date will be harder.

=encoding UTF-8

=head1 Test title

blebleble lalala

18 changes: 18 additions & 0 deletions t/charset/implicit-iso8859/_iso8859.up.pod
@@ -0,0 +1,18 @@

*****************************************************
* GENERATED FILE, DO NOT EDIT *
* THIS IS NO SOURCE FILE, BUT RESULT OF COMPILATION *
*****************************************************

This file was generated by po4a(7). Do not store it (in VCS, for example),
but store the PO file used as source file by po4a-translate.

In fact, consider this as a binary, and the PO file as a regular .c file:
If the PO get lost, keeping this translation up-to-date will be harder.

=encoding iso-8859-1

=head1 S<Tést title>

S<blèbleble làlala>

5 changes: 5 additions & 0 deletions t/charset/implicit-iso8859/_output
@@ -0,0 +1,5 @@
Updating iso8859.pot: (2 entries)
Updating iso8859.en.po: 2 translated messages.
Updating iso8859.nb.po: 2 translated messages.
iso8859.en.pod is 100% translated (2 strings).
iso8859.nb.pod is 100% translated (2 strings).
27 changes: 27 additions & 0 deletions t/charset/implicit-iso8859/iso8859.en.po
@@ -0,0 +1,27 @@
# English translations for implicit-iso package
# Copyright (C) 2024 Free Software Foundation, Inc.
# This file is distributed under the same license as the implicit-iso package.
# Automatically generated, 2024.
#
msgid ""
msgstr ""
"Project-Id-Version: implicit-iso 8859\n"
"POT-Creation-Date: 2024-09-03 11:19+0200\n"
"PO-Revision-Date: 2024-09-03 11:19+0200\n"
"Last-Translator: Automatically generated\n"
"Language-Team: none\n"
"Language: en\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"

#. type: =head1
#: iso8859.pod:1
msgid "Test title"
msgstr "Test title"

#. type: textblock
#: iso8859.pod:3
msgid "blebleble lalala"
msgstr "blebleble lalala"
4 changes: 4 additions & 0 deletions t/charset/implicit-iso8859/iso8859.pod
@@ -0,0 +1,4 @@
=head1 Test title

blebleble
lalala
27 changes: 27 additions & 0 deletions t/charset/implicit-iso8859/iso8859.pot
@@ -0,0 +1,27 @@
# SOME DESCRIPTIVE TITLE
# Copyright (C) YEAR Free Software Foundation, Inc.
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2024-09-25 09:50+0200\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#. type: =head1
#: iso8859.pod:1
msgid "Test title"
msgstr ""

#. type: textblock
#: iso8859.pod:3
msgid "blebleble lalala"
msgstr ""
27 changes: 27 additions & 0 deletions t/charset/implicit-iso8859/iso8859.up.po
@@ -0,0 +1,27 @@
# Norwegian Bokmal translations for implicit-iso package
# Copyright (C) 2024 Free Software Foundation, Inc.
# This file is distributed under the same license as the implicit-iso package.
# Automatically generated, 2024.
#
msgid ""
msgstr ""
"Project-Id-Version: implicit-iso 8859\n"
"POT-Creation-Date: 2024-09-03 10:10+0200\n"
"PO-Revision-Date: 2024-09-03 10:10+0200\n"
"Last-Translator: Automatically generated\n"
"Language-Team: none\n"
"Language: nb\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=iso-8859-1\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"

#. type: =head1
#: iso8859.pod:1
msgid "Test title"
msgstr "Tést title"

#. type: textblock
#: iso8859.pod:3
msgid "blebleble lalala"
msgstr "blèbleble làlala"
5 changes: 5 additions & 0 deletions t/charset/implicit-iso8859/po4a.conf
@@ -0,0 +1,5 @@
[po4a_paths] iso8859.pot en:iso8859.en.po up:iso8859.up.po

[options] opt:"--msgmerge-opt --silent"

[type:pod] iso8859.pod en:iso8859.en.pod up:iso8859.up.pod