Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Locale-specific string functions should not be used to build classnames #93

Closed
3 tasks done
asmecher opened this issue Sep 24, 2020 · 16 comments
Closed
3 tasks done
Labels

Comments

@asmecher
Copy link
Contributor

Please follow the general troubleshooting steps first:

  • I read the README and followed the instructions.
  • I am sure that the used CSL metadata follows the CSL schema.
  • I use a valid CSL stylesheet

Bug reports:

Factory.inc.php uses ucfirst to assemble a constraint class name for inclusion. However, it uses the ucfirst function to do so, and upper/lower case behaviour in the current locale may not match the code's expectations.

For example, when using the tr_TR locale, ucfirst('i') yields "i", leading to:

PHP message: PHP Fatal error:  Uncaught Seboettg\CiteProc\Exception\ClassNotFoundException: Class "Seboettg\CiteProc\Constraint\isNumeric" could not be found. in .../citeproc-php/src/Constraint/Factory.php:41
Stack trace:
#0 .../citeproc-php/src/Rendering/Choose/ChooseIf.php(67): Seboettg\\CiteProc\\Constraint\\Factory::createConstraint()
#1 .../citeproc-php/src/Rendering/Choose/Choose.php(53): Seboettg\\CiteProc\\Rendering\\Choose\\ChooseIf->__construct()
#2 .../citeproc-php/src/Util/Factory.php(64): Seboettg\\CiteProc\\Rendering\\Choose\\Choose->__construct()
...

(This is a common Turkish locale gotcha; see e.g. smarty-php/smarty#155 and the proposed fix at smarty-php/smarty#586. I have not checked elsewhere to see if other locale-sensitive functions are used in this way in this library.)

Originally reported by an OJS end user: https://forum.pkp.sfu.ca/t/articles-in-the-second-locale-language-do-not-show/62967

Used CSL stylesheet:

(Any)

Used CSL metadata

(Any)

seboettg added a commit that referenced this issue Sep 25, 2020
@seboettg
Copy link
Owner

Hi @asmecher!
Thank you for that bug report. It's really strange which strange issues can occur. Thanks also for your code snipped from smarty-php, that helped me a lot.

Unfortunately is very hard to test this issue, since I cannot just change my system locale...
In order to test if my fix really works, please checkout the branch issue-93 and run the unit tests on the system with Turkish locales.

@seboettg seboettg added the bug label Sep 25, 2020
@asmecher
Copy link
Contributor Author

Thanks, @seboettg! Yes, this is a nasty one and pops up all over the place (see also https://bugs.php.net/bug.php?id=18556). I feel sorry for Turkish developers who run into this stuff while trying to learn!

I tried to run the tests, but I'm getting a lot of test failures (on the master branch also) of the form:

TypeError: Argument 1 passed to Seboettg\Collection\ArrayList::__construct() must be of the type array, string given, called in /home/asmecher/git/citeproc-php/src/Rendering/Name/Names.php on line 148
TypeError: Argument 1 passed to Seboettg\Collection\ArrayList::__construct() must be of the type array, object given, called in /home/asmecher/git/citeproc-php/src/CiteProc.php on line 173

Maybe this is fixed elsewhere? (PHPUnit 8.5.2, PHP 7.4.3.)

I suspect you can get the testing to run under the Turkish locale by following these steps (Linux):

  1. Enable and compile the Turkish locale. Edit /etc/locale.gen and add/uncomment the line: tr_TR.UTF-8 UTF-8
  2. Re-generate your locales: sudo locale-gen
  3. Verify that ucfirst behaves differently under Turkish: php -r "setlocale(LC_ALL, 'tr_TR.UTF-8'); echo ucfirst('i');" (This should yield a lower case i instead of a capital I like you expect)
  4. Force tests to run under the Turkish locale. Edit tests/bootstrap.php and add: setlocale(LC_ALL, 'tr_TR.UTF-8');
  5. Run the tests!

I suspect you could add a new test that incorporates the setlocale call, but checks the system ucfirst under tr_TR.UTF-8 to see whether the weird behaviour exists. If it doesn't, then probably the tr_TR locale isn't enabled in the OS (steps 1 and 2) and the test should be skipped.

@seboettg
Copy link
Owner

Thank you @asmecher
I use MacOS and unfortunately I have no other computer, except a raspberry pi with a minimal setup and without PHP.
I tried to change the locale on my system, but it doesn't work:

$ php -r "setlocale(LC_ALL, 'tr_TR.UTF-8'); echo ucfirst('i');"
I%                                  

But I think the unit tests might work when you run composer update before. After that you can just run composer test.

...
Time: 22.03 seconds, Memory: 28.00 MB
OK (199 tests, 429 assertions)

@asmecher
Copy link
Contributor Author

Thanks, that got it. The master branch tests fine, but adding the setlocale to bootstrap.php breaks it as expected. Running the tests on issue-93 still shows some errors, however:

PHPUnit test output from issue-93 with setlocale to tr_TR.UTF-8

@seboettg
Copy link
Owner

Ah, ok! You must run composer install or composer dump-autoload after switching the branch to rebuild the autoload files.

@asmecher
Copy link
Contributor Author

Gotcha, but the log file linked above still shows some locale-specific assumptions that Turkish breaks. See e.g. src/Util/NumberHelper.php, where strtoupper is used on content containing roman numerals, and thus i supplied as input won't get converted to I:

1) Seboettg\CiteProc\Rendering\NumberTest::testRomanInputSingleNumber
A non-numeric value encountered

/home/asmecher/git/citeproc-php/src/Rendering/Number.php:146
/home/asmecher/git/citeproc-php/src/Rendering/Number.php:95
/home/asmecher/git/citeproc-php/tests/src/Rendering/NumberTest.php:130

2) Seboettg\CiteProc\Rendering\NumberTest::testRomanInputRangeNumber
A non-numeric value encountered

/home/asmecher/git/citeproc-php/src/Rendering/Number.php:146
/home/asmecher/git/citeproc-php/src/Rendering/Number.php:95
/home/asmecher/git/citeproc-php/tests/src/Rendering/NumberTest.php:137

3) Seboettg\CiteProc\Util\NumberHelperTest::testRoman2Dec
Undefined index: i

/home/asmecher/git/citeproc-php/src/Util/NumberHelper.php:101
/home/asmecher/git/citeproc-php/tests/src/Util/NumberHelperTest.php:35

seboettg added a commit that referenced this issue Sep 25, 2020
@seboettg
Copy link
Owner

In NumberHelper strlen and strtoupper was used. I replaced these functions with mb_strlen and mb_strtoupper. I hope this fix will work.

@asmecher
Copy link
Contributor Author

Very close!

There were 2 errors:

1) Seboettg\CiteProc\Rendering\NumberTest::testRomanInputSingleNumber
A non-numeric value encountered

/home/asmecher/git/citeproc-php/src/Rendering/Number.php:146
/home/asmecher/git/citeproc-php/src/Rendering/Number.php:95
/home/asmecher/git/citeproc-php/tests/src/Rendering/NumberTest.php:130

2) Seboettg\CiteProc\Rendering\NumberTest::testRomanInputRangeNumber
A non-numeric value encountered

/home/asmecher/git/citeproc-php/src/Rendering/Number.php:146
/home/asmecher/git/citeproc-php/src/Rendering/Number.php:95
/home/asmecher/git/citeproc-php/tests/src/Rendering/NumberTest.php:137

--

There was 1 failure:

1) Seboettg\CiteProc\Util\NumberHelperTest::testIsRomanNumber
Failed asserting that false is true.

/home/asmecher/git/citeproc-php/tests/src/Util/NumberHelperTest.php:41

seboettg added a commit that referenced this issue Sep 25, 2020
@seboettg
Copy link
Owner

Okay... once again, please! 👍

@asmecher
Copy link
Contributor Author

Down to 1!

There was 1 error:

1) Seboettg\CiteProc\Rendering\NumberTest::testRomanInputRangeNumber
A non-numeric value encountered

/home/asmecher/git/citeproc-php/src/Rendering/Number.php:146
/home/asmecher/git/citeproc-php/src/Rendering/Number.php:95
/home/asmecher/git/citeproc-php/tests/src/Rendering/NumberTest.php:137

@seboettg
Copy link
Owner

Oh no! I'm so sorry to have to bother you with these tests.

seboettg added a commit that referenced this issue Sep 26, 2020
seboettg added a commit that referenced this issue Sep 26, 2020
@seboettg
Copy link
Owner

Ok, I have now set up my raspi so that I can do these tests by myself.

I figured out, that there seems to be a problem with case insensitive regular expressions...

/^([ivxlcdm]+\.*)\s*([*\–\-&+,;])\s*([ivxlcdm]+\.?)$/i

For example this pattern does not work with Turkish locale.

seboettg added a commit that referenced this issue Sep 26, 2020
…on with systems that use tr_TR locale / cleanup
@seboettg
Copy link
Owner

I guess all issues are solved now. @asmecher can you confirm that?

@asmecher
Copy link
Contributor Author

Tests pass!

OK (199 tests, 429 assertions)

@seboettg
Copy link
Owner

Ok, version 2.2.2 is released and contains these fixes. I will close this ticket. If you think additional doings are necessary, feel free to reopen this bug report.

@asmecher
Copy link
Contributor Author

Thanks for your excellent work as always, @seboettg!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants