Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tell SharpZipLib to use Unicode when opening zips #3345

Merged
merged 1 commit into from
Apr 12, 2021

Conversation

DasSkelett
Copy link
Member

@DasSkelett DasSkelett commented Apr 11, 2021

Problem

Follow-up to #3329 – we still don't know whether it fixed #3342, but it probably didn't.
SharpZipLib did indeed change stuff regarding zip files with Unicode chars in the path strings, however not the way we wanted (icsharpcode/SharpZipLib#591), instead they are now defaulting to not use Unicode. This makes some mod inflations fail:
image

A short zip + Unicode introduction according to how I understand it:

From zip(1):

Unicode.
Though the zip standard requires storing paths in an archive using a specific character set, in practice zips have stored paths in archives in whatever the local character set is. This creates problems when an archive is created or updated on a system using one character set and then extracted on another system using a different character set.
When compiled with Unicode support enabled on platforms that support wide characters, zip now stores, in addition to the standard local path for backward compatibility, the UTF-8 translation of the path. This provides a common universal character set for storing paths that allows these paths to be fully extracted on other systems that support Unicode and to match as close as possible on systems that don't.

If a file path string in a zip contains Unicode chars, some zip programs seem to save two versions of this path string: one "standard local path for backward compatibility", and the UTF-8 translation of it.
Now depending on the tool and configuration used to read these zips again, they either try to use the "standard" path or the UTF-8 path, where the standard path seems to be failing on some systems, probably depending on whether you have the matching locales/code pages installed or not. The Unicode path should always work fine.

Solution

Now we force SharpZipLib to use the Unicode path. This is currently only possible with a global static bool, but things are likely to change again soon (icsharpcode/SharpZipLib#592).
The bool is set to true in the NetFileCache constructor, which I think is the most straight-forward and reliable place for it, since it is guaranteed to be called whenever and before we use the cache, which is when we are about to try to read zips.

I've also managed to write a unit test for it, thanks to @techman83 providing a small test zip that reproduces the behaviour we see from all the other international zips.
The test isn't worth much at least on my system because the zips somehow are read successfully even without ZipStrings.UseUnicode = true;, but in a fresh mono container the test would fail without ZipStrings.UseUnicode = true;.

I've also marked the GH1866 tests with a new category Display to have an easy way to ignore them, since they require either a real or a mock up display (which I didn't want to bother with in the Docker container).

Hopefully fixes #3342
Likely fixes all the inflation errors with Name is invalid, at least inflating them in a mono container works with the fix (and fails without it)

@DasSkelett DasSkelett added Bug Something is not working as intended Core (ckan.dll) Issues affecting the core part of CKAN Pull request Tests Issues affecting the internal tests labels Apr 11, 2021
@DasSkelett DasSkelett requested a review from HebaruSan April 11, 2021 23:19
@HebaruSan

This comment has been minimized.

@DasSkelett
Copy link
Member Author

Oh, good idea! They are also only called once, which is not required but nice to have here.

Copy link
Member

@HebaruSan HebaruSan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try it! 🛸

@techman83
Copy link
Member

We can probably use xvfb if we need to run headless tests that require a framebuffer.

~ Xvfb :99 &; export DISPLAY=:99; echo $DISPLAY
[1] 24107
:99

@DasSkelett
Copy link
Member Author

DasSkelett commented Apr 12, 2021

We can probably use xvfb if we need to run headless tests that require a framebuffer.

We already do so in CI:

run: xvfb-run ./build test+only --configuration=${{ matrix.configuration }} --where="Category!=FlakyNetwork"

But I don't want to install and set up xvfb a hundred times in a short-lived container for testing something completely unrelated.
We aren't stopping to run these tests in CI, no worries.
It's just a simple change for a personal benefit, which other people might find useful as well (especially when testing this PR).

@DasSkelett DasSkelett merged commit 961077c into KSP-CKAN:master Apr 12, 2021
@DasSkelett DasSkelett deleted the fix/unicode-zips branch April 12, 2021 00:54
@DasSkelett
Copy link
Member Author

It actually worked! 🤩
The inflation errors for the mods are disappearing, and the Tundras got successfully indexed!

@techman83
Copy link
Member

Heck yeah! Good work @DasSkelett

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something is not working as intended Core (ckan.dll) Issues affecting the core part of CKAN Tests Issues affecting the internal tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Name is invalid error for file in TundraExploration, seemingly in CJK locales
3 participants