fetchzip: force UTF-8 compatible locale to unpack non-ASCII symbols #176253
# Otherwise unzip unpacks escaped file names as if '-U' options was in effect.
#
# Pick en_US.UTF-8 as most possible to be present on glibc, musl and darwin.
LANG=en_US.UTF-8 unzip -qq "$curSrc"
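The line above sets `LANG` only for the single `unzip` invocation, not for the rest of the build script. A minimal sketch of that pattern, assuming a POSIX shell; the helper name `with_utf8_locale` is hypothetical and not part of nixpkgs:

```shell
# Hypothetical helper (not from the PR): run one command with LANG set to
# en_US.UTF-8. The assignment prefix applies to that command's environment
# only, so the caller's LANG is left untouched.
# Caveat: LC_ALL, if set, takes precedence over LANG.
with_utf8_locale() {
  LANG=en_US.UTF-8 "$@"
}

# Example: print the LANG value seen by a child process.
with_utf8_locale sh -c 'printf "%s\n" "$LANG"'
```

Whether the named locale actually exists on the build machine is a separate question, which is what the rest of this thread is about.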
C.UTF-8 maybe?
AFAIU C.UTF-8 is not supported on macOS. musl would probably work (not sure either).
Now we only need to find the ~10 packages which disabled tests because of that
34452fc to a8ce7b0
Code-wise this should be fine, but I am not deep enough into fetchzip and reproducibility.
Before merging, Sandro also suggested finding existing problematic tarballs. We can recalculate their hashes as part of this PR. I'll try to extract at least some of the problematic ones. Very crude way to find
List of suspects that started failing:
List of already failing builds (fetch fails upstream URL):
List of already failing builds (hash mismatch, changed tarball?):
@ofborg build coq_8_13.src coq_8_14.src coq_8_15.src eduli.src
2a68f57 to 8da2af5
@ofborg build coq_8_13.src coq_8_14.src coq_8_15.src coq_8_16.src eduli.src
8da2af5 to 7836e18
glibcLocalesUtf8 is a trimmed-down version of glibcLocales that provides a UTF-8 locale. It is useful in derivations that need some UTF-8 locale, like unzip in NixOS#176253.
musl and darwin support UTF-8 locales without any extras. As a result unzip can unpack UTF-8 filenames there as is. But on glibc without locale archive presence files get mangled as:

    deps/αβ -> deps/#U03b1#U03b2

This makes `fetchzip` fixed-output derivations unstable.

Tested this change to fail in `coq.src` which was generated in system that mangles UTF-8 symbols:

    $ nix build -f. coq.src --rebuild -L
    source> trying https://github.com/coq/coq/archive/V8.15.2.zip
    source> % Total % Received % Xferd Average Speed Time Time Time Current
    source> Dload Upload Total Spent Left Speed
    source> 0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0
    source> 100 8945k 100 8945k 0 0 1513k 0 0:00:05 0:00:05 --:--:-- 1989k
    source> unpacking source archive /build/V8.15.2.zip
    error: hash mismatch in fixed-output derivation '/nix/store/hrnyykm7wgw8vxisgq7hc2bg5gr0y6s8-source.drv':
      specified: sha256-h81nFqkuvZkMR7YLHy7laTq5yOhjMW+w6rYzncxvyD4=
      got: sha256-DTspmwyD3Evl1CUmvUy2MonbLGUezvsHN3prmP9eK2I=

Note: it means that some of existing caches for fixed output derivations become incorrect. It should not break already cached tarballs on cache.nixos.org thus the impact should not be widespread.
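The mangled form shown in this commit message (`#U` followed by four hex digits per escaped character) is regular enough to search for. As a rough sketch, and not the method used in this thread, a hypothetical helper `find_mangled_names` could list paths in an already unpacked tree whose names carry such escapes. It assumes a POSIX `find`, and it can report false positives for files that legitimately contain `#Uxxxx` in their names:

```shell
# Hypothetical helper (not part of nixpkgs): print every path under the given
# directory whose file name contains an unzip-style "#Uxxxx" escape, for
# example "#U03b1".
find_mangled_names() {
  find "$1" -name '*#U[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]*'
}

# Example: inspect an unpacked source tree (the path is illustrative).
# find_mangled_names ./source
```

An empty result does not prove a tree is unaffected; it only means no name matches this particular pattern.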
fetchzip changed unpacking of UTF-8 files on glibc systems: NixOS#176253

As a result, the unpacked contents changed their filenames.
7836e18 to 9499614
Good point. Added shorter form of
+  glibcLocalesUtf8 =
+    if stdenv.hostPlatform.isLinux
+    then callPackage ../development/libraries/glibc/locales.nix { allLocales = false; }
+    else null;
and updated
We should reverse that default IMO. Most users are fine with C.UTF-8 plus their set locale and maybe en_US.UTF-8.
Well, that's like a discussion for another topic. For users it's mostly the NixOS EDIT: I think some people have argued for basics (perhaps
…chzip update

fetchzip changed unpacking of UTF-8 files on glibc systems: NixOS#176253

As a result, the unpacked contents changed their filenames.

Closes: NixOS#176225
9499614 to 739ab38
Doing a PR right now to fix that. We can discuss this further there.
According to bisect, this (ffb456a) broke the eval of
found with
Good catch! I always wondered why we defined it non-null on musl (where it does not break, but builds unused glibc and locales). Proposed the fix as: #181384
musl and darwin support UTF-8 locales without any extras. As a result unzip can unpack UTF-8 filenames there as is. But on glibc without locale archive presence files get mangled as: deps/αβ -> deps/#U03b1#U03b2

This makes `fetchzip` fixed-output derivations unstable.

Tested this change to fail in `coq.src`, which was generated in a system that mangles UTF-8 symbols: rebuilding it reports a hash mismatch in the fixed-output derivation.

Note: it means that some of existing caches for fixed output derivations become incorrect. It should not break already cached tarballs on cache.nixos.org thus the impact should not be widespread.