-
Notifications
You must be signed in to change notification settings - Fork 701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lack of support for non-ASCII characters in cabal sdist #3758
Comments
To be clear, this only applies to See upstream haskell/tar#6 |
Thank you for the comment, you just unbroke my brain. I couldn't reconcile what I saw in the Cabal source code with the behavior I was experiencing. |
It certainly puzzled me a while too. |
So this is basically a dupe of haskell/tar#6? |
Yes, but it can be worked around inside cabal-install the same way I proposed doing it in Stack, see https://github.com/commercialhaskell/stack/pull/2557/files#diff-7f7c1fa6ec0ee064e1a11c95f5b1b025L110. |
I was about to try my hand at a patch for this, but I'm not sure which library to use for the UTF8 encoding. Neither text nor utf8-string appear to be dependencies of cabal-install right now, and I doubt they are desired additions. I suppose an alternative is simply to check the paths to ensure they are ASCII and error out if not. |
@snoyberg |
Something like this workaround would be acceptable for now, but in the long run we should fix it at the source. |
I was just looking into "fixing it at the source" as well. It seems unclear how to do it. One idea I had is:
I can float this idea over there too. But I'm not too excited about it, since it's just moving the exact same approach I'm recommending here and in Stack to the tar package, and there doesn't seem to be anything better without breaking backwards compatibility. |
Actually, this is harder to resolve in cabal-install than it was in Stack, due to usage of the |
@snoyberg Since our aim is to make tarballs with unicode fail better, I was thinking about making the |
IMO handling that case with UTF8 encoding makes more sense, but best behavior is certainly ambiguous. |
So what do people recommend here? Should we generally switch cabal and hackage over to allowing UTF8 encoded file names? Or just try and fail in a clearer way? Back in the day it wan't really possible to rely on file names being UTF8 on unix, but these days we probably can. The other reason we've historically been conservative on hackage is that we want to try and keep maximum compatibility for tools on all platforms and preserving stuff for long time periods, so sticking closely to Tar posix standards seems wise here. And yes, I'd of course like whatever we decide to go upstream (can of course keep temporary workarounds). |
Let's implement the latter for now, which I think is much easier than the former. |
I'm fully in favor of just failing more clearly. If not too complicated, I'd also recommend adding support now for UTF8 aware unpack in case the situation changes in the future, but that's not high priority IMO. |
Ok, thanks for feedback folks. I concur, lets fail better now, and I'll put it on my TODO to support UTF8 filenames in the tar package, but won't treat it as an "omg! emergency!" level of urgency. |
Perhaps it would be wise to additionally tighten up the non-ASCII character detection in |
True could add an ASCII check to the optional portability checks in the tar lib. |
Moving this to 1.24.0.2. |
Remilestoned. |
This was a dupe of #2558 which is resolved now (thanks to upstream). |
This bug is really based in the tar package, and affects both Stack (commercialhaskell/stack#2549 and commercialhaskell/stack#2557) and cabal-install. Repro steps for cabal-install:
git clone https://github.com/snoyberg/yaml
cd yaml
git checkout yaml/0.8.18.2
cabal sdist
tar xfv dist/yaml-0.8.18.2.tar.gz
Expected: Find a file with the name
yaml-0.8.18.2/test/resources/accenté/foo.yaml
Actual: The file is named
yaml-0.8.18.2/test/resources/accente^A/foo.yaml
Note that Hackage itself rejects non-ASCII filenames, so this is almost a moot point. However, if
cabal sdist
properly encodes the file path, Hackage will reject the upload with a useful error message, instead of silently accepting the upload (which happened to me). There may also be an argument to be made that Hackage's filepath checking code should be made a bit more strict I'm not sure.The text was updated successfully, but these errors were encountered: