You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
What should this do? The documentation for BinaryPipes says this:
Treat the stdin/stdout/stderr messages as binary. By default String results use text encoding and ByteString results use binary encoding.
This suggests the str result should still be properly decoded the same way it would be decoded if the bs result weren’t there. However, it is not: str is merely Data.ByteString.Char8.unpack bs, so non-ASCII characters get garbled. The bug is here:
DestString x -> addBuffer x . (if isWindows then replace "\r\n""\n"elseid) .BS.unpack
DestBytes x -> addBuffer x
Note the call to BS.unpack on line 185. That’s wrong!
This is not just a theoretical issue—I ran into it just now while working on GHC, as Hadrian sometimes ends up generating a use of Stdout like this (albeit less directly).
The text was updated successfully, but these errors were encountered:
You only have (and only want) one pipe, and if the user has specified Stdout as a ByteString, you have to set it to a binary pipe to get that. But if the user asks for both binary and String, what should we do? We don't know the encoding of the pipe - it might be UTF8, it might be Latin1, so at this point, pretty much everything is wrong. The advantage of BS.unpack is that it's wrong but total, which is better than wrong and partial - but agreed, it's far from desirable.
At least that way my conclusion - if you can think of a better way to solve this, I'd love to hear.
The advantage of BS.unpack is that it's wrong but total, which is better than wrong and partial - but agreed, it's far from desirable.
I think it would be better to be correct and partial, or maybe more accurately, partial and not-wrong. That is, combining these two things could raise an error rather than silently producing nonsense.
A more involved possibility would be to manually decode the bytes. It’s possible to acquire the encoding of the pipe by calling hGetEncoding before the call to hSetBinaryMode, and one can obtain a decoder from the resulting TextEncoding value using mkTextDecoder. However, applying this to a ByteString is not entirely straightforward, so it would be more work.
Suppose we write the following:
What should this do? The documentation for
BinaryPipes
says this:This suggests the
str
result should still be properly decoded the same way it would be decoded if thebs
result weren’t there. However, it is not:str
is merelyData.ByteString.Char8.unpack bs
, so non-ASCII characters get garbled. The bug is here:shake/src/General/Process.hs
Lines 180 to 186 in d92b00a
Note the call to
BS.unpack
on line 185. That’s wrong!This is not just a theoretical issue—I ran into it just now while working on GHC, as Hadrian sometimes ends up generating a use of
Stdout
like this (albeit less directly).The text was updated successfully, but these errors were encountered: