Character encoding brokenness in Python REPL input (both 2.7 and 3.6) #2656
Comments
lazka referenced this issue in lazka/MINGW-packages on Jul 7, 2017
…a real Windows console
CPython uses isatty() to detect a terminal and change some settings like line buffering and interactive mode. Use is_cygpty() to make this also work under mintty. See https://github.com/Alexpux/MINGW-packages/issues/2645
This also removes the bash script which forced the interactive mode when python3 was started without arguments. This is no longer needed as Python now detects the terminal output and does this automatically.
Also use is_cygpty() to detect when not under mintty and disable the readline module there, as using it breaks input of certain characters and leads to errors on shutdown when it tries to save the readline history. (The readline module is not available in the official Python build.) See https://github.com/Alexpux/MINGW-packages/issues/2656
lazka referenced this issue in lazka/MINGW-packages on Aug 14, 2017
Alexpux referenced this issue on Aug 14, 2017
…a real Windows console (#2675)
Python 2 and 3 now disable readline in case no cygwin terminal is detected. Maybe it should be disabled altogether, but I tried to keep the change minimal for now. winpty + python work for me now at least.
mingw python no longer uses readline when run in a normal terminal (cmd/winpty). So I assume the core issue here is fixed. If there is something missing please say so.
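To illustrate the kind of decision described in the comments above, here is a rough sketch of a pipe-name check. It is not the actual patch: the function names are hypothetical, and the pipe naming scheme in the regular expression is an assumption about how Cygwin/MSYS2 name their pty pipes. Obtaining the pipe name from a handle needs Windows APIs and is left out.

```python
import re

# ASSUMPTION: Cygwin/MSYS2 pty pipes are named like
#   \msys-<16 hex digits>-pty<N>-from-master
#   \cygwin-<16 hex digits>-pty<N>-to-master
_CYGPTY_PIPE = re.compile(
    r"\\(cygwin|msys)-[0-9a-f]{16}-pty\d+-(from|to)-master$"
)


def looks_like_cygpty(pipe_name):
    """Return True if pipe_name matches the assumed Cygwin pty pattern."""
    return _CYGPTY_PIPE.search(pipe_name) is not None


def should_use_readline(pipe_name):
    """Hypothetical policy: keep readline only under a Cygwin pty (mintty)."""
    return looks_like_cygpty(pipe_name)


print(should_use_readline(r"\msys-dd50a72ab4668b33-pty0-from-master"))
print(should_use_readline(r"\some-other-pipe"))
```

With a policy like this, a plain console or winpty session would fall through to the non-readline input path.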
For the scenarios below, use an ordinary console window (not mintty, winpty, ConEmu, etc). Set the active code page to 437 and use a TrueType font, like Lucida Console or Consolas. (Selecting a Raster font can prevent the console from showing many characters.)
Scenario 1: Python 3.6.2rc1 in an ordinary console
NB: sys.stdin.encoding is utf-8.
Start Python:
Select an English-layout keyboard, then copy and paste len('ö') into the console. The ö character is dropped:
Switch the keyboard to German, then try again. There is an error:
Scenario 2: Python 2.7.13 in an ordinary console
NB: sys.stdin.encoding is cp437.
Start Python:
Select an English-layout keyboard, then copy-and-paste len('ö') into the console. The ö is dropped, as with Python 3.
Select a German-layout keyboard, then copy-and-paste len('ö') into the console. Something (GNU Readline?) converts the ö into \224. The \224 is treated as a single unit for typing purposes (e.g. one backspace removes the whole thing):
Copy and paste u'ö' ; u'\224' into the console. The output is visually inconsistent. The u'ö'-become-u'\224' identifies U+00F6, but the ASCII u'\224' becomes U+0094 (NB: 0o224 == 0x94):
Copy and paste ö into the REPL. Nothing appears. (This particular oddity does not affect Python 3.6.2rc1.) Try typing a ö into the REPL (use the on-screen keyboard if you must), and nothing appears.
Scenario 3: Python 3.6.2rc1 from mintty
NB: sys.stdin.encoding is cp1252.
"Native" console programs tend not to work with a Cygwin pty. Common advice is to use winpty, which lets the program use console I/O instead.
Start Python:
Select either an English or German keyboard. I don't think it matters.
Copy and paste x = '…' ; " ".join("%x" % ord(c) for c in x). The result:
mintty turns … [U+2026] into its UTF-8 representation: E2 80 A6. Each of the three bytes is interpreted using Windows-1252, which produces U+00E2 U+20AC U+00A6.
I'm mostly just including this scenario for reference. It doesn't work with the official Python releases either, and I don't think it's expected to work. How would Python know that its pipes are going to be encoded as UTF-8, rather than Windows-1252? Maybe it could work, somehow? I saw some discussion somewhere recently about detecting named pipes that are really Cygwin ptys, and that might tell us something about its encoding.
Scenario 4: Use sys.stdin.readline() instead:
Start either MinGW Python 2 or 3, with any keyboard.
Copy and paste: import sys ; " ".join("%x" % ord(c) for c in sys.stdin.readline()). On the next line, paste ö… as the input. The result is '94 2e a' on Python 2 and 'f6 2026 a' on Python 3. Both of these are correct, indicating that the issue is with the REPL's line entry, not with general Python stdin reading.
The first two scenarios work fine with official Python releases of 2.7.11 and 3.6.1. The third and fourth scenarios behave the same way with the official release. FWIW, I'm pretty sure the official Python releases do not use GNU Readline:
MinGW Python has a readline module I can import from the REPL; the official release doesn't.
See this issue, rprichard/winpty#121. I suspect the problem is really with GNU Readline. I dug into that library a bit. I think it's using ordinary C narrow strings, but if isatty(0) is true, then it uses the MSVCRT _getch function to read input. Based on my testing, that function always returns input in the console's code page, and it also ignores characters that aren't in the current keyboard layout. I'm guessing that a proper MinGW Readline port should use *Console* wide APIs instead and explicit UTF-8 <-> UTF-16 conversions.
I don't think MSVCRT has a proper UTF-8 locale setting: https://stackoverflow.com/questions/4324542/what-is-the-windows-equivalent-for-en-us-utf-8-locale.
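For reference, the \224 seen in Scenario 2 and the 94 / f6 values in Scenario 4 are consistent with ö being a single byte in code page 437. Here is a small Python 3 sketch of that arithmetic, assuming the standard cp437 codec; octal_escape and hex_codes are illustrative helper names, not anything from Readline or CPython.

```python
def octal_escape(ch, codepage="cp437"):
    """Return the backslash-octal form of ch's single byte in codepage."""
    (byte,) = ch.encode(codepage)   # fails if ch is not exactly one byte
    return "\\%o" % byte


def hex_codes(text):
    """Same formatting as the one-liner used in the scenarios."""
    return " ".join("%x" % ord(c) for c in text)


print(octal_escape("\u00f6"))         # \224  (ö is byte 0x94 in cp437)
print(0o224 == 0x94)                  # True
print(hex_codes("\224"))              # 94    (octal escape, i.e. U+0094)
print(hex_codes("\u00f6\u2026\n"))    # f6 2026 a  (the Python 3 result)
```

So a literal u'\224' typed in ASCII names U+0094, while the pasted ö that was rewritten to \224 still needs decoding from cp437 to get back to U+00F6, which is why the two look alike on screen but produce different results.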