bpo-13153: Use OS native encoding for converting between Python and Tcl. #16545
On Windows use UTF-16 (or UTF-32 for 32-bit Tcl_UniChar) with the "surrogatepass" error handler for converting to/from Tcl Unicode objects. On Linux use UTF-8 with the "surrogateescape" error handler for converting to/from Tcl String objects. Converting strings from Tcl to Python and back now never fails (except MemoryError).
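The two error handlers named above can be tried from pure Python. This is only a minimal sketch of the round-trip property the description claims; the actual conversion happens in C inside `_tkinter`, and the helper names `roundtrip_utf16` and `roundtrip_utf8` are hypothetical, introduced here for illustration.

```python
# Illustrative only: these helpers are hypothetical and are not part of
# _tkinter, which performs the real conversion in C.

def roundtrip_utf16(text: str) -> str:
    """Encode to UTF-16 and back, letting lone surrogates pass through."""
    data = text.encode("utf-16-le", "surrogatepass")
    return data.decode("utf-16-le", "surrogatepass")


def roundtrip_utf8(raw: bytes) -> bytes:
    """Decode bytes as UTF-8 and re-encode, preserving invalid bytes."""
    text = raw.decode("utf-8", "surrogateescape")
    return text.encode("utf-8", "surrogateescape")


# A lone surrogate survives the UTF-16 round trip with "surrogatepass".
lone = "\ud83d"
assert roundtrip_utf16(lone) == lone

# An astral character is stored as a surrogate pair: two 16-bit units.
astral = "\U0001f4bb"
assert len(astral.encode("utf-16-le")) == 4
assert roundtrip_utf16(astral) == astral

# An invalid UTF-8 byte survives the round trip with "surrogateescape":
# it is mapped to a lone low surrogate and back again.
invalid = b"caf\xff"
assert invalid.decode("utf-8", "surrogateescape") == "caf\udcff"
assert roundtrip_utf8(invalid) == invalid
```

With the default strict handler, encoding the lone surrogate or decoding the invalid byte raises a `UnicodeError` instead.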
Works on Windows 10!
This works better than I expected. Pasting an astral char not only does not raise and exit, but the actual char is displayed if the font + OS extension supports it, otherwise a replacement box. I see the computer emoji below.
>>> print('💻', '\U0001f4bb') # 💻 was pasted.
💻 💻
It also fixes user printing of astral chars (bpo 2274), either as the char or replacement char, and tracebacks with astral chars (bpo 36698). I presume it will fix display of file names and contents with astral chars, and will test later.
There is also a problem that I did not expect. Editing code past astral chars, on the same line, is discombobulated. For me, on Windows, the insert cursor | is displayed two chars to the left of where it should be for each astral char it follows on the same line. For instance, to change the f in '\U00|01f4bb', position the | cursor as shown, press DEL, and type the replacement. Backspace followed by the replacement will not work correctly. Chars immediately past an astral char cannot be edited at all. This is better than IDLE closing, but if, as I suspect, we cannot change this, the IDLE doc should mention that astral literals disable proper editing on the remainder of the physical line.
This does indeed seem to work very well, and solve many issues simultaneously. The remaining issue, mentioned by @terryjreedy, appears to be entirely internal to Tk. The next version of Tk is supposed to have greatly improved support for Unicode, so hopefully that would help. In the meantime, yes, let's get this in with the added warning in the docs.
Thanks @serhiy-storchaka for the PR 🌮🎉. I'm working now to backport this PR to: 3.7, 3.8.
GH-16580 is a backport of this pull request to the 3.8 branch.
GH-16581 is a backport of this pull request to the 3.7 branch.
…cl. (pythonGH-16545) On Windows use UTF-16 (or UTF-32 for 32-bit Tcl_UniChar) with the "surrogatepass" error handler for converting to/from Tcl Unicode objects. On Linux use UTF-8 with the "surrogateescape" error handler for converting to/from Tcl String objects. Converting strings from Tcl to Python and back now never fails (except MemoryError). (cherry picked from commit 06cb94b) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
https://bugs.python.org/issue13153