Installing packages fails if Python 3 installed into path with non-ASCII characters #4984

Closed
pekkaklarck opened this issue Jan 22, 2018 · 13 comments
Labels
C: encoding (Related to text encoding and likely, UnicodeErrors)
C: tests (Testing and related things)

Comments

@pekkaklarck
Contributor

pekkaklarck commented Jan 22, 2018

  • Pip version: 9.0.1 (installed with Python 3.6.4)
  • Python version: 3.6.4
  • Operating system: Windows 7 and 10

Description:

I was organizing a training where we used Python 3 for the first time. Two participants failed to use pip for anything due to a UnicodeDecodeError. After a little debugging it turned out this was due to them having non-ASCII characters in their user names (not uncommon here in Finland) and Python 3.6 being installed under their account into a path like C:\Users\Käyttäjä\AppData\.... This was the default location offered by the Python installer.

Both users were using Windows 10, but I was able to reproduce this with my Windows 7 virtual machine as well. An account with non-ASCII characters isn't needed; it's enough to install Python into any path with non-ASCII characters. A workaround was uninstalling Python and installing it directly under C:\.

What I've run:

C:\Users\peke>py -m pip --version
pip 9.0.1 from C:\Python36\lib\site-packages (python 3.6)
C:\Users\peke>py -m pip install robotframework
Collecting robotframework
  Using cached robotframework-3.0.2.tar.gz
Installing collected packages: robotframework
  Running setup.py install for robotframework ... error
Exception:
Traceback (most recent call last):
  File "C:\Users\peke\äää\Python36\lib\site-packages\pip\compat\__init__.py", line 73, in console_to_str
    return s.decode(sys.__stdout__.encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 23: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\peke\äää\Python36\lib\site-packages\pip\basecommand.py", line 215, in main
    status = self.run(options, args)
  File "C:\Users\peke\äää\Python36\lib\site-packages\pip\commands\install.py", line 342, in run
    prefix=options.prefix_path,
  File "C:\Users\peke\äää\Python36\lib\site-packages\pip\req\req_set.py", line 784, in install
    **kwargs
  File "C:\Users\peke\äää\Python36\lib\site-packages\pip\req\req_install.py", line 878, in install
    spinner=spinner,
  File "C:\Users\peke\äää\Python36\lib\site-packages\pip\utils\__init__.py", line 676, in call_subprocess
    line = console_to_str(proc.stdout.readline())
  File "C:\Users\peke\äää\Python36\lib\site-packages\pip\compat\__init__.py", line 75, in console_to_str
    return s.decode('utf_8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 23: invalid continuation byte
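
For readers following along, the failure in the traceback can be reproduced without pip. The sketch below assumes the subprocess emitted the install path in cp1252, where ä is the single byte 0xe4; that byte is not valid UTF-8 on its own, so a strict decode raises. The helper `decode_console_output` is hypothetical and only illustrates one tolerant way to decode; it is not pip's `console_to_str` and not necessarily what the eventual fix does.

```python
def decode_console_output(data: bytes, encoding: str = "utf-8") -> str:
    """Decode subprocess output without raising on undecodable bytes.

    Hypothetical helper for illustration only, not pip's console_to_str.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        # Keep the offending bytes visible as \xNN escapes instead of crashing.
        return data.decode(encoding, errors="backslashreplace")


# A path as a tool writing cp1252 would emit it (assumption for this sketch).
raw = "C:\\Users\\peke\\äää".encode("cp1252")

try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    # Same class of error as in the traceback: 0xe4 is not valid UTF-8 here.
    strict_failed = True

print("strict utf-8 decode failed:", strict_failed)
print("tolerant decode:", decode_console_output(raw))
print("decoded with the right codec:", decode_console_output(raw, "cp1252"))
```

The tolerant path loses no information (the escapes show which bytes were wrong), but the readable result only comes from decoding with the encoding the producer actually used.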
@pfmoore
Member

pfmoore commented Jan 22, 2018

Thanks for the report - this is a known issue, fixed in the development version, and the fix will be released in pip 10.

@pekkaklarck
Contributor Author

Great! Encountering this in the first training where I used Python 3 was a flashback to #3463, which my earlier students ran into. Apparently non-ASCII stuff is still hard even in the shiny Python 3 world. =)

@pfmoore
Member

pfmoore commented Jan 22, 2018

Ah yes, I thought your name seemed familiar! Yes, it's tricky digging out all the places where stuff like this goes wrong (and in this case, we're dealing with MS tools that produce mojibake, so there is no right answer for us, best we can do is try to not make the problem worse :-()

We're working on getting a pip 10 release out soon, because we're getting a lot more encoding-related issue reports (probably because Python 3.5/3.6 switched to installing in %APPDATA% by default, so non-ASCII in usernames is more of an issue) but we need to get some other in-progress changes sorted first. Hopefully it won't be too long though.

@pradyunsg pradyunsg added type: bug A confirmed bug or unintended behavior C: encoding Related to text encoding and likely, UnicodeErrors resolution: duplicate Duplicate of an existing issue/PR and removed type: bug A confirmed bug or unintended behavior labels Jan 23, 2018
@pekkaklarck
Contributor Author

I can imagine it's hard to get this right everywhere and testing everything is hard too. Perhaps you should change all your test environments to have users like "Päivi". =)

Hopefully pip 10 makes it to the next Python 3.5.x and 3.6.x releases. The nature of this bug is that it's impossible to upgrade pip even if a version with a fix is released.

I see this was marked as a duplicate, but no other issue was referenced. Is there another issue about the same bug somewhere? It would be interesting to see more information about the root cause and the fix.

@pfmoore
Member

pfmoore commented Jan 23, 2018

Sorry - I hadn't hunted down the PR that fixed this. It's #4486.

Ideally, I'd like our Appveyor tests to run with a non-ASCII username, or possibly a non-latin-1 encoding, but I've no idea how we'd do that.

@benoit-pierre
Member

This can probably be done with a dedicated test using a virtualenv and doing what isolate does.
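
As a rough illustration of the kind of dedicated test discussed here, the sketch below creates a directory with a non-ASCII name, runs a child Python process inside it, and checks that the output decodes cleanly. It is a simplified stand-in under stated assumptions: it does not create a virtualenv, it does not reproduce what pip's own test fixtures do, and the function name `run_in_non_ascii_dir` and the forced `PYTHONIOENCODING` are choices made for this example only.

```python
import os
import subprocess
import sys
import tempfile


def run_in_non_ascii_dir(dirname: str = "Käyttäjä") -> str:
    """Run a child Python in a non-ASCII directory and return its cwd.

    Hypothetical helper for illustration; a real pip test would install
    a package from such a location rather than just print the path.
    """
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, dirname)
        os.mkdir(target)
        # Pin the child's stdout encoding so the parent knows how to decode.
        env = dict(os.environ, PYTHONIOENCODING="utf-8")
        proc = subprocess.run(
            [sys.executable, "-c", "import os; print(os.getcwd())"],
            cwd=target,
            env=env,
            stdout=subprocess.PIPE,
            check=True,
        )
        return proc.stdout.decode("utf-8").strip()


if __name__ == "__main__":
    print(run_in_non_ascii_dir())
```

Pinning the encoding makes the sketch deterministic, but it also hides the exact situation from the report, where the producer's encoding was not under pip's control; a realistic regression test would leave the environment untouched.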

@pekkaklarck
Contributor Author

Thanks for the pointers. I noticed this was fixed already in May 2017, so apparently there have been delays with pip 10. If it's still going to be delayed and may miss the next Python 3 releases, could you consider backporting the fix to pip 9.x so that it could be included instead? This is a rather severe bug for those of us with non-ASCII characters in our user names.

@pradyunsg pradyunsg added the S: needs triage Issues/PRs that need to be triaged label May 11, 2018
@xavfernandez xavfernandez added the C: tests Testing and related things label Jul 17, 2018
@pekkaklarck
Contributor Author

Has this issue been fixed and the fix released?

@pradyunsg pradyunsg removed the S: needs triage Issues/PRs that need to be triaged label Jan 1, 2019
@pradyunsg
Member

The fix has been released. :)

The code needs a test though, which is what this issue is tracking.

@pradyunsg pradyunsg added good first issue A good item for first time contributors to work on and removed resolution: duplicate Duplicate of an existing issue/PR labels Jan 1, 2019
@pradyunsg
Member

This issue is a good starting point for anyone who wants to help out with pip's development -- it's simple and the process of fixing this should be a good introduction to pip's development workflow. See the discussion above to understand what the desired fix is.

Feel free to give me a mention using "@pradyunsg" if you have any questions. :)

@stalkerbear

This issue is a good starting point for anyone who wants to help out with pip's development -- it's simple and the process of fixing this should be a good introduction to pip's development workflow. See the discussion above to understand what the desired fix is.
I have to tell you, when I started on this problem it looked very easy.
Then I switched to 3.5 and created a directory with Hebrew letters in its name, and I couldn't install Python into that directory. The problem showed up while installing setuptools:
Processing setuptools-40.8.0-py3.5.egg

warning: no previously-included files found matching 'pyproject.toml'
warning: build_py: byte-compiling is disabled, skipping.

warning: install_lib: byte-compiling is disabled, skipping.

Traceback (most recent call last):
  File "C:\Users\stalk\AppData\Local\Temp\tmp0jut7bclpycharm-management\setuptools-40.8.0\setup.py", line 195, in <module>
    dist = setuptools.setup(**setup_params)
  File "C:\Users\stalk\AppData\Local\Temp\tmp0jut7bclpycharm-management\setuptools-40.8.0\setuptools\__init__.py", line 145, in setup
    return distutils.core.setup(**attrs)
  File "C:\Python\Python35\lib\distutils\core.py", line 148, in setup
    dist.run_commands()
  File "C:\Python\Python35\lib\distutils\dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "C:\Python\Python35\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "C:\Users\stalk\AppData\Local\Temp\tmp0jut7bclpycharm-management\setuptools-40.8.0\setuptools\command\install.py", line 67, in run
    self.do_egg_install()
  File "C:\Users\stalk\AppData\Local\Temp\tmp0jut7bclpycharm-management\setuptools-40.8.0\setuptools\command\install.py", line 117, in do_egg_install
    cmd.run()
  File "C:\Users\stalk\AppData\Local\Temp\tmp0jut7bclpycharm-management\setuptools-40.8.0\setuptools\command\easy_install.py", line 418, in run
    self.easy_install(spec, not self.no_deps)
  File "C:\Users\stalk\AppData\Local\Temp\tmp0jut7bclpycharm-management\setuptools-40.8.0\setuptools\command\easy_install.py", line 660, in easy_install
    return self.install_item(None, spec, tmpdir, deps, True)
  File "C:\Users\stalk\AppData\Local\Temp\tmp0jut7bclpycharm-management\setuptools-40.8.0\setuptools\command\easy_install.py", line 705, in install_item
    dists = self.install_eggs(spec, download, tmpdir)
  File "C:\Users\stalk\AppData\Local\Temp\tmp0jut7bclpycharm-management\setuptools-40.8.0\setuptools\command\easy_install.py", line 851, in install_eggs
    return [self.install_egg(dist_filename, tmpdir)]
  File "C:\Users\stalk\AppData\Local\Temp\tmp0jut7bclpycharm-management\setuptools-40.8.0\setuptools\command\easy_install.py", line 940, in install_egg
    os.path.dirname(destination)
  File "C:\Python\Python35\lib\distutils\cmd.py", line 336, in execute
    util.execute(func, args, msg, dry_run=self.dry_run)
  File "C:\Python\Python35\lib\distutils\util.py", line 299, in execute
    log.info(msg)
  File "C:\Python\Python35\lib\distutils\log.py", line 48, in info
    self._log(INFO, msg, args)
  File "C:\Python\Python35\lib\distutils\log.py", line 38, in _log
    stream.write('%s\n' % msg)
  File "C:\Python\Python35\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 46-49: character maps to <undefined>
This in turn turned out to be a Windows 10 issue with the command line: in Python 3.6 / 3.7 the console encoding defaults to UTF-8, while in 2.7 and 3.5 it defaults to CP437 (checked with import sys; sys.stdout.encoding).
After fixing that file I could install things, but that problem was outside pip's scope.
Then I hit a problem in the ssl_.py file of the integrated urllib3 directory.
There is a fix for that: enforcing UTF-8 in urllib3.

@pradyunsg - The question is, why should we try to work with the preferred code page instead of enforcing UTF-8? Is there any problem with enforcing UTF-8 globally when using pip?
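
To make the encoding side of the question concrete, here is a small sketch of what happens when text containing Hebrew letters is written to a stream whose encoding is cp1252, as in the traceback above, compared with UTF-8. The function `write_with_encoding` is a hypothetical helper that simulates a console stream in memory; it does not touch the real console and says nothing about how pip itself configures its output.

```python
import io
import sys


def write_with_encoding(text: str, encoding: str, errors: str = "strict") -> bytes:
    """Write text through an in-memory stream using the given encoding.

    Hypothetical helper for illustration; it stands in for a console
    stream such as sys.stdout with a particular code page.
    """
    buffer = io.BytesIO()
    # newline="" disables newline translation so results match on all platforms.
    stream = io.TextIOWrapper(buffer, encoding=encoding, errors=errors, newline="")
    stream.write(text + "\n")
    stream.flush()
    return buffer.getvalue()


message = "שלום"

# The encoding Python picked for this process's stdout (may be None if redirected oddly).
print("stdout encoding:", getattr(sys.stdout, "encoding", None))

try:
    write_with_encoding(message, "cp1252")
    cp1252_failed = False
except UnicodeEncodeError:
    # cp1252 has no Hebrew letters, matching the 'charmap' error in the traceback.
    cp1252_failed = True

print("cp1252 strict write failed:", cp1252_failed)
print("utf-8 bytes:", write_with_encoding(message, "utf-8"))
print("cp1252 with escapes:", write_with_encoding(message, "cp1252", "backslashreplace"))
```

Forcing UTF-8 avoids the encode error on the writing side, but whatever reads those bytes (for example a console using a legacy code page) still has to interpret them as UTF-8, which is presumably part of why the preferred encoding is consulted at all.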

@brainwane brainwane removed the good first issue A good item for first time contributors to work on label Jan 27, 2020
@pradyunsg pradyunsg added state: needs discussion This needs some more discussion state: needs reproducer Need to reproduce issue labels Jan 28, 2020
@pradyunsg pradyunsg removed state: needs discussion This needs some more discussion state: needs reproducer Need to reproduce issue labels Apr 29, 2022
@pradyunsg
Member

Closing this since I don't think adding a test for this is particularly valuable anymore. If someone else feels otherwise, please feel free to reopen.

@pekkaklarck
Contributor Author

Based on #9054, having a test for this would be a good idea. Probably would be enough if Windows CI machines had users with non-ASCII user names. Anyway, #9054 is still open so closing this is fine.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 31, 2022