Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix attempt for unicode bug #1137 #1772

Closed
wants to merge 5 commits into from
Closed

Conversation

remram44
Copy link
Contributor

New pull request because Github just doesn't get rebases. Original: #1179 Issue: #1137

2013-08-31, remram44:

This should probably be made to work on Python 3 first... I see nothing in six.py to help, though.

Still need tweaking to work with Python 3.

This won't work on Python 3, and is probably not required there at all.

It makes sure that all entries from complete_log are bytestrings when
join()ing, before writing to the file.

The issue (github pypa#1137) was that a single unicode object on the list
would cause all the bytestrings to be converted to unicode, probably
through the 'ascii' encoder (which would then fail on 8-bit characters).
@remram44
Copy link
Contributor Author

The way PIP is setup, the log file expects bytestrings on Python 2 (which is why I added to_utf8()) and unicode on Python 3. Make up your mind people.

@pfmoore
Copy link
Member

pfmoore commented Apr 28, 2014

There is no unicode builtin in Python 3, so this PR needs work to be applicable on Python 3 before it can be merged.

@dstufft
Copy link
Member

dstufft commented May 3, 2014

Two things:

  • Can we have a better name than to_utf8, it only actually does that on Python 2.x, maybe to_native_str?
  • Can you write a test that exercises to_utf8 on it's own, and then another test that fails without this change to the Command object, but passes with the change?

@remram44
Copy link
Contributor Author

remram44 commented May 3, 2014

Surely you can change the function name yourself...

I filed this patch 9 months ago for something that I think is a very serious bug. The fact that it prevents the log from reporting anything useful afterwards make it all the more important. Merge it if you want, but I personally don't have time to update your tests (any more).

@dstufft
Copy link
Member

dstufft commented May 3, 2014

Sure it's not a nice bug and it'd be good to have it fixed. There are a number of bugs like that and limited time, and while you may have filed the original patch 9 months ago, it didn't work on Python 3.x until 5 days ago. There is a long laundry list of things to do and this patch isn't merge ready as it is. A maintainer (myself included) may get around to it but we're all pretty busy and this isn't hardly the only bug.

@remram44 remram44 changed the title Fix attempt for unicode bug #1137 **PYTHON 2 ONLY** (rebase of #1179) Fix attempt for unicode bug #1137 (rebase of #1179) Jun 18, 2014
@remram44 remram44 changed the title Fix attempt for unicode bug #1137 (rebase of #1179) Fix attempt for unicode bug #1137 Jun 18, 2014
@remram44
Copy link
Contributor Author

Tests without the patch (failure on Python 2, that's the point): https://travis-ci.org/pypa/pip/builds/32825248

@remram44
Copy link
Contributor Author

Two things:

There were 3 things ;-)

Can we have a better name than to_utf8, it only actually does that on Python 2.x, maybe to_native_str?

1bede5b renames to to_native_str_type()

Can you write a test that exercises to_utf8 on it's own,

added with 93d4d6b

and then another test that fails without this change to the Command object, but passes with the change?

added with d017938

@Ivoz
Copy link
Contributor

Ivoz commented Aug 22, 2014

Compatibility code like to_native_str_type() should really live in pip/compat.py. You can already see a native_str(s) function exists, although it could be detable whether it is complete or not.

I also remember this commit from @dholth attempting to fix somewhat the same issue, can you report it did not (or did not for all cases)?

@remram44
Copy link
Contributor Author

Looking at that patch, the basecommand.py part only affects Python 3 (sys.version_info[0] != 2) where the bug is unlikely to happen (because you shouldn't run into bytestring literals). At that revision, my acceptance test still fails without my fix.

Indeed that function native_str() looks awfully familiar (i.e. it is the same), I can probably rebase my fix to use it instead (and move my unittest to the right place).

But if I rebase, this is the last time. This is a one-year-old fix for a very serious bug that I am tired of dragging around every time you decide a name isn't right or the stars aren't aligned. Not that it matters though, since 1.6 probably won't happen this year... I'll probably do that over the week-end.

@remram44
Copy link
Contributor Author

remram44 commented Sep 2, 2014

Hmm, took a bit more than the week-end. Anyway, submitted #2012.

@remram44 remram44 closed this Sep 2, 2014
@Ivoz
Copy link
Contributor

Ivoz commented Sep 2, 2014

But if I rebase, this is the last time. This is a one-year-old fix for a very serious bug that I am tired of dragging around every time you decide a name isn't right or the stars aren't aligned. Not that it matters though, since 1.6 probably won't happen this year... I'll probably do that over the week-end.

Didn't get to saying it earlier @remram44 , but wanted to note that yes we are always appreciative of bugfixes! Unfortunately the encoding problems have been a very hard bugs to fix, having things mostly work with python 2 pip code, then more and more cases pop up with people having unicode somethings, and it breaking sporadically, but then also having to deal with making sure any fix works for both python 3 and python 2 (not easy task with encoding problems), and is of good quality, and hopefully has regression tests to make sure it actually fixes the existing failing use cases. There's been a number of attempts and not all of them up to scratch, but we want to make sure we try and get one in that actually does the job, along with the code quality standard that is demanded of all patches. That's essentially why we are very careful to merge in these fixes, even though we're appreciative of them coming from the community.

@remram44 remram44 deleted the unicode-log branch January 14, 2015 22:11
@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Jun 4, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jun 4, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants