Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PdfFileReader barfs on unicode _namedDest ... needs to check isinstance(dest, basestring) #79

Closed
egbutter opened this issue Mar 12, 2014 · 7 comments

Comments

@egbutter
Copy link
Contributor

I noticed the way you are keeping 2to3 compatability with isinstance(x, basestring) in pagerange.py using this Str indirection. I propose we move this Str to utils, and import it into pagerange.py and pdf.py. Will submit a pull request later tonight.

@egbutter
Copy link
Contributor Author

Just realized that using Str is not the right way to do this. Either way, the dest variable in pdf:_buildOutline needs to check for unicode type, too. Im just going to check for str or unicode dest and submit the pull request.

egbutter added a commit to egbutter/PyPDF2 that referenced this issue Mar 12, 2014
…ing if dest isan instance of str or unicode now .. fixes py-pdf#79
@switham
Copy link
Contributor

switham commented Mar 12, 2014

I wrote Str, I think it's the right solution, and I think your commit 95a905a is a bug. I will explain why carefully. If you think Str isn't the right thing here, please explain carefully.

Str = getattr(__builtins__, "basestring", str)

In Python 2, this sets Str = basestring. In Python 3, there is no "basestring," and this sets Str = str. It's a little strange to use getattr, but it prevents 2to3 from converting the word "basestring" to "str."

In Python 2, isinstance(x, Str) checks whether x is a basestring, i.e. str (ASCII in Python 2) or unicode.
In Python 3, isinstance(x, Str) checks whether x is a str. str means unicode in Python 3, it's the only string type in Python 3.

In Python 3, the following will raise an error if dest is not a str, because then (and only then) it will go on to the unicode test, and "unicode" is not defined in Python 3.

          elif (isinstance(dest, str) or isinstance(dest, unicode)) \
                   and dest in self._namedDests:

Did you hit a bug using Str? Did you test the isinstance(dest, unicode) part of this statement in Python 3? To test this you would need to make dest a non-string, like an int.

I like the idea of putting Str in utils and importing it where it's needed. I'm curious why you changed your mind.

--Steve

@egbutter
Copy link
Contributor Author

Ha, definitely deny my last commit, then. This is what I assumed at first.

I changed my mind b/c running hasattr(builtins, 'basestring') returns
False(!) on my Py2.7.3 build:

'2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]'

[image: Inline image 1]

On Wed, Mar 12, 2014 at 6:51 PM, switham notifications@github.com wrote:

I wrote Str, I think it's the right solution, and I think your commit
95a905a 95a905a is a bug. I
will explain why carefully. If you think Str isn't the right thing here,
please explain carefully.

Str = getattr(builtins, "basestring", str)

In Python 2, this sets Str = basestring. In Python 3, there is no
"basestring," and this sets Str = str. It's a little strange to use
getattr, but it prevents 2to3 from converting the word "basestring" to
"str."

In Python 2, isinstance(x, Str) checks whether x is a basestring, i.e. str
(ASCII in Python 2) or unicode.
In Python 3, isinstance(x, Str) checks whether x is a str. str means
unicode in Python 3, it's the only string type in Python 3.

In Python 3, the following will raise an error if dest is not a str,
because then (and only then) it will go on to the unicode test, and
"unicode" is not defined in Python 3.

      elif (isinstance(dest, str) or isinstance(dest, unicode)) \
               and dest in self._namedDests:

Did you hit a bug using Str? Did you test the isinstance(dest, unicode)part of this statement in Python 3? To test this you would need to make
dest a non-string, like an int.

I like the idea of putting Str in utils and importing it where it's
needed. I'm curious why you changed your mind.

--Steve


Reply to this email directly or view it on GitHubhttps://github.com//issues/79#issuecomment-37479140
.

Eric
650-741-5371

@egbutter
Copy link
Contributor Author

Ok, it looks like the "safest" way to fix this is to check for basestring
in builtin rather than builtins. Apparently builtins is an
implementation detail that is not really supposed to be accessed directly.
Link http://docs.python.org/2/library/__builtin__.html#module-__builtin__.
I will make this Py3 compatable by importing builtins, the Py3 equivalent,
if I cannot import builtin. Ty,

On Thu, Mar 13, 2014 at 8:55 AM, Eric Butter emilio@ericbeurre.com wrote:

Ha, definitely deny my last commit, then. This is what I assumed at
first.

I changed my mind b/c running hasattr(builtins, 'basestring') returns
False(!) on my Py2.7.3 build:

'2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]'

[image: Inline image 1]

On Wed, Mar 12, 2014 at 6:51 PM, switham notifications@github.com wrote:

I wrote Str, I think it's the right solution, and I think your commit
95a905a 95a905a is a bug. I
will explain why carefully. If you think Str isn't the right thing here,
please explain carefully.

Str = getattr(builtins, "basestring", str)

In Python 2, this sets Str = basestring. In Python 3, there is no
"basestring," and this sets Str = str. It's a little strange to use
getattr, but it prevents 2to3 from converting the word "basestring" to
"str."

In Python 2, isinstance(x, Str) checks whether x is a basestring, i.e.
str (ASCII in Python 2) or unicode.
In Python 3, isinstance(x, Str) checks whether x is a str. str means
unicode in Python 3, it's the only string type in Python 3.

In Python 3, the following will raise an error if dest is not a str,
because then (and only then) it will go on to the unicode test, and
"unicode" is not defined in Python 3.

      elif (isinstance(dest, str) or isinstance(dest, unicode)) \
               and dest in self._namedDests:

Did you hit a bug using Str? Did you test the isinstance(dest, unicode)part of this statement in Python 3? To test this you would need to make
dest a non-string, like an int.

I like the idea of putting Str in utils and importing it where it's
needed. I'm curious why you changed your mind.

--Steve


Reply to this email directly or view it on GitHubhttps://github.com//issues/79#issuecomment-37479140
.

Eric
650-741-5371

Eric
650-741-5371

egbutter added a commit to egbutter/PyPDF2 that referenced this issue Mar 13, 2014
… Py3 by importing the main __builtin__ rather than relying on the locally-modifiable __builtins__ dict. fixes py-pdf#79
@switham
Copy link
Contributor

switham commented Mar 15, 2014

Wow. Sorry I missed the unbearable lightness of builtins. Thanks for getting it right, Eric.
--Steve

@egbutter
Copy link
Contributor Author

Haha it was a surprise to me too. And +1 for the terrific literary
reference!

On Fri, Mar 14, 2014 at 10:34 PM, switham notifications@github.com wrote:

Wow. Sorry I missed the unbearable lightness of builtins. Thanks for
getting it right, Eric.
--Steve


Reply to this email directly or view it on GitHubhttps://github.com//issues/79#issuecomment-37714277
.

Eric
650-741-5371

@mstamy2
Copy link
Collaborator

mstamy2 commented Mar 17, 2014

I've been away for a while, so I haven't had a chance to look at this much, but it looks good! Utils is the right place to handle compatibility issues, so thanks for implementing Str there.

Keep in mind there is also utils.string_type to handle string type differences between Python 2 and 3 (I'm not sure if that could have been used in this patch or not).

I'll go ahead and merge this patch if you feel it's complete

sdpython pushed a commit to sdpython/PyPDF2 that referenced this issue Jun 23, 2014
…t; moving Str basestring indirection from pagerange to utils, importing into both pdf and pagerange
sdpython pushed a commit to sdpython/PyPDF2 that referenced this issue Jun 23, 2014
… Py3 by importing the main __builtin__ rather than relying on the locally-modifiable __builtins__ dict. fixes py-pdf#79
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants