Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Several possible problems within the ZPublisher Converters module #558

Closed
jugmac00 opened this issue Apr 18, 2019 · 22 comments · Fixed by #963
Closed

Several possible problems within the ZPublisher Converters module #558

jugmac00 opened this issue Apr 18, 2019 · 22 comments · Fixed by #963

Comments

@jugmac00
Copy link
Member

When working on #557 I spotted several possible problems within the Converters module.

The converters which inherit from _unicode_converter all fail under Python 3 and byte input.

Testcase:

def test_field2utext_with_binary_string(self):
    from ZPublisher.Converters import field2utext
    to_convert = b'abc'
    if PY2:
        expected = u'abc'
        self.assertEqual(field2utext(to_convert), expected)
    if PY3:
        expected = 'abc'
        self.assertEqual(field2utext(to_convert), expected)

Failure in test test_field2utext_with_binary_string (ZPublisher.tests.test_Converters.ConvertersTests)
Traceback (most recent call last):
  File "/usr/lib64/python3.6/unittest/case.py", line 60, in testPartExecutor
    yield
  File "/usr/lib64/python3.6/unittest/case.py", line 622, in run
    testMethod()
  File "/home/jugmac00/Projects/Zope/Zope/src/ZPublisher/tests/test_Converters.py", line 259, in test_field2utext_with_binary_string
    self.assertEqual(field2utext(to_convert), expected)
  File "/usr/lib64/python3.6/unittest/case.py", line 846, in assertEqual
    assertion_func(first, second, msg=msg)
  File "/usr/lib64/python3.6/unittest/case.py", line 1220, in assertMultiLineEqual
    self.fail(self._formatMessage(msg, standardMsg))
  File "/usr/lib64/python3.6/unittest/case.py", line 687, in fail
    raise self.failureException(msg)
AssertionError: "b'abc'" != 'abc'
- b'abc'
+ abc

This problem applies to field2utokens, field2utext, field2string. field2ulines on the other hand does not inherit from _unicode_converter and has to be examined separately.

I spotted

v = text_type(v)
as the root problem for this behaviour, at least with Python 3.


Another possible problem is in

class field2utext(_unicode_converter):
- as line breaks only get normalized when there is input of type byte.

I tried to fix these "problems" yesterday - seemingly successfully, but then suddenly

def test_processInputs_w_unicode_conversions(self):
failed for the utext case.


Also, can anybody shed some light on what this "read" attribute is all about?
e.g.

if hasattr(v, 'read'):

Before spending more time on these issue, I'd like somebody to confirm that these are real issues.

@leorochael
Copy link
Contributor

Also, can anybody shed some light on what this "read" attribute is all about?
e.g.

if hasattr(v, 'read'):

IIIUC, post containing file upload inputs are interpreted as file-like objects in the request.

This code seems to imply that you can apply the ulines converter to the content of a file upload (as opposed to, say, a textarea).

@dataflake
Copy link
Member

As Leo said, what comes into the converter may be a file-like object, and to get to the content read is called.

v = text_type(v)

This is really lame. This translates to unicode(v) in Python 2 and str(v) in Python 3. I would try using the six.ensure_text here (https://six.readthedocs.io/#six.ensure_text) and passing in the default_encoding set at module scope in that Converters module. That's a much better starting point for any further work.

About that test_processInputs_w_unicode_conversions test method, obviously it's been testing what was there before you started your work. Any changes yo make may break it if it relies on broken behavior. There's a very high likelihood that it's an issue with the test, not with correct functionality.

@dataflake
Copy link
Member

P.S.: This blocking issue may be related: #271

@jugmac00
Copy link
Member Author

jugmac00 commented Apr 19, 2019

Some more thoughts...

Both

def field2string(v):
"""Converts value to native strings.
So always to `str` no matter which Python version you are on.
"""
if hasattr(v, 'read'):
return v.read()

and
def field2bytes(v):
# Converts value to bytes.
if hasattr(v, 'read'):
return v.read()

return v.read() for file-like objects, without any conversion steps.

I have to admit I lack some knowledge in this area (which I would like to improve working on this issue), but I cannot imagine this can be true.

Also, I generally find the code in this module a bit confusing. Ie the partially use of a converter class, the many duplications, and that some "public" functions call each other instead of reusing some common functionality.

I'd probably write a couple of decorators, like

  • ensure_native_string
  • accept_file_like_input / process_file_like_input
  • accept_container_sequences / process_container_sequences (ie list and tuple)
    ...

Any thoughts on this?

@jugmac00
Copy link
Member Author

Also.. how does this package and all the duplicated functions relate to the ZPublisher?

https://github.com/zopefoundation/zope.publisher/blob/0cbf451433ac86e433390e319ff0ce5c33f3f556/src/zope/publisher/browser.py#L77-L153

@d-maurer
Copy link
Contributor

d-maurer commented Apr 19, 2019 via email

@dataflake
Copy link
Member

Also.. how does this package and all the duplicated functions relate to the ZPublisher?

https://github.com/zopefoundation/zope.publisher/blob/0cbf451433ac86e433390e319ff0ce5c33f3f556/src/zope/publisher/browser.py#L77-L153

Unless you have proof that the tests in Zope itself exercise the duplicate functions in that package just disregard them. In that case you can look to see if they implement anything "better" and copy that into the ZPublisher. Otherwise, don't worry about it.

@dataflake
Copy link
Member

Both
Zope/src/ZPublisher/Converters.py

Lines 33 to 39 in bdc1d21

def field2string(v):
"""Converts value to native strings.

 So always to `str` no matter which Python version you are on. 
 """ 
 if hasattr(v, 'read'): 
     return v.read() 

and
Zope/src/ZPublisher/Converters.py

Lines 48 to 51 in bdc1d21

def field2bytes(v):
# Converts value to bytes.
if hasattr(v, 'read'):
return v.read()

return v.read() for file-like objects, without any conversion steps.

When you read from file objects you're getting bytes back. Knowing that you can easily see that field2string is doing it wrong, because that will return bytes.

Don't think too fancy before fixing what's obviously wrong and then changing the tests that tested the obviously wrong behavior. You're approaching analysis paralysis it seems.

@jugmac00
Copy link
Member Author

When you read from file objects you're getting bytes back. Knowing that you can easily see that field2string is doing it wrong, because that will return bytes.

Python 2 repl

>>> with open("text.txt") as f:
...     type(f.read())
... 
<type 'str'>

Python 3 repl

>>> with open ("text.txt") as f:
...     type(f.read())
... 
<class 'str'>

That would mean that field2string works correctly for both PY2 and PY3, and field2bytes 's file handling would be wrong for Python 3.

def field2bytes(v):
# Converts value to bytes.
if hasattr(v, 'read'):
return v.read()
elif isinstance(v, text_type):
return v.encode(default_encoding)
else:
return bytes(v)

Don't think too fancy before fixing what's obviously wrong and then changing the tests that tested the obviously wrong behavior. You're approaching analysis paralysis it seems.

As I stated above, I still have some gaps of knowledge especially with six and Python 2 vs Python 3 in terms of text vs bytes. And as this is not paid work with a hard time limit, it's a perfect opportunity to learn a thing or two.

Thanks for taking your time!

Going to prepare a first pull request for increasing field2string and field2bytes test cases and fix the bug with file-like object as input, as all the other converters rely on these two functions.

@jugmac00 jugmac00 self-assigned this Apr 19, 2019
@jugmac00 jugmac00 added this to the 4.0 final milestone Apr 19, 2019
@dataflake
Copy link
Member

Correction for my earlier comment (after seeing your manual experiments with file objects): What comes out of a file when you read depends on how it was opened, text mode or binary mode (r vs rb). You could consult the code where these file objects are created to be extra sure. Or in the converter code you could take a look at the file's mode attribute, which "normal" file objects have, to find out. This may not be a standard file object, though.

@jugmac00
Copy link
Member Author

See #560 (comment) - as @dataflake pointed out - it is still not 100% clear whether field2string or field2bytes behaviour was correct - have to look into it more deeply and probably setup a better test case with a real Fieldstorage object.

@d-maurer
Copy link
Contributor

d-maurer commented Apr 20, 2019 via email

@jugmac00
Copy link
Member Author

So, this issue is getting a bit out of hand...

I try to wrap it up.

@dataflake Your initial thoughts on where the buggy behaviour is located was right.

I stepped through a file upload with a "string" type converter attached to the field and that was the result:

python3
(Pdb) item
<ZPublisher.HTTPRequest.FileUpload object at 0x7fb79e74e588>
(Pdb) type(item.file.read())
<class 'bytes'>

python2
(Pdb) item
<ZPublisher.HTTPRequest.FileUpload object at 0x7f907c06b7d0>
(Pdb) type(item.read())
<type 'str'>

I closed my pr #560 which would have introduced a new bug. Thanks @dataflake !

Having spent now quite some hours in the Converters and HTTPRequest.processInputs code, I have to concur with @d-maurer - a converter should not be applied to file uploads - and even should not be aware if the input is a file. hasattr looks a like a code smell to me.

When you take a look at the "type_converters"...

type_converters = {
'float': field2float,
'int': field2int,
'long': field2long,
'string': field2string, # to native str
'bytes': field2bytes,
'date': field2date,
'date_international': field2date_international,
'required': field2required,
'tokens': field2tokens,
'lines': field2lines,
'text': field2text,
'boolean': field2boolean,
'ustring': field2ustring,
'utokens': field2utokens,
'ulines': field2ulines,
'utext': field2utext,
}

Imho only one should be applicable to file uploads - field2required - which is not a type converter at all. Imho it should be pulled out of the converter module and directly applied in HTTPRequest.processInputs - similar to "ignore_empty".

elif type_name == 'ignore_empty':

As these changes would be breaking changes, I leave the decision to more senior contributors as @dataflake / @icemac

As a side note:
When trying to setup a test case without a fake but with a real FieldStorage/FileUpload, I noticed the heavy coupling between the Converter module and HTTPRequest.processInputs.

  • both have to know whether the input is a file
  • processInputs checks whether the converters have a convert_unicode method

TL/DR

  • somebody has to decide what really has to be done with files and converters
  • field2u* is broken for Python 3 and byte input

@jugmac00 jugmac00 removed their assignment Apr 22, 2019
@d-maurer
Copy link
Contributor

d-maurer commented Apr 22, 2019 via email

@leorochael
Copy link
Contributor

Also, can anybody shed some light on what this "read" attribute is all about?
e.g.

if hasattr(v, 'read'):

IIIUC, for POST requests containing file upload inputs, these inputs are interpreted as file-like objects in the request.

This code seems to imply that you can apply the ulines converter to the content of a file upload (as opposed to, say, a textarea).

@jugmac00
Copy link
Member Author

jugmac00 commented May 8, 2019

@icemac / @dataflake / @jamadden / @mgedmin ...

Somebody has to make a decision on how to proceed with this issue.

TL/DR

  • somebody has to decide what really has to be done with files and converters
  • field2u* is broken for Python 3 and byte input

@dataflake
Copy link
Member

My opinion:

  • change the converters to not touch files
  • leave the field2u* converters alone. We will revisit it when we have gathered more experience with Zope on Python 3.

@d-maurer
Copy link
Contributor

d-maurer commented May 10, 2019 via email

@jugmac00
Copy link
Member Author

Is there any preference on what should happen when a converter gets applied to a file anyway?

I am no big fan of ignoring wrong usage, as I prefer to fail hard and early, ie I'd raise an exception.

@d-maurer
Copy link
Contributor

d-maurer commented May 10, 2019 via email

@dataflake
Copy link
Member

You can always use the logger to emit a warning-level log message

@jugmac00
Copy link
Member Author

At the April 2021 sprint we just decided to create a PR and discuss details later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants