Several possible problems within the ZPublisher Converters module #558

jugmac00 · 2019-04-18T13:50:34Z

When working on #557 I spotted several possible problems within the Converters module.

The converters which inherit from _unicode_converter all fail under Python 3 and byte input.

Testcase:

def test_field2utext_with_binary_string(self):
    from ZPublisher.Converters import field2utext
    to_convert = b'abc'
    if PY2:
        expected = u'abc'
        self.assertEqual(field2utext(to_convert), expected)
    if PY3:
        expected = 'abc'
        self.assertEqual(field2utext(to_convert), expected)

Failure in test test_field2utext_with_binary_string (ZPublisher.tests.test_Converters.ConvertersTests)
Traceback (most recent call last):
  File "/usr/lib64/python3.6/unittest/case.py", line 60, in testPartExecutor
    yield
  File "/usr/lib64/python3.6/unittest/case.py", line 622, in run
    testMethod()
  File "/home/jugmac00/Projects/Zope/Zope/src/ZPublisher/tests/test_Converters.py", line 259, in test_field2utext_with_binary_string
    self.assertEqual(field2utext(to_convert), expected)
  File "/usr/lib64/python3.6/unittest/case.py", line 846, in assertEqual
    assertion_func(first, second, msg=msg)
  File "/usr/lib64/python3.6/unittest/case.py", line 1220, in assertMultiLineEqual
    self.fail(self._formatMessage(msg, standardMsg))
  File "/usr/lib64/python3.6/unittest/case.py", line 687, in fail
    raise self.failureException(msg)
AssertionError: "b'abc'" != 'abc'
- b'abc'
+ abc

This problem applies to field2utokens, field2utext, field2string. field2ulines on the other hand does not inherit from _unicode_converter and has to be examined separately.

I spotted

Zope/src/ZPublisher/Converters.py

Line 186 in 5733da7

v = text_type(v)

as the root problem for this behaviour, at least with Python 3.

Another possible problem is in

Zope/src/ZPublisher/Converters.py

Line 209 in 5733da7

class field2utext(_unicode_converter):

- as line breaks only get normalized when there is input of type byte.

I tried to fix these "problems" yesterday - seemingly successfully, but then suddenly

Zope/src/ZPublisher/tests/testHTTPRequest.py

Line 302 in 5733da7

def test_processInputs_w_unicode_conversions(self):

failed for the utext case.

Also, can anybody shed some light on what this "read" attribute is all about?
e.g.

Zope/src/ZPublisher/Converters.py

Line 221 in 5733da7

if hasattr(v, 'read'):

Before spending more time on these issue, I'd like somebody to confirm that these are real issues.

The text was updated successfully, but these errors were encountered:

leorochael · 2019-04-18T14:14:58Z

Also, can anybody shed some light on what this "read" attribute is all about?
e.g.

Zope/src/ZPublisher/Converters.py

Line 221 in 5733da7

if hasattr(v, 'read'):

IIIUC, post containing file upload inputs are interpreted as file-like objects in the request.

This code seems to imply that you can apply the ulines converter to the content of a file upload (as opposed to, say, a textarea).

dataflake · 2019-04-18T14:53:38Z

As Leo said, what comes into the converter may be a file-like object, and to get to the content read is called.

Zope/src/ZPublisher/Converters.py

Line 186 in 5733da7

v = text_type(v)

This is really lame. This translates to unicode(v) in Python 2 and str(v) in Python 3. I would try using the six.ensure_text here (https://six.readthedocs.io/#six.ensure_text) and passing in the default_encoding set at module scope in that Converters module. That's a much better starting point for any further work.

About that test_processInputs_w_unicode_conversions test method, obviously it's been testing what was there before you started your work. Any changes yo make may break it if it relies on broken behavior. There's a very high likelihood that it's an issue with the test, not with correct functionality.

dataflake · 2019-04-18T14:55:37Z

P.S.: This blocking issue may be related: #271

jugmac00 · 2019-04-19T08:29:52Z

Some more thoughts...

Both

Zope/src/ZPublisher/Converters.py

Lines 33 to 39 in bdc1d21

    
           def field2string(v): 
        
               """Converts value to native strings. 
        
               So always to `str` no matter which Python version you are on. 
        
               """ 
        
               if hasattr(v, 'read'): 
        
                   return v.read()

and

Zope/src/ZPublisher/Converters.py

Lines 48 to 51 in bdc1d21

    
           def field2bytes(v): 
        
               # Converts value to bytes. 
        
               if hasattr(v, 'read'): 
        
                   return v.read()

return v.read() for file-like objects, without any conversion steps.

I have to admit I lack some knowledge in this area (which I would like to improve working on this issue), but I cannot imagine this can be true.

Also, I generally find the code in this module a bit confusing. Ie the partially use of a converter class, the many duplications, and that some "public" functions call each other instead of reusing some common functionality.

I'd probably write a couple of decorators, like

ensure_native_string
accept_file_like_input / process_file_like_input
accept_container_sequences / process_container_sequences (ie list and tuple)
...

Any thoughts on this?

jugmac00 · 2019-04-19T09:35:53Z

Also.. how does this package and all the duplicated functions relate to the ZPublisher?

https://github.com/zopefoundation/zope.publisher/blob/0cbf451433ac86e433390e319ff0ce5c33f3f556/src/zope/publisher/browser.py#L77-L153

d-maurer · 2019-04-19T09:49:40Z

Jürgen Gmach wrote at 2019-4-19 02:35 -0700:

Also.. how does this package and all the duplicated functions relate to the ZPublisher?

You likely see an historical effect: The converter functions likely were first used to implement the ":<type>" suffixes of `ZPublisher` -- and have been defined there. Lateron, some of them were also used elsewhere, e.g. for `PropertyManager`. A refactoring took place - but maybe not a complete one.

dataflake · 2019-04-19T14:01:53Z

Also.. how does this package and all the duplicated functions relate to the ZPublisher?

https://github.com/zopefoundation/zope.publisher/blob/0cbf451433ac86e433390e319ff0ce5c33f3f556/src/zope/publisher/browser.py#L77-L153

Unless you have proof that the tests in Zope itself exercise the duplicate functions in that package just disregard them. In that case you can look to see if they implement anything "better" and copy that into the ZPublisher. Otherwise, don't worry about it.

dataflake · 2019-04-19T14:05:44Z

Both
Zope/src/ZPublisher/Converters.py

Lines 33 to 39 in bdc1d21

def field2string(v):
"""Converts value to native strings.
 So always to `str` no matter which Python version you are on. 
 """ 
 if hasattr(v, 'read'): 
     return v.read() 
and
Zope/src/ZPublisher/Converters.py

Lines 48 to 51 in bdc1d21

def field2bytes(v):
# Converts value to bytes.
if hasattr(v, 'read'):
return v.read()

return v.read() for file-like objects, without any conversion steps.

When you read from file objects you're getting bytes back. Knowing that you can easily see that field2string is doing it wrong, because that will return bytes.

Don't think too fancy before fixing what's obviously wrong and then changing the tests that tested the obviously wrong behavior. You're approaching analysis paralysis it seems.

jugmac00 · 2019-04-19T21:52:48Z

When you read from file objects you're getting bytes back. Knowing that you can easily see that field2string is doing it wrong, because that will return bytes.

Python 2 repl

>>> with open("text.txt") as f:
...     type(f.read())
... 
<type 'str'>

Python 3 repl

>>> with open ("text.txt") as f:
...     type(f.read())
... 
<class 'str'>

That would mean that field2string works correctly for both PY2 and PY3, and field2bytes 's file handling would be wrong for Python 3.

Zope/src/ZPublisher/Converters.py

Lines 48 to 55 in bdc1d21

    
           def field2bytes(v): 
        
               # Converts value to bytes. 
        
               if hasattr(v, 'read'): 
        
                   return v.read() 
        
               elif isinstance(v, text_type): 
        
                   return v.encode(default_encoding) 
        
               else: 
        
                   return bytes(v)

Don't think too fancy before fixing what's obviously wrong and then changing the tests that tested the obviously wrong behavior. You're approaching analysis paralysis it seems.

As I stated above, I still have some gaps of knowledge especially with six and Python 2 vs Python 3 in terms of text vs bytes. And as this is not paid work with a hard time limit, it's a perfect opportunity to learn a thing or two.

Thanks for taking your time!

Going to prepare a first pull request for increasing field2string and field2bytes test cases and fix the bug with file-like object as input, as all the other converters rely on these two functions.

dataflake · 2019-04-19T23:38:57Z

Correction for my earlier comment (after seeing your manual experiments with file objects): What comes out of a file when you read depends on how it was opened, text mode or binary mode (r vs rb). You could consult the code where these file objects are created to be extra sure. Or in the converter code you could take a look at the file's mode attribute, which "normal" file objects have, to find out. This may not be a standard file object, though.

jugmac00 · 2019-04-20T05:08:07Z

See #560 (comment) - as @dataflake pointed out - it is still not 100% clear whether field2string or field2bytes behaviour was correct - have to look into it more deeply and probably setup a better test case with a real Fieldstorage object.

d-maurer · 2019-04-20T06:50:37Z

Jürgen Gmach wrote at 2019-4-19 22:08 -0700:

See #560 (comment) - as @dataflake pointed out - it is still not 100% clear whether field2string or field2bytes behaviour was correct - have to look into it more deeply and probably setup a better test case with a real Fieldstorage object.

I have my doubts that "field2*" converters should be applied to (uploaded) files at all. If they should, the case will be non-trivial. The file's "MIME type" tells us whether its content is (likely) text ("content-type=text/...") or binary ("content-type"=<non-text>"). For text files, we would also need to know the "charset". The "content-type" might contain "charset" information, but likely it does not (as the browser likely does not have this information). Thus, it is to be expected that text file content cannot be reliably handled by the converters as they lack the required charset information (in the general case). The converters cannot rely on server configuration to determine a default charset -- as (in the general case) the charset of uploaded files may not have any relation to the server configuration. In my opinion, uploaded files should always be opened in binary mode (as important information is missing in the general case to open them reliably in text mode) and general converters should refuse to convert their content to text, unless the converter itself specifies the charset to use. An alternative would be to assume the "utf-8" charset in (general) converters and fail when the assumption obviously fails. This might be a pragmatic approach if dropping support to convert file content is considered too drastic.

jugmac00 · 2019-04-22T06:56:34Z

So, this issue is getting a bit out of hand...

I try to wrap it up.

@dataflake Your initial thoughts on where the buggy behaviour is located was right.

I stepped through a file upload with a "string" type converter attached to the field and that was the result:

python3
(Pdb) item
<ZPublisher.HTTPRequest.FileUpload object at 0x7fb79e74e588>
(Pdb) type(item.file.read())
<class 'bytes'>

python2
(Pdb) item
<ZPublisher.HTTPRequest.FileUpload object at 0x7f907c06b7d0>
(Pdb) type(item.read())
<type 'str'>

I closed my pr #560 which would have introduced a new bug. Thanks @dataflake !

Having spent now quite some hours in the Converters and HTTPRequest.processInputs code, I have to concur with @d-maurer - a converter should not be applied to file uploads - and even should not be aware if the input is a file. hasattr looks a like a code smell to me.

When you take a look at the "type_converters"...

Zope/src/ZPublisher/Converters.py

Lines 235 to 252 in a8208a7

    
           type_converters = { 
        
               'float': field2float, 
        
               'int': field2int, 
        
               'long': field2long, 
        
               'string': field2string,  # to native str 
        
               'bytes': field2bytes, 
        
               'date': field2date, 
        
               'date_international': field2date_international, 
        
               'required': field2required, 
        
               'tokens': field2tokens, 
        
               'lines': field2lines, 
        
               'text': field2text, 
        
               'boolean': field2boolean, 
        
               'ustring': field2ustring, 
        
               'utokens': field2utokens, 
        
               'ulines': field2ulines, 
        
               'utext': field2utext, 
        
           }

Imho only one should be applicable to file uploads - field2required - which is not a type converter at all. Imho it should be pulled out of the converter module and directly applied in HTTPRequest.processInputs - similar to "ignore_empty".

Zope/src/ZPublisher/HTTPRequest.py

Line 620 in a8208a7

elif type_name == 'ignore_empty':

As these changes would be breaking changes, I leave the decision to more senior contributors as @dataflake / @icemac

As a side note:
When trying to setup a test case without a fake but with a real FieldStorage/FileUpload, I noticed the heavy coupling between the Converter module and HTTPRequest.processInputs.

both have to know whether the input is a file
processInputs checks whether the converters have a convert_unicode method

TL/DR

somebody has to decide what really has to be done with files and converters
field2u* is broken for Python 3 and byte input

d-maurer · 2019-04-22T07:48:13Z

Jürgen Gmach wrote at 2019-4-21 23:56 -0700:

... - [ ] field2u* is broken for Python 3 and byte input

The "u*" things are relevant only for Python 2. With Python 3, "u*" is equivalent to "*", e.g. "ustring" to "string", "ulines" to "lines", etc. Of course, this does not solve the conversion problem. Passing a file where a simple type (such as "string") is expected is likely extremely rare. It should not be a big problem to drop support for this.

leorochael · 2019-04-22T14:18:01Z

Also, can anybody shed some light on what this "read" attribute is all about?
e.g.

Zope/src/ZPublisher/Converters.py

Line 221 in 5733da7

if hasattr(v, 'read'):

IIIUC, for POST requests containing file upload inputs, these inputs are interpreted as file-like objects in the request.

This code seems to imply that you can apply the ulines converter to the content of a file upload (as opposed to, say, a textarea).

jugmac00 · 2019-05-08T08:31:34Z

@icemac / @dataflake / @jamadden / @mgedmin ...

Somebody has to make a decision on how to proceed with this issue.

TL/DR

somebody has to decide what really has to be done with files and converters

field2u* is broken for Python 3 and byte input

dataflake · 2019-05-09T12:10:22Z

My opinion:

change the converters to not touch files
leave the field2u* converters alone. We will revisit it when we have gathered more experience with Zope on Python 3.

d-maurer · 2019-05-10T04:56:50Z

Jens Vagelpohl wrote at 2019-5-9 05:10 -0700:

My opinion: - change the converters to not touch files - leave the `field2u*` converters alone. We will revisit it when we have gathered more experience with Zope on Python 3.

Note: for Python 3, the `u`-fields are equivalent to the corresponding fields. This means: any problem we have with the `u`-field converters, we should have as well with the corresponding field converters.

jugmac00 · 2019-05-10T17:44:01Z

Is there any preference on what should happen when a converter gets applied to a file anyway?

I am no big fan of ignoring wrong usage, as I prefer to fail hard and early, ie I'd raise an exception.

d-maurer · 2019-05-10T17:51:00Z

Jürgen Gmach wrote at 2019-5-10 10:44 -0700:

Is there any preference on what should happen when a converter gets applied to a file anyway? I am no big fan of ignoring wrong usage, as I prefer to fail hard and early, ie I'd raise an exception.

+1

dataflake · 2019-05-10T17:59:32Z

You can always use the logger to emit a warning-level log message

jugmac00 · 2021-04-23T08:31:57Z

At the April 2021 sprint we just decided to create a PR and discuss details later.

jugmac00 added bug feedback required help wanted question labels Apr 18, 2019

jugmac00 self-assigned this Apr 19, 2019

jugmac00 mentioned this issue Apr 19, 2019

[Do not merge] Fix converter basics #560

Closed

jugmac00 added this to the 4.0 final milestone Apr 19, 2019

jugmac00 removed their assignment Apr 22, 2019

dataflake modified the milestones: 4.0 final, 4.0.1 May 10, 2019

icemac modified the milestones: 4.0.1, 5.0 May 14, 2019

d-maurer mentioned this issue May 20, 2019

Document ZPublisher request parameter controls #636

Closed

jugmac00 mentioned this issue May 22, 2019

:lines converter converting to list of bytes instead of list of str #638

Closed

d-maurer mentioned this issue May 28, 2019

Modernize request parameter handling #641

Open

d-maurer mentioned this issue Jun 12, 2019

Modernize request parameter handling #641 #648

Closed

icemac modified the milestones: 5.0, 5.1 Oct 2, 2020

icemac modified the milestones: 5.1, 5.2 Apr 20, 2021

jugmac00 self-assigned this Apr 24, 2021

This was referenced Apr 24, 2021

deprecate "unicode" converters (ulines, utext, utoken, ustring) #962

Closed

field2* converters should not handle files #963

Merged

jugmac00 closed this as completed in #963 May 6, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Several possible problems within the ZPublisher Converters module #558

Several possible problems within the ZPublisher Converters module #558

jugmac00 commented Apr 18, 2019

leorochael commented Apr 18, 2019

dataflake commented Apr 18, 2019

dataflake commented Apr 18, 2019

jugmac00 commented Apr 19, 2019 •

edited

Loading

jugmac00 commented Apr 19, 2019

d-maurer commented Apr 19, 2019 via email

dataflake commented Apr 19, 2019

dataflake commented Apr 19, 2019

jugmac00 commented Apr 19, 2019

dataflake commented Apr 19, 2019

jugmac00 commented Apr 20, 2019

d-maurer commented Apr 20, 2019 via email

jugmac00 commented Apr 22, 2019

d-maurer commented Apr 22, 2019 via email

leorochael commented Apr 22, 2019

jugmac00 commented May 8, 2019

dataflake commented May 9, 2019

d-maurer commented May 10, 2019 via email

jugmac00 commented May 10, 2019

d-maurer commented May 10, 2019 via email

dataflake commented May 10, 2019

jugmac00 commented Apr 23, 2021

Several possible problems within the ZPublisher Converters module #558

Several possible problems within the ZPublisher Converters module #558

Comments

jugmac00 commented Apr 18, 2019

leorochael commented Apr 18, 2019

dataflake commented Apr 18, 2019

dataflake commented Apr 18, 2019

jugmac00 commented Apr 19, 2019 • edited Loading

jugmac00 commented Apr 19, 2019

d-maurer commented Apr 19, 2019 via email

dataflake commented Apr 19, 2019

dataflake commented Apr 19, 2019

jugmac00 commented Apr 19, 2019

dataflake commented Apr 19, 2019

jugmac00 commented Apr 20, 2019

d-maurer commented Apr 20, 2019 via email

jugmac00 commented Apr 22, 2019

d-maurer commented Apr 22, 2019 via email

leorochael commented Apr 22, 2019

jugmac00 commented May 8, 2019

dataflake commented May 9, 2019

d-maurer commented May 10, 2019 via email

jugmac00 commented May 10, 2019

d-maurer commented May 10, 2019 via email

dataflake commented May 10, 2019

jugmac00 commented Apr 23, 2021

jugmac00 commented Apr 19, 2019 •

edited

Loading