Python 2 vs Python 3: DevString with bytes #251
Comments
Hi Matias,
Thanks for the report. I feel that you have raised some inconsistency issues in PyTango.
Test report
Command arg_out
I ran your example with a Python 3 server and a Python 3 client and it works well.
A Python 2 server with a Python 2 client also works fine. I extended the test with a Python 3 server and a Python 2 client:
The output is consistent with UTF-8 encoding:
It seems that the Tango documentation is somehow not telling the truth... By looking inside the Boost.Python implementation, I got the following points:
One of the issues raised by your example is that PyTango is failing silently:

    # code from connection.py
    def __Connection__command_inout(self, name, *args, **kwds):
        r = Connection.command_inout_raw(self, name, *args, **kwds)
        if isinstance(r, DeviceData):
            try:
                return r.extract(self.defaultCommandExtractAs)
            except Exception:
                return None
        else:
            return r

In fact, a
What about command arg_in?
My server code:

    from tango.server import Device
    from tango.server import command


    class BytesTest(Device):

        @command(dtype_in=str)
        def bytes_in1(self, arg_in):
            print("Server stdout:: arg_in", arg_in)

I tried the same approach with the command argument.
Now let's look at a Python 3 client with a Python 2 server:
The Python 3 client is sending arg_in in latin-1, not in UTF-8.
Now let's try a Python 3 client with a Python 3 server:
It's a DevFailed, so the error comes from the server side.
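To make the latin-1 versus UTF-8 difference concrete, here is a minimal pure-Python sketch. It does not use PyTango at all, and the helper name is_valid_utf8 is made up for the illustration. It only assumes that one side encodes with latin-1 while the other side decodes strictly as UTF-8, which is one plausible reading of the failures above, not a confirmed diagnosis.

```python
text = "é"  # U+00E9, representable in both encodings

latin1_bytes = text.encode("latin-1")  # one byte: 0xE9
utf8_bytes = text.encode("utf-8")      # two bytes: 0xC3 0xA9


def is_valid_utf8(data):
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same character gives different bytes depending on the encoding.
print(latin1_bytes, utf8_bytes)

# A lone 0xE9 byte is not valid UTF-8, so a strict UTF-8 decoder rejects it.
print(is_valid_utf8(latin1_bytes))

# The opposite mismatch never raises: latin-1 accepts any byte,
# but the text comes out as two characters instead of one.
print(len(utf8_bytes.decode("latin-1")))
```

The asymmetry is worth noting: decoding latin-1 bytes as UTF-8 can raise, while decoding UTF-8 bytes as latin-1 silently produces different text.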
Attributes
Now let's see how attributes behave.

    from tango.server import run
    from tango.server import Device
    from tango.server import attribute
    from tango import AttrWriteType


    class BytesTest(Device):
        attr = ""
        bytes_attr_w = attribute(dtype=str, access=AttrWriteType.READ_WRITE)

        def read_bytes_attr_w(self):
            print("read, ", repr(self.attr))
            return self.attr

        def write_bytes_attr_w(self, value):
            print("write, ", repr(value))
            self.attr = value

Python 2.7 server and Python 3.7 client:
Python 3.7 server and Python 3.7 client:
The failure shows up on the client, so the exception is raised on the client side.
And Python 3.7 server and Python 2 client:
Overview
command argout:
command argin:
Write attribute:
Read attribute:
Conclusion
First of all, command decoding shall not fail silently. There are some quite big inconsistencies in how PyTango deals with string encoding. It is not only an issue between Python 2 and Python 3. PyTango does not always use the default Boost encoding/decoding (like in device_data.cpp/insert), and sometimes PyTango forces the encoding itself. This PR can be useful: #180
I guess that for Python 2 compatibility, we should probably use latin-1 as the standard (as it is defined in the documentation). It means that Python 3 users will have to be aware of it (no Unicode code point above \u00ff can be used).
@AntoineDupre Thanks a lot for the detailed explanation and for the time you took writing all this! I agree with your conclusion. At the moment this is not a blocking issue for us, since we can use a direct connection to the serial
@AntoineDupre Great investigation and report back, thanks! I started looking at it, but got a bit lost in all the Boost/C++ wrapper magic. I agree with your conclusion too. The most urgent change is raising an error if the decoding fails. It would be nice to have automated tests that validate the interaction between different Python versions like this (at least 2.7 and 3.7), but that seems tricky to do with our Travis build...
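As a rough illustration of "raise instead of returning None", here is a self-contained sketch. FakeReply, command_inout_silent and command_inout_strict are hypothetical names invented for this example; they only mimic the shape of the connection.py snippet quoted earlier and are not PyTango code or a proposed patch.

```python
class FakeReply:
    """Hypothetical stand-in for a command reply that holds raw bytes."""

    def __init__(self, raw):
        self.raw = raw

    def extract(self):
        # Assumption for the sketch: extraction decodes strictly as UTF-8.
        return self.raw.decode("utf-8")


def command_inout_silent(reply):
    """Mimics the current behaviour: any extraction error becomes None."""
    try:
        return reply.extract()
    except Exception:
        return None


def command_inout_strict(reply):
    """Alternative: surface the decoding problem to the caller."""
    try:
        return reply.extract()
    except UnicodeDecodeError as exc:
        raise ValueError(
            "cannot decode command reply %r" % (reply.raw,)
        ) from exc


binary_reply = FakeReply(b"\xff\x01\x02")  # not valid UTF-8

print(command_inout_silent(binary_reply))  # the caller cannot tell what went wrong

try:
    command_inout_strict(binary_reply)
except ValueError as err:
    print("error:", err)
```

With the strict variant the client sees an explicit error that includes the offending bytes, rather than a None that looks like a legitimate empty result.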
@ajoubertza, @AntoineDupre
@mguijarr, @ajoubertza, @AntoineDupre
Hi all,
First, I don't see how DevSerReadString could work for binary data. I think you should use DevSerReadChar. If you use DevSerReadString to get binary data and PyTango were working as it should, you would risk random exceptions depending on whether the characters you receive from the serial line fit in the range 0-127.
Now, some facts first to see if I get this right:
My understanding is that:
I always get confused with this, so there might be a mistake above. Please check carefully. Any volunteer :-)?
There is no such thing as an invalid latin-1 bytestring :)

    >>> bytes(range(256)).decode('latin-1')
    '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0¡¢£¤¥¦§¨©ª«¬\xad®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ'

However, most Unicode code points are not included in latin-1:

    >>> "€".encode('latin-1')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 0: ordinal not in range(256)

Sounds good otherwise!
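Because every byte value maps to exactly one latin-1 code point, a bytes to str to bytes round trip through latin-1 is lossless. The sketch below shows this with two helper functions whose names (bytes_to_text, text_to_bytes) are made up for the example. It only covers the Python-side mapping and makes no claim about what PyTango or the transport layer does with the data.

```python
def bytes_to_text(data):
    """Map each byte 0..255 to the code point with the same number."""
    return data.decode("latin-1")


def text_to_bytes(text):
    """Inverse mapping; raises UnicodeEncodeError above U+00FF."""
    return text.encode("latin-1")


payload = bytes(range(256))          # every possible byte value
as_text = bytes_to_text(payload)

# Each character's ordinal equals the original byte value.
print(all(ord(ch) == b for ch, b in zip(as_text, payload)))

# The round trip gives back exactly the original bytes.
print(text_to_bytes(as_text) == payload)
```

This is why latin-1 is a workable convention for carrying arbitrary bytes in a Python 3 str, as long as nothing above U+00FF is ever put in.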
Hi, I fully agree with Tiago that any client trying to read binary data from the serial device server MUST use the DevSerReadChar command. Using strings will cause the first null character to terminate the string.
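The null-termination effect can be reproduced in plain Python with ctypes, which exposes C-style strings. This is only an analogy for how a NUL-terminated C string behaves; it does not exercise the Tango or PyTango code paths, and the sample bytes are arbitrary.

```python
import ctypes

# Binary data with an embedded NUL byte, as a serial device might send.
raw = b"\x01\x02\x00\x03\x04"

# Reading the buffer back as a C string stops at the first NUL.
as_c_string = ctypes.c_char_p(raw).value

print(len(raw))          # length of the original data
print(len(as_c_string))  # length seen through the C-string view
print(as_c_string)
```

Everything after the first zero byte is lost in the C-string view, which is the reason an array-of-bytes type is the safer choice for binary payloads.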
It's the all-star game @AntoineDupre @ajoubertza @tiagocoutinho @vxgmichel @andygotz 😄
So, I checked more carefully the calls that we make today to the serial device server I was talking
According to the documentation here, the 3 commands return
As far as I understand, following @tiagocoutinho's proposal, we may be able to replace those calls with:
Indeed,
Is that right?
This does not change the fact that there is a bug/weird behaviour with PyTango as demonstrated
Thanks for your help guys!
You are right of course @vxgmichel! I always mix up latin-1 with UTF-7 (don't ask me why :-). @mguijarr, I agree there is a bug in PyTango. Both your report and @AntoineDupre's complete analysis make absolute sense. Having only looked very briefly at the C++ server, my opinion is that you should technically be able to use either DevSerReadString(SL_RAW) or DevSerReadChar(SL_RAW). But, as @andygotz pointed out, it would probably be more correct to use DevSerReadChar(SL_RAW).
I have assigned the bug to myself. Here are my findings so far:
In all tests I used the equivalent of a python 3
One thing I also noticed is that you cannot put a '\x00' in a DevString: it cuts the string at that point. I need to check whether this happens on the PyTango side or on the C++ side. I also need to check what is happening with Tango properties.
Hi all,
I noticed an intriguing behavior of PyTango running within a Python 3 environment.
Scenario
An old, existing C++ server to control an RS232 serial line. The server has a DevSerReadString method to read chars from the serial line; the Tango output data type is DevString. The equipment connected to the serial line sends binary data.
Problem
When calling the command from a Python 2 client, everything works fine -- the returned string corresponds to the bytes sent by the equipment.
When calling the command from a Python 3 client, None is returned.
Minimal example to reproduce the problem
Server in Python 2
Python 3 client
Additional information
Of course the problem comes from bytes/unicode handling in Python. It is also probably a bad idea to use DevString for this kind of data transfer; maybe DevVarCharArray would have been better...
According to the PyTango documentation, DevString is encoded as latin-1, but it does not seem to be the case in reality.