allow RAW encoding #385

Lexcon · 2020-01-02T09:08:46Z

cx_Oracle could accept an encoding 'raw' that returns bytes instead of unicode strings, without any conversion. That way, conversion and repair of corrupt strings can be done at the Python level instead of inside cx_Oracle.

Legacy database content with mixed encodings could then be supported as well. It would work like the utl_raw.cast_to_raw function but without the length limitation of 4000 bytes. In fact, it would work the way it already does in Python 2.7.

For testing, there should also be a way to write data. E.g. this table could be supported:

    create table translations (encoding varchar2(20), content varchar2(1000))
    insert into translations (encoding, content) values ('utf-8', 'abë'.encode('utf-8'))
    insert into translations (encoding, content) values ('windows-1252', 'abë'.encode('windows-1252'))
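
To make the mixed-encoding problem concrete, here is a minimal sketch in plain Python (no database involved) of what the two rows above would hold at the byte level. The variable names are illustrative only.

```python
# The same text produces different byte sequences under each encoding.
text = 'abë'
utf8_bytes = text.encode('utf-8')            # two bytes for 'ë'
cp1252_bytes = text.encode('windows-1252')   # one byte for 'ë'

# Decoding with the wrong codec either garbles the text silently...
mojibake = utf8_bytes.decode('windows-1252')

# ...or fails outright.
try:
    cp1252_bytes.decode('utf-8')
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
```

A single connection-wide encoding cannot decode both rows correctly, which is why getting the undecoded bytes back is useful here.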

An additional advantage is that legacy Python 2.7 code may already have encoding and decoding in place. Where Py27 code is still running, this would make moving it to 3.8 easier because no changes at the Python level would be needed.

This change of course would only apply to the py3 version of cx_Oracle since in Py2 this is already how it worked.

anthony-tuininga · 2020-01-02T23:45:03Z

An interesting concept. And I understand the purpose behind it. The question is what sort of interface would make sense. Any thoughts on that?

Lexcon · 2020-01-03T09:38:38Z

I think there are still a lot of patch-up Python scripts out there that look like the one below. If 'raw' is simply recognized as a dummy charset, the py27 code below can run in py3x without if/else blocks to distinguish between Python versions. The dummy 'raw' encoding would just bypass cx_Oracle's internal decode() function. I suspect that this is the easiest way to implement it.

    cursor.execute('select somestring, somencoding from sometable')
    for row in cursor:
        try:
            result = row[0].decode('windows-1252')  # or based on the somencoding field
        except UnicodeDecodeError:
            # only decoding failures should trigger the fallback
            result = row[0].decode('utf-8')
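
The try/except pattern above can be wrapped in a small helper. This is only a sketch: `decode_with_fallback` is a hypothetical name, not part of cx_Oracle, and it assumes the fetched values are already bytes. It tries UTF-8 first, because windows-1252 accepts almost any byte sequence and would rarely reach the fallback.

```python
def decode_with_fallback(raw, encodings=('utf-8', 'windows-1252')):
    """Return raw decoded with the first encoding that accepts it."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing matched: keep the data visible rather than raising.
    return raw.decode(encodings[0], errors='replace')
```

When the row carries its own encoding column, that value is a better guide than guessing by trial.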

Alternatively, a sort of 'outputtypehandler' could be implemented where the user can feed a converter function into the query or settings, but at a lower level than is currently possible. That is more 'elegant' in the sense that cx_Oracle would still produce only unicode output, but it would give control to the programmer. It would break the above Python code though. A side requirement is that this converter function would need to receive the entire row object somehow, not just the field in question, because in the example above it would need to read the 'somencoding' field, which might itself be character based.
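
As a rough illustration of why such a converter needs the whole row, here is a sketch of a row-level function. `decode_row` is hypothetical and not a cx_Oracle API; it assumes each row is a (content, encoding) pair fetched as bytes and that the encoding name itself is plain ASCII.

```python
def decode_row(row):
    """Decode the content column using the encoding named in the same row."""
    content, encoding = row
    # The encoding column may itself arrive as bytes; assumed to be ASCII.
    if isinstance(encoding, bytes):
        encoding = encoding.decode('ascii')
    return content.decode(encoding), encoding


# Stand-in rows, as they might look when fetched without decoding.
rows = [(b'ab\xc3\xab', b'utf-8'), (b'ab\xeb', b'windows-1252')]
decoded = [decode_row(row) for row in rows]
```

A per-column converter could not do this, since it never sees the neighbouring encoding value.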

Given the above, I would opt for the 'quick and dirty' solution over the 'elegant' one. I know pyODBC has this mechanism; I use it and it works well like that.


* Implemented #385 enhancement and updated documentation
* Created flag to Cursor.var()
* Removed first commit changes, updated documentation
* Added testing sample 'QueringRawData.py' and renamed attribute 'bypassstringencoding' to 'bypassencoding' with updated documentation

Signed-off-by: Darko Djolovic <ddjolovic@outlook.com>

consistent and to comply with PEP 8 naming guidelines; also adjust implementation of #385 (originally done in pull request #549) to use the parameter name `bypass_decode` instead of `bypassencoding`.

anthony-tuininga · 2021-04-23T22:16:46Z

Take a look at the implementation which is demonstrated in the new sample. This should address this enhancement but let me know if you agree! Feedback is always appreciated!
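
For readers without the sample at hand, the interface can be sketched roughly as follows. It relies on the `bypass_decode` parameter of `Cursor.var()` named in this thread, combined with an output type handler. The factory `make_bytes_handler` is a hypothetical helper for illustration, and the usage lines in the comments assume an open connection.

```python
def make_bytes_handler(string_types):
    """Build an output type handler that fetches the given types as bytes.

    string_types is a collection of database type constants, for example
    (cx_Oracle.DB_TYPE_VARCHAR, cx_Oracle.DB_TYPE_CHAR).
    """
    def handler(cursor, name, default_type, size, precision, scale):
        if default_type in string_types:
            # bypass_decode=True asks cx_Oracle to skip decoding the value
            return cursor.var(str, arraysize=cursor.arraysize,
                              bypass_decode=True)
        return None  # leave every other column to the default handling
    return handler


# Usage against a real connection (assumes cx_Oracle 8.2 or later):
#   import cx_Oracle
#   connection.outputtypehandler = make_bytes_handler(
#       (cx_Oracle.DB_TYPE_VARCHAR, cx_Oracle.DB_TYPE_CHAR))
```

With such a handler installed, string columns should come back as bytes, so existing `.decode()` calls in ported Python 2.7 code keep working.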

anthony-tuininga · 2021-05-19T02:29:12Z

cx_Oracle 8.2 has just been released which includes this enhancement.

Lexcon added the enhancement label Jan 2, 2020

anthony-tuininga mentioned this issue May 19, 2020

Use UTF-8 as default encoding instead of ASCII #409

Closed

anthony-tuininga mentioned this issue Oct 8, 2020

Getting hexa code values instead of characters for Unicode code characters. #483

Closed

Draco94 added a commit to Draco94/python-cx_Oracle that referenced this issue Mar 25, 2021

Implemented oracle#385 enhancement and updated documentation

00fc04d

Signed-off-by: Darko Djolovic <ddjolovic@outlook.com>

anthony-tuininga added the "patch available" (Awaiting inclusion in official release) label Apr 23, 2021

anthony-tuininga closed this as completed May 19, 2021
