Column name decoding issue after v4.0.3 #194

Closed
jesaerys opened this issue Feb 15, 2017 · 18 comments

@jesaerys

I'm connecting to a Teradata database from Python 3.6 on macOS. I'm seeing strange column names when I use v4.0.5 and v4.0.6 of pyodbc that I wasn't seeing before with v4.0.3. It almost looks like a decoding issue.

In v4.0.3, I can do this:

connection = pyodbc.connect(dsn='mydsn')

# Encoding/decoding for Teradata+Python3.6 as directed in the docs
connection.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
connection.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
connection.setencoding(encoding='utf-8')

cursor = connection.cursor()
cursor.execute("SELECT TOP 5 * FROM mydb.mytable")
cursor.description

Which results in,

(('STATE', str, None, 2, 2, 0, True),
 ('ZIP', str, None, 5, 5, 0, True),
 ('CITY', str, None, 30, 30, 0, True),
 ('ADDRESS', str, None, 30, 30, 0, True))

However, if I repeat the exact same process in v4.0.5 and v4.0.6, I get this:

(('S\x00T\x00A', str, None, 2, 2, 0, True),
 ('Z\x00I', str, None, 5, 5, 0, True),
 ('C\x00I\x00', str, None, 30, 30, 0, True),
 ('A\x00D\x00D\x00R', str, None, 30, 30, 0, True))

Have I set up my connection incorrectly, or do my findings point to a breaking change after v4.0.3?

Thanks.

@mkleehammer
Owner

What I found out after 4.0.3 is that different databases return the column names differently. SQL Server, PostgreSQL, and MySQL all return them as UTF-16LE regardless of the data encoding.

I've added a third decoding option for this, but defaulted it to utf-16le. Please add this:

connection.setdecoding(pyodbc.SQL_WMETADATA, encoding='utf-8')

Sorry for the trouble, but each database does things differently instead of following the spec. (In this case, the spec says it should be UTF-16LE.)

I'm considering creating a "factory" that can be configured and then connections created using factory.connect(). I'd also consider adding something like:

factory.autoconfigure('teradata')

with settings people have tested. (I don't have Teradata.)
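
For the curious, a purely hypothetical sketch of what such a factory might look like (none of this exists in pyodbc; the names are placeholders for the idea being floated):

# Hypothetical sketch only -- no ConnectionFactory exists in pyodbc today.
import pyodbc

class ConnectionFactory:
    """Collects encoding/decoding settings, then applies them to each new connection."""

    def __init__(self):
        self._decodings = []
        self._encoding = None

    def setdecoding(self, sqltype, encoding=None, ctype=None):
        self._decodings.append((sqltype, encoding, ctype))

    def setencoding(self, encoding=None):
        self._encoding = encoding

    def connect(self, *args, **kwargs):
        cnxn = pyodbc.connect(*args, **kwargs)
        for sqltype, encoding, ctype in self._decodings:
            cnxn.setdecoding(sqltype, encoding=encoding, ctype=ctype)
        if self._encoding is not None:
            cnxn.setencoding(encoding=self._encoding)
        return cnxn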

In the meantime I'll try to keep the docs up to date. Please let me know if this works for you.

@jesaerys
Author

So here is my updated configuration block,

connection.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
connection.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
connection.setdecoding(pyodbc.SQL_WMETADATA, encoding='utf-8')
connection.setencoding(encoding='utf-8')

And here is the new output from cursor.description in my example:

(('S\x00\x00\x00T\x00\x00\x00A\x00', str, None, 2, 2, 0, True),
 ('Z\x00\x00\x00I\x00', str, None, 5, 5, 0, True),
 ('C\x00\x00\x00I\x00\x00\x00', str, None, 30, 30, 0, True),
 ('A\x00\x00\x00D\x00\x00\x00D\x00\x00\x00R\x00', str, None, 30, 30, 0, True))

So the extra decoding setting did have an effect, but it's still not right.
Any ideas? If I knew more about encoding/decoding, I might be able to help troubleshoot this better, sorry.

@jesaerys
Author

Not sure if this is helpful at all: http://info.teradata.com/HTMLPubs/DB_TTU_15_10/index.html#page/Connectivity/B035_2509_035K/2509ch08.10.04.html

I don't fully understand it, but the section "When Making SQLColAttribute Calls" feels like it might be relevant.

@mkleehammer
Owner

Dang - the fact that drivers don't follow specs means I have to support every possible combination of buffers everywhere! I'm calling SQLDescribeColW, which only accepts a SQLWCHAR buffer. Can you try one more thing? It shouldn't be necessary, so I'm not expecting a change, but I'd like to get it tested:

connection.setdecoding(pyodbc.SQL_WMETADATA, encoding='utf-8', ctype=pyodbc.SQL_CHAR)

Please note that this could cause a crash if it is what I think it is - it might look right but actually read bytes beyond those in the buffer.

I'll try to find a teradata database I can test with soon. Do you know of any public ways to do so? I thought the only test version they offered was a full VM for something I don't use (I use VirtualBox).

@jesaerys
Author

The extra ctype keyword didn't have any effect on the description output. No crash, either.

It looks like there's some documentation on character sets, but again, unfortunately I don't really understand how all the pieces fit together: http://info.teradata.com/HTMLPubs/DB_TTU_15_10/index.html#page/Connectivity/B035_2509_035K/2509ch08.10.32.html

@jesaerys
Author

jesaerys commented Feb 15, 2017

Oh, and I use my company's instance of Teradata and I'm not aware of a public version. However, I'm more than happy to test whatever you need!

@jesaerys
Author

Is it worth looking into how the teradata package does things?

@mkleehammer
Owner

I'm going to have to find a way to test it so I can dump out buffers raw. In the meantime you can try these and see if they help:

connection.setdecoding(pyodbc.SQL_WMETADATA, encoding='utf-16le')
connection.setdecoding(pyodbc.SQL_WMETADATA, encoding='utf-16be')

The first is what looks like it should be correct, but it is also the default. Maybe there is a bug and the default is getting overwritten?

@jesaerys
Author

The first gives me the same output as in my original post (as expected, I think). The second gives me Asian characters of some kind :)

I'll keep playing around with different encoding/decoding options on my own, including the CharacterSet connection option for the Teradata driver.
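
(A quick illustration of where those characters come from, assuming plain ASCII column names: decoding little-endian UTF-16 bytes as big-endian turns 'S' followed by a NUL byte into the code unit 0x5300, a CJK ideograph.)

# 'S' + NUL read big-endian is the code unit 0x5300, a CJK character.
print(b'S\x00'.decode('utf-16be'))  # -> '匀' (U+5300)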

@mkleehammer
Owner

OK - in the meantime I'm downloading Teradata Express so I can test in a VM. (I forgot you can run a VMware vmdk in VirtualBox.) However, I'm wondering about the driver - it looks like it uses iODBC. Are you using a standard binary wheel of pyodbc or building your own with iODBC?

@jesaerys
Author

jesaerys commented Feb 15, 2017

I built pyodbc myself against my driver manager, which is iODBC from Homebrew.

Should I try building against unixodbc instead?

@mkleehammer
Owner

There might be a SQLWCHAR size difference then. You might try 'utf-32le' and 'utf-32be'. It'll take me a bit longer to set up with iODBC. Working on it.
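
(For reference, a minimal illustration of the width difference in question; whether this iODBC build really uses 4-byte SQLWCHAR is exactly what is being probed here.)

# Code-unit width is the crux: the same name takes 2 bytes per
# character as UTF-16 units but 4 bytes per character as UTF-32 units.
name = 'STATE'
print(len(name.encode('utf-16le')))  # 10 bytes: 2 per character
print(len(name.encode('utf-32le')))  # 20 bytes: 4 per character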

@mkleehammer
Owner

TL;DR - Use 4.0.7 and set the decoding to:

cnxn.setdecoding(pyodbc.SQL_WMETADATA, encoding='utf-32le')

Whew ... it took forever to get to the point where I can test. I don't know if this is buggy behavior in the Teradata driver (my guess) or if iODBC returns 4-byte SQLWCHAR for metadata. Either way, I've added a hack: if the metadata decoding is set to one of the UTF-32 encodings, it assumes sizeof(SQLWCHAR) is 4.

In case you are wondering how all of this started and why we can't use what seemed to work before: I switched to the Unicode version of SQLDescribeCol to handle Unicode column names, and that's not something I wanted to (or could) revert.
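
A minimal sketch of the mismatch, which reproduces the garbled names from earlier in this thread exactly (assuming the driver wrote UTF-32LE data into a buffer that was then sized and decoded on the assumption of 2-byte SQLWCHAR; the exact buffer arithmetic here is an illustration, not the library's internals):

# The driver writes 'STATE' as 4-byte (UTF-32LE) code units; reading
# 5 characters at an assumed 2 bytes each grabs only the first 10 bytes.
raw = 'STATE'.encode('utf-32le')
print(repr(raw[:10].decode('utf-16le')))  # 'S\x00T\x00A' -- the v4.0.5/6 output
print(repr(raw[:10].decode('utf-8')))     # 'S\x00\x00\x00T\x00\x00\x00A\x00' -- the utf-8 attempt
print(raw.decode('utf-32le'))             # 'STATE' -- the 4.0.7 fix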

@jesaerys
Author

Thanks for fixing this, it works perfectly now! The amount of work you put into this project is heroic.

@CoderSong2015

Hi,
I hit a problem where column names couldn't be decoded when connecting to a database (a product of my company) with pyodbc 4.0.17 and unixODBC 2.3.4. I found that unixODBC chooses whether to call the W (wide) version of an ODBC API based on the application type (ANSI or Unicode), but pyodbc calls SQLDescribeColW by default regardless of the application type, so our driver can't handle the inconsistent encoding requirements. Does this design mean that only Unicode will be supported? @mkleehammer

@kurtosis

Just adding another data point: I saw the same issue connecting to Vertica, so at least it's not unique to Teradata. (Python 3.6, macOS, pyodbc 4.0.22)

As described above, setting the metadata decoding to utf-32le worked for me:

conn_vert.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
conn_vert.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
conn_vert.setdecoding(pyodbc.SQL_WMETADATA, encoding='utf-32le')
conn_vert.setencoding(encoding='utf-8')

@freshwuzhere

conn_vert.setdecoding(pyodbc.SQL_WMETADATA, encoding='utf-32le')

was a winner for me also....

@rsorma04

Everyone: if you're creating a hyper, comment out or delete the following line of code (line 80 in hyper_file_generator.py in the library):

print((col[0].decode('utf-8')))

This immediately fixed the issue for me.
