Column name decoding issue after v4.0.3 #194
What I found out after 4.0.3 is that different databases return the column names differently. SQL Server, PostgreSQL, and MySQL all return them as UTF-16LE regardless of the data encoding. I've added a 3rd decoding option for this but defaulted it to utf-16le. Please add this:
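The snippet referenced here isn't preserved in this copy. Based on the surrounding discussion, it was presumably the new metadata decoding option being set to its UTF-16LE default; a hedged sketch (`connection_string` is a placeholder):

```python
import pyodbc

cnxn = pyodbc.connect(connection_string)  # connection_string is a placeholder

# The third decoding option: a separate decoding for column-name
# (metadata) buffers, defaulting to utf-16le to match what the SQL
# Server, PostgreSQL, and MySQL drivers actually return.
cnxn.setdecoding(pyodbc.SQL_WMETADATA, encoding='utf-16le')
```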
Sorry for the trouble, but each database does things differently instead of following the spec. (In this case the spec says it should be UTF-16LE.) I'm considering creating a "factory" that can be configured, with connections then created using factory.connect(). I'd also consider adding something like:
with settings people have tested. (I don't have Teradata.) In the meantime I'll try to keep the docs up to date. Please let me know if this works for you.
So here is my updated configuration block:

```python
connection.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
connection.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
connection.setdecoding(pyodbc.SQL_WMETADATA, encoding='utf-8')
connection.setencoding(encoding='utf-8')
```

And here is the new output from
So the extra decoding setting did have an effect, but it's still not right.
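One way to see why an encoding mismatch on metadata produces odd column names (a pure-Python illustration, not pyodbc itself): if the driver fills the buffer with UTF-16LE bytes but they are decoded as UTF-8, every other byte surfaces as a stray NUL:

```python
# A column name as a driver might return it: UTF-16LE bytes.
raw = 'mycol'.encode('utf-16-le')   # b'm\x00y\x00c\x00o\x00l\x00'

# Decoding those bytes as UTF-8 doesn't fail -- NUL is valid UTF-8 --
# it just yields the letters interleaved with '\x00' characters.
mangled = raw.decode('utf-8')
print(mangled == 'm\x00y\x00c\x00o\x00l\x00')  # True
```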
Not sure if this is helpful at all: http://info.teradata.com/HTMLPubs/DB_TTU_15_10/index.html#page/Connectivity/B035_2509_035K/2509ch08.10.04.html I don't fully understand it, but the section "When Making SQLColAttribute Calls" feels like it might be relevant.
Dang - the fact that drivers don't follow specs means I have to support every possible combination of buffers everywhere! I'm calling SQLDescribeColW which only accepts a SQLWCHAR buffer. Can you try one more thing? It shouldn't be necessary so I'm not expecting a change but I'd like to get it tested:
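The exact snippet isn't preserved here. Given that the follow-up mentions an "extra ctype keyword", it was presumably forcing the narrow C buffer type on the metadata decoding, something like the following sketch (`connection_string` is a placeholder):

```python
import pyodbc

cnxn = pyodbc.connect(connection_string)  # connection_string is a placeholder

# Tell pyodbc to treat the returned metadata buffer as narrow (SQLCHAR)
# even though the wide SQLDescribeColW call nominally fills SQLWCHARs.
cnxn.setdecoding(pyodbc.SQL_WMETADATA, encoding='utf-8', ctype=pyodbc.SQL_CHAR)
```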
Please note that this could cause a crash if it is what I think - it might look right but actually use bytes beyond those in the buffer. I'll try to find a Teradata database I can test with soon. Do you know of any public ways to do so? I thought the only test version they offered was a full VM for something I don't use (I use VirtualBox).
The extra ctype keyword didn't have any effect on the description output. No crash, either. It looks like there's some documentation on character sets, but again, unfortunately I don't really understand how all the pieces fit together: http://info.teradata.com/HTMLPubs/DB_TTU_15_10/index.html#page/Connectivity/B035_2509_035K/2509ch08.10.32.html
Oh, and I use my company's instance of Teradata and I'm not aware of a public version. However, I'm more than happy to test whatever you need!
Is it worth looking into how the
I'm going to have to find a way to test it so I can dump out buffers raw. In the meantime you can try these and see if they help:
The first is the one that should be correct, but it is also the default. Maybe there is a bug and the default is getting overwritten?
The first gives me the same output as in my original post (as expected, I think). The second gives me Asian characters of some kind :) I'll keep playing around with different encoding/decoding options on my own, including the CharacterSet connection option for the Teradata driver.
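A plausible explanation for the "Asian characters" symptom, sketched in pure Python under the assumption (borne out later in the thread) that the buffer actually holds 4-byte UTF-32LE code units: reading UTF-32LE bytes with the wrong-endian UTF-16 decoding lands Latin letters in the CJK range:

```python
# 'my' as the driver may actually return it: UTF-32LE, 4 bytes per char.
raw = 'my'.encode('utf-32-le')   # b'm\x00\x00\x00y\x00\x00\x00'

# Interpreted as UTF-16BE, byte pairs like 6D 00 become U+6D00 -- a CJK
# ideograph -- with NUL characters in between.
print(raw.decode('utf-16-be'))   # '\u6d00\x00\u7900\x00'
```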
OK - in the meantime I'm downloading Teradata Express so I can test in a VM. (I forgot you can run a VMware vmdk in VirtualBox.) However, I'm wondering about the driver - it looks like it uses iODBC. Are you using a standard binary wheel of pyodbc or building your own with iODBC?
I built pyodbc myself against my driver manager, which is iODBC from Homebrew. Should I try building against unixODBC instead?
There might be a SQLWCHAR size difference then. You might try "utf32le" and "utf32be". It'll take me a bit longer to set up with iODBC. Working on it.
TL;DR - Use 4.0.7 and set the decoding to:
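The setting itself isn't preserved at this point in the thread, but a later comment quotes the working line; with 4.0.7 it would look like this (`connection_string` is a placeholder):

```python
import pyodbc

cnxn = pyodbc.connect(connection_string)  # connection_string is a placeholder

# With pyodbc 4.0.7+, decode column-name metadata as UTF-32LE to match
# the 4-byte SQLWCHAR used by this driver/iODBC combination.
cnxn.setdecoding(pyodbc.SQL_WMETADATA, encoding='utf-32le')
```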
Whew ... it took forever to get to the point where I can test. I don't know if this is buggy behavior in the Teradata driver (my guess) or if iODBC returns 4-byte SQLWCHARs for metadata. Either way I've added a hack: if the metadata decoding is set to one of the UTF-32 encodings, it assumes sizeof(SQLWCHAR) is 4. In case you are wondering how all of this started and why we can't use what seemed to work before: I switched to the Unicode version of SQLDescribeCol to handle Unicode column names. That's not something I wanted to, or could, revert.
Thanks for fixing this, it works perfectly now! The amount of work you put into this project is heroic.
Hi, just adding another data point: I saw the same issue connecting to Vertica, so at the least it's not unique to Teradata (py3.6, macOS, pyodbc 4.0.22). As described above, setting the metadata decoding to utf-32le worked for me.

`conn_vert.setdecoding(pyodbc.SQL_WMETADATA, encoding='utf-32le')` was a winner for me also.
Everyone: comment out, or delete, line 80 in hyper_file_generator.py in the library (if you're creating a Hyper file).
This immediately fixed the issue for me.
I'm connecting to a Teradata database from Python 3.6 on macOS. I'm seeing strange column names when I use v4.0.5 and v4.0.6 of pyodbc that I wasn't seeing before with v4.0.3. It almost looks like a decoding issue.
In v4.0.3, I can do this:
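(The original snippet isn't preserved in this copy. A minimal sketch of the kind of code involved, with a hypothetical DSN and table name:)

```python
import pyodbc

# 'TeradataDSN' and 'mytable' are hypothetical placeholders.
cnxn = pyodbc.connect('DSN=TeradataDSN')
cursor = cnxn.cursor()
cursor.execute('SELECT * FROM mytable')

# Column names come from the cursor description,
# which pyodbc populates via SQLDescribeCol.
print([column[0] for column in cursor.description])
```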
Which results in,
However, if I repeat the same exact process in v4.0.5 and v4.0.6, I get this:
Have I set up my connection incorrectly, or do my findings point to a breaking change after v4.0.3?
Thanks.