Type guessing in 0.12 is less robust #182
Comments
It appears that the first problem arises when data is numeric on some rows but free text on other rows. The numeric rows cause |
The "invalid input syntax for type timestamp" error appears to occur because |
- If all types have been rejected, ensure that the fallback flag is correctly set
- replace empty strings with None if they have types that will choke on empty string
- Columns that used numeric on some rows and free text on others resulted in no type being guessed and an error
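For illustration, the fallback behaviour described in these commit notes could look roughly like the sketch below. This is not the extension's actual code: `guess_type`, the candidate list, and the type names are hypothetical, and the parsers are plain standard-library stand-ins. The idea is that a candidate type is rejected as soon as one non-empty value fails to parse, and if every candidate has been rejected the column falls back to `text` instead of raising.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation


def _is_numeric(value):
    # Accept anything Decimal can parse, e.g. "1", "2.5", "-3e4".
    try:
        Decimal(value)
        return True
    except InvalidOperation:
        return False


def _is_timestamp(value):
    # Assumption: ISO 8601 strings only; a real guesser would accept more formats.
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


# Hypothetical candidate list, tried in order of preference.
CANDIDATES = [("numeric", _is_numeric), ("timestamp", _is_timestamp)]


def guess_type(values):
    """Return the first candidate type that fits every non-empty value, else 'text'."""
    remaining = list(CANDIDATES)
    seen_value = False
    for value in values:
        if value == "":
            continue  # blank cells do not vote for or against any type
        seen_value = True
        remaining = [(name, check) for name, check in remaining if check(value)]
        if not remaining:
            return "text"  # all types rejected: fall back rather than error out
    return remaining[0][0] if seen_value else "text"
```

With this shape, a column that is numeric on some rows and free text on others resolves to `text` rather than ending up with no type at all.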
…#182 - 'timestamp' and 'numeric' cannot handle empty strings, so convert to None
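A minimal sketch of the empty-string handling mentioned in this commit, assuming rows arrive as lists of strings alongside a parallel list of guessed column types. The helper name `blank_to_none` and the `NULLABLE_TYPES` set are hypothetical and only illustrate the idea.

```python
# Types that cannot accept an empty string on insert (per the commit note above).
NULLABLE_TYPES = {"numeric", "timestamp"}


def blank_to_none(row, column_types):
    """Replace '' with None in columns whose type would choke on an empty string."""
    return [
        None if (value == "" and col_type in NULLABLE_TYPES) else value
        for value, col_type in zip(row, column_types)
    ]
```

Text columns are left alone, so an empty string in a `text` column stays an empty string.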
Found another issue: specific data combinations cause the parser to ignore double quotes and split rows on every comma. I'm not sure of the root cause yet and am continuing to investigate. 0.11 has no trouble with this file.
Ah, found the cause. The input file uses double quotes, but it encloses so many single quotes that the I'm not sure whether it's better to override this and always use double quotes, or set it up to try both. |
One option would be to reduce |
It appears that Messytables was sampling 1000 lines to sniff the CSV dialect, rather than 100, which is why it behaves differently on the attached file.
- Messytables used to use 1000 rows, the Tabulator approach should do the same
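To make the sample-size point concrete, here is a rough sketch using Python's standard `csv.Sniffer` as a stand-in. It is not the Tabulator or Messytables code path, and `sniff_dialect`, `read_rows`, `sample_lines`, and `force_double_quote` are hypothetical names. It sniffs the dialect from the first 1000 lines and optionally overrides the detected quote character with a double quote, which is one of the options floated earlier in this thread.

```python
import csv
import io


def sniff_dialect(text, sample_lines=1000, delimiters=","):
    """Sniff a CSV dialect from the first `sample_lines` lines of `text`."""
    sample = "".join(text.splitlines(keepends=True)[:sample_lines])
    try:
        return csv.Sniffer().sniff(sample, delimiters=delimiters)
    except csv.Error:
        # Sniffing failed; fall back to the standard comma/double-quote dialect.
        return csv.excel


def read_rows(text, sample_lines=1000, force_double_quote=True):
    """Parse `text` with the sniffed dialect, optionally forcing '"' as quotechar."""
    dialect = sniff_dialect(text, sample_lines)
    # Keyword overrides take precedence over the dialect without mutating it.
    overrides = {"quotechar": '"'} if force_double_quote else {}
    return list(csv.reader(io.StringIO(text, newline=""), dialect, **overrides))
```

A larger sample gives the sniffer more evidence, which matters when many apostrophes sit inside double-quoted fields; forcing the quote character sidesteps the guess entirely, at the cost of mis-parsing files that really are single-quoted.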
The new type guessing system in 0.12+ encounters errors that the old one didn't. For example,
QOLSVC-1280_current-program-funding-dtis_1_2_2_2_1.csv results in an error due to cells not being recognised as any type, even string:
Or when a field is recognised as numeric or timestamp, and then a later row leaves that field blank, as per
QOLSVC-1280_current-program-funding-dtis_1_2_2_2_2.csv:
The old messytables-based type guessing had no trouble parsing these.