BUG/API: converting invalid column names in to_sql #6796
Comments
@jorisvandenbossche I would go with an option, may |
I can pick this one up if @jorisvandenbossche has not already begun work on it. |
@danielballan certainly do! (very welcome, as I still have other work that I would like to do for the sql code) I have not yet begun to work on it. The problem just lies in the fact that the insert statement uses the frame's column names, and not the converted ones (https://github.com/pydata/pandas/blob/62b4d6405c21cfd74b4e9e00c298648d6b2e2e82/pandas/io/sql.py#L442), as the table statement does (https://github.com/pydata/pandas/blob/62b4d6405c21cfd74b4e9e00c298648d6b2e2e82/pandas/io/sql.py#L502). I would certainly default to doing nothing, and maybe provide a keyword argument to do something else (remove spaces, convert to lower case). But the first is the priority I think. But maybe it would be better if #6735 gets merged first. If you can take a look at it, please do. |
@jorisvandenbossche @danielballan I think the whole safe columns thing is unnecessary if we trust SQLAlchemy to escape names correctly (which I think we can). In that case we don't even need an option to translate names (we're accumulating too many options as it is; as they say, options are where good design goes to die). |
Good. I was coming to the same conclusion. On Tuesday, April 15, 2014, mangecoeur notifications@github.com wrote:
|
@mangecoeur Fully agree, that was also in line with what I proposed. I would leave it fully up to the user to provide the column names they want. |
See here: #6883 (comment), I was completely mistaken about the |
If we commit to one backward-incompatible change in |
It's not completely "for free", as these are two different changes (with possibly different side effects for existing code), but I agree that if we want to change it, we had better do it now and together. Or, we could also trigger a warning in legacy mode now, saying that the behaviour will change in the next version (but maybe that is not needed?) |
An idea: could we do a quick check (when in legacy mode) for column names with spaces? (the names which would have been converted previously) And then raise a warning, to tell the user that the behaviour changed and that the column names will not be converted anymore (and maybe hint how to do it themselves). Or would this just be annoying? I think this could help prevent surprises, but of course, once you've seen it and decide to keep using column names with spaces, it can become annoying. |
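A minimal sketch of the check suggested above, for illustration only. The helper name `warn_on_spaces`, the warning class, and the message wording are assumptions and not part of pandas:

```python
import warnings


def warn_on_spaces(column_names):
    """Hypothetical helper: warn when any column name contains a space.

    Returns the list of offending names so the caller can inspect them.
    """
    offending = [name for name in column_names if " " in str(name)]
    if offending:
        warnings.warn(
            "Column names containing spaces are no longer converted: %r. "
            "Rename them yourself if you need space-free names." % (offending,),
            UserWarning,
        )
    return offending
```

A user who prefers the old names could rename the columns before writing, so the warning would only fire for frames that still carry spaces in their column names.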
A warning sounds good. We don't need to try too hard for backwards compat; it's an API change, and users need to adjust. |
At the moment, with the new sql code, there is a bug in the conversion of invalid column names (through the function _safe_col_name). Eg:
I know the reason, and it is easily fixed (the table statement uses the adapted column name, but the insert statement still uses the old one, so the data are not inserted). But I was wondering if this conversion is actually necessary?
Because, databases can handle column names with spaces (at least sqlite, mysql and postgresql that I know a little bit), the names are then just quoted and sqlalchemy ensures this is done properly for us.
Often, it is advised to not use spaces (or capital letters) in column names, as it is just tedious to always have to quote them and it is more robust to not have to do that. But it does work, so I would think it is up to the user to provide the column names he/she wants.
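To illustrate the point about quoting, here is a small sketch using only the standard-library sqlite3 module (not the pandas or SQLAlchemy code discussed in this issue). The table name, column names, and the function `roundtrip_spaced_column` are made up for the example:

```python
import sqlite3


def roundtrip_spaced_column():
    """Create, fill and read back a table whose column names need quoting."""
    conn = sqlite3.connect(":memory:")
    try:
        # Double quotes mark identifiers, so spaces and capitals are allowed.
        conn.execute('CREATE TABLE demo ("col with space" INTEGER, "Other" TEXT)')
        conn.execute(
            'INSERT INTO demo ("col with space", "Other") VALUES (?, ?)',
            (1, "x"),
        )
        cursor = conn.execute('SELECT "col with space", "Other" FROM demo')
        names = [description[0] for description in cursor.description]
        rows = cursor.fetchall()
    finally:
        conn.close()
    return names, rows
```

The cost is that every hand-written query has to quote these names, which is the tedium mentioned above; a library that generates the SQL can do the quoting for the user.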
So proposal:
@mangecoeur @hayd
PS: the fact there are Nones in the example above and not NaNs is still another issue I think.