BUG: csv.QUOTE_NOTNULL doesn't seem to be supported by DataFrame.to_csv(), but there's no error and it's not documented #60423
Comments
Thanks, looks like
|
Contributions implementing QUOTE_NOTNULL are welcome! |
@rhshadrach Should a separate issue be opened for QUOTE_STRINGS? |
I think it's okay either way - fine with having both be this issue. It's okay for a PR to only implement one, it just won't fully close this issue. |
@wjandrea it seems QUOTE_NOTNULL (and other QUOTE_*) are already implicitly supported. They are passed through as-is to the csv module's writer. The problem in your case is that there is a
and it worked as expected. In light of this, is there anything we really need to fix here? @rhshadrach @asishm |
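To illustrate the pass-through point above, here is a minimal sketch that uses only the standard-library `csv` writer, not pandas. It assumes Python 3.12 or later, where `csv.QUOTE_NOTNULL` was added; the helper names `write_row` and `demo` are made up for this example. It shows that the writer leaves `None` unquoted but still quotes an empty string, which is what a null becomes once it has been replaced by a string such as `na_rep=''`.

```python
import csv
import io


def write_row(row, quoting):
    """Write one row with the stdlib csv writer and return the resulting text."""
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting, lineterminator="\n").writerow(row)
    return buf.getvalue()


def demo():
    # csv.QUOTE_NOTNULL was added in Python 3.12; return None on older versions.
    quote_notnull = getattr(csv, "QUOTE_NOTNULL", None)
    if quote_notnull is None:
        return None
    # None is written as an empty, unquoted field; the empty string is still quoted.
    return write_row([0, None, ""], quote_notnull)


print(demo())  # on Python 3.12+: "0",,""
```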
Is this tested? The docstring type-hints |
@rhshadrach I think it would make more sense to have
def to_csv(
    ...,
    na_rep: str = ''
):
    ...
    _na_rep: str | None = na_rep
    if quoting == csv.QUOTE_NOTNULL:
        _na_rep = None
    formatter = DataFrameFormatter(
        ...
        na_rep=_na_rep,
    )
    ...
|
In general, I'm negative on us ignoring values specified by users. Isn't that what your suggested approach is doing? |
@rhshadrach Yes it ignores the
|
I did a quick grep and couldn't find any tests that set |
The pandas API often goes beyond the features of Python, and this can cause conflicts. In this particular case pandas allows specifying the NA representation while I believe Python does not (correct me if this is wrong). In such cases, I do not think it is always best to follow the Python API to the letter. At times, disagreeing with Python behavior is appropriate. Some options:
cc @pandas-dev/pandas-core |
I'd vote for option 2:
|
@rhshadrach Ah, I see what you mean. How about a combination of options 2 and 3 to allow the user to specify
In code:
def to_csv(
    ...,
    na_rep: str | None | lib.NoDefault = lib.no_default
):
    ...
    _na_rep: str | None
    if quoting == csv.QUOTE_NOTNULL:
        if na_rep is lib.no_default or na_rep is None:
            _na_rep = None
        else:
            raise ValueError('For "quoting=csv.QUOTE_NOTNULL", "na_rep" must be "None" or unspecified')
    else:
        _na_rep = '' if na_rep is lib.no_default else na_rep
    formatter = DataFrameFormatter(
        ...
        na_rep=_na_rep,
    )
    ...
|
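For anyone who wants to try the proposed resolution logic outside pandas, here is a standalone sketch. It is not pandas code: `no_default` is a hypothetical stand-in for `lib.no_default`, `resolve_na_rep` is an invented helper name, and on Python versions older than 3.12 (which lack `csv.QUOTE_NOTNULL`) a placeholder object is used instead of the real constant.

```python
import csv


class _NoDefault:
    """Hypothetical stand-in for pandas' internal lib.no_default sentinel."""

    def __repr__(self):
        return "<no_default>"


no_default = _NoDefault()

# csv.QUOTE_NOTNULL exists only on Python 3.12+. The fallback is a placeholder
# object that compares unequal to every real quoting constant.
QUOTE_NOTNULL = getattr(csv, "QUOTE_NOTNULL", object())


def resolve_na_rep(quoting, na_rep=no_default):
    """Return the na_rep to hand to the formatter, following the proposal above."""
    if quoting == QUOTE_NOTNULL:
        if na_rep is no_default or na_rep is None:
            return None  # keep nulls as None so the csv writer leaves them unquoted
        raise ValueError(
            'For "quoting=csv.QUOTE_NOTNULL", "na_rep" must be "None" or unspecified'
        )
    # Every other quoting mode keeps the existing default of an empty string.
    return "" if na_rep is no_default else na_rep


print(repr(resolve_na_rep(csv.QUOTE_ALL)))       # ''
print(repr(resolve_na_rep(csv.QUOTE_ALL, "x")))  # 'x'
print(repr(resolve_na_rep(QUOTE_NOTNULL)))       # None
```

Passing an explicit string together with `QUOTE_NOTNULL`, for example `resolve_na_rep(QUOTE_NOTNULL, "x")`, raises `ValueError`, which is the behavior questioned in the following comment.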
It seems to me this would be raising unnecessarily. Why is that preferred over adhering to the value the user specified? |
@rhshadrach If the function adhered to the value the user specified, what would it do with it? As far as I can figure, if the user specifies |
Why isn't
df = pd.DataFrame([[0, None, np.nan, pd.NA]], columns=['a', 'b', 'c', 'd'])
print(df.to_csv(na_rep="x", index=False, quoting=csv.QUOTE_ALL))
# "a","b","c","d"
# "0",x,x,x
reasonable? |
That's perfectly reasonable, but how would we implement it? |
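One possible direction, offered only as a rough sketch and not as how pandas writes CSV: format each row by hand, quoting every non-null value and emitting `na_rep` verbatim for nulls. The helper names `is_null` and `format_row` are hypothetical, the null check covers only `None` and float NaN (not `pd.NA`), and the sketch bypasses `csv.writer` entirely, so it ignores dialect options such as escape characters.

```python
def is_null(value):
    """Treat None and float NaN as null (NaN is the only float unequal to itself)."""
    return value is None or (isinstance(value, float) and value != value)


def format_row(values, na_rep="", sep=",", quotechar='"'):
    """Quote every non-null value; write na_rep unquoted for nulls."""
    cells = []
    for value in values:
        if is_null(value):
            cells.append(na_rep)
        else:
            # Double any embedded quote characters, as CSV quoting requires.
            text = str(value).replace(quotechar, quotechar * 2)
            cells.append(f"{quotechar}{text}{quotechar}")
    return sep.join(cells)


print(format_row(["a", "b", "c"]))                    # "a","b","c"
print(format_row([0, None, float("nan")], na_rep="x"))  # "0",x,x
```

A caveat with this approach: an unquoted `na_rep` that contains the separator or a newline would corrupt the row, so a real implementation would need to validate or escape it.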
Pandas version checks
QUOTE_NOTNULL
Reproducible Example
Issue Description
All fields are quoted, even nulls.
Expected Behavior
All nulls should be unquoted.
Another possibility I might have expected is only unquoting None, as described in the csv docs, but this doesn't make sense for Pandas IMHO since Pandas has multiple "NULL"s.
Installed Versions