BUG: `csv.QUOTE_NOTNULL` doesn't seem to be supported by `DataFrame.to_csv()`, but there's no error and it's not documented #60423

wjandrea · 2024-11-26T14:59:55Z

Pandas version checks

I have checked that this issue has not already been reported.
- No issues mention QUOTE_NOTNULL
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import csv
import pandas as pd

df = pd.DataFrame(
    [[0, None, np.nan, pd.NA]],
    columns=['a', 'b', 'c', 'd'])

print(df.to_csv(quoting=csv.QUOTE_NOTNULL, index=False))

Issue Description

All fields are quoted, even nulls.

"a","b","c","d"
"0","","",""

Expected Behavior

All nulls should be unquoted.

"a","b","c","d"
"0",,,

Another possibility I might have expected is only unquoting None, as described in the csv docs, but this doesn't make sense for Pandas IMHO since Pandas has multiple "NULL"s.

"a","b","c","d"
"0",,"",""

Installed Versions

INSTALLED VERSIONS
------------------
commit                : b1c2ba793bb1f3d7001de41ca4c73377006b9e4d
python                : 3.12.7
python-bits           : 64
OS                    : Linux
OS-release            : 5.4.0-200-generic
Version               : #220-Ubuntu SMP Fri Sep 27 13:19:16 UTC 2024
machine               : x86_64
processor             : x86_64
byteorder             : little
LC_ALL                : None
LANG                  : fr_CA.UTF-8
LOCALE                : fr_CA.UTF-8

pandas                : 3.0.0.dev0+1729.gb1c2ba793b
numpy                 : 2.1.0.dev0+git20240403.e59c074
dateutil              : 2.9.0.post0
pip                   : 24.2
Cython                : None
sphinx                : None
IPython               : 8.22.2
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
blosc                 : None
bottleneck            : None
fastparquet           : None
fsspec                : None
html5lib              : None
hypothesis            : None
gcsfs                 : None
jinja2                : None
lxml.etree            : None
matplotlib            : None
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
psycopg2              : None
pymysql               : None
pyarrow               : None
pyreadstat            : None
pytest                : None
python-calamine       : None
pytz                  : 2024.1
pyxlsb                : None
s3fs                  : None
scipy                 : None
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
xlsxwriter            : None
zstandard             : None
tzdata                : 2024.1
qtpy                  : None
pyqt5                 : None

The text was updated successfully, but these errors were encountered:

asishm · 2024-11-26T16:24:19Z

Thanks, looks like csv.QUOTE_NOTNULL was added in python 3.12 along with csv.QUOTE_STRINGS. The user guide at https://pandas.pydata.org/docs/user_guide/io.html#quoting-compression-and-file-format also only mentions 0,1,2,3

quotingint or csv.QUOTE_* instance, default 0
Control field quoting behavior per csv.QUOTE_* constants. Use one of QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).

rhshadrach · 2024-12-26T22:18:17Z

Contributions implementing QUOTE_NOTNULL are welcome!

asishm · 2024-12-27T02:23:57Z

@rhshadrach should a separate issue be opened for QUOTE_STRINGS ?

rhshadrach · 2024-12-27T13:34:20Z

I think it's okay either way - fine with having both be this issue. It's okay for a PR to only implement one, it just won't fully close this issue.

snitish · 2025-01-11T05:19:20Z

@wjandrea it seems QUOTE_NOTNULL (and other QUOTE_*) are already implicitly supported. They are passed through as-is to the csv module's writer. The problem in your case is that there is a na_rep argument in to_csv that defaults to '' (empty string), so the NAs are being replaced with empty strings and QUOTE_NOTNULL is not taking effect. I explicitly set na_rep to None -

>>> print(df.to_csv(quoting=csv.QUOTE_NOTNULL, index=False, na_rep=None))
"a","b","c","d"
"0",,,

and it worked as expected. In light of this, is there anything we really need to fix here? @rhshadrach @asishm

rhshadrach · 2025-01-11T11:40:38Z

I explicitly set na_rep to None -

Is this tested? The docstring type-hints na_rep as a str, so if we are supporting None it should be updated. This seems quite useful, so I'm supportive of allowing None.

wjandrea · 2025-01-11T17:44:59Z

@rhshadrach I think it would make more sense to have to_csv() ignore the na_rep parameter if quoting == csv.QUOTE_NOTNULL and instead use None internally, something like this:

def to_csv(
    ...,
    na_rep: str = ''
):
    ...
    _na_rep: str | None = na_rep
    if quoting == csv.QUOTE_NOTNULL:
        _na_rep = None

    formatter = DataFrameFormatter(
        ...
        na_rep=_na_rep,
    )
    ...

rhshadrach · 2025-01-11T17:52:23Z

In general, I'm negative on us ignoring values specified by users. Isn't that what your suggested approach is doing?

wjandrea · 2025-01-11T20:28:38Z

@rhshadrach Yes it ignores the na_rep argument, but if the user specifies csv.QUOTE_NOTNULL, what else would they expect? I mean, the csv docs say:

csv.QUOTE_NOTNULL ... quote all fields which are not None. ... if a field value is None an empty (unquoted) string is written.

snitish · 2025-01-11T21:03:11Z

I explicitly set na_rep to None -

Is this tested?

I did a quick grep and couldn't find any tests that set na_rep=None. Agree that supporting (officially) na_rep=None might be cleaner than ignoring na_rep when the user specifies csv.QUOTE_NOTNULL.

rhshadrach · 2025-01-12T22:20:24Z

@rhshadrach Yes it ignores the na_rep argument, but if the user specifies csv.QUOTE_NOTNULL, what else would they expect? I mean, the csv docs say:

The pandas API often goes beyond the features of Python, and this can cause conflicts. In this particular case pandas allows specifying the NA representation while I believe Python does not (correct me if this is wrong). In such cases, I do not think it is always best to follow the Python API to the letter. At times, disagreeing with Python behavior is appropriate.

Some options:

Ignore na_rep when csv.QUOTE_NOTNULL is specified.
Raise when na_rep is not None when csv.QUOTE_NOTNULL is specified.
Change the default of na_rep to lib.no_default, making the real default of na_rep dependent on other arguments (None when csv.QUOTE_NOTNULL is specified, "" otherwise).
Keep the current default and behavior of na_rep="", making a user specify na_rep=None and csv.QUOTE_NOTNULL to get the same behavior as Python's CSV module.

cc @pandas-dev/pandas-core

Dr-Irv · 2025-01-13T15:54:22Z

I'd vote for option 2:

Raise when na_rep is not None when csv.QUOTE_NOTNULL is specified.

wjandrea · 2025-01-13T18:23:08Z

@rhshadrach Ah, I see what you mean.

How about a combination of options 2 and 3 to allow the user to specify None explicitly or implicitly? i.e. if the user specifies csv.QUOTE_NOTNULL, then for na_rep:

unspecified: use None
None: continue as normal
any other value: raise

In code:

def to_csv(
    ...,
    na_rep: str | None | lib.NoDefault = lib.no_default
):
    ...
    _na_rep: str | None
    if quoting == csv.QUOTE_NOTNULL:
        if na_rep is lib.no_default or na_rep is None:
            _na_rep = None
        else:
            raise ValueError('For "quoting=csv.QUOTE_NOTNULL", "na_rep" must be "None" or unspecified')
    else:
        _na_rep = '' if na_rep is lib.no_default else na_rep

    formatter = DataFrameFormatter(
        ...
        na_rep=_na_rep,
    )
    ...

rhshadrach · 2025-01-20T22:04:22Z

How about a combination of options 2 and 3...:

...

any other value: raise

It seems to me this would be raising unnecessarily. Why is that preferred over adhering to the value the user specified?

wjandrea · 2025-01-21T17:12:50Z

@rhshadrach If the function adhered to the value the user specified, what would it do with it? As far as I can figure, if the user specifies csv.QUOTE_NOTNULL, the only reasonable value for na_rep is None.

rhshadrach · 2025-01-22T01:26:12Z

Why isn't

df = pd.DataFrame([[0, None, np.nan, pd.NA]], columns=['a', 'b', 'c', 'd'])
print(df.to_csv(na_rep="x", index=False, quoting=csv.QUOTE_ALL))
# "a","b","c","d"
# "0",x,x,x

reasonable?

wjandrea · 2025-01-22T02:20:41Z

That's perfectly reasonable, but how would we implement it? .to_csv() ultimately uses the csv module, which doesn't support that.

wjandrea added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 26, 2024

asishm added the IO CSV read_csv, to_csv label Nov 26, 2024

rhshadrach added Enhancement and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 26, 2024

rhshadrach added the Needs Discussion Requires discussion from core team before further action label Jan 12, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: `csv.QUOTE_NOTNULL` doesn't seem to be supported by `DataFrame.to_csv()`, but there's no error and it's not documented #60423

BUG: `csv.QUOTE_NOTNULL` doesn't seem to be supported by `DataFrame.to_csv()`, but there's no error and it's not documented #60423

wjandrea commented Nov 26, 2024 •

edited

Loading

asishm commented Nov 26, 2024

rhshadrach commented Dec 26, 2024

asishm commented Dec 27, 2024

rhshadrach commented Dec 27, 2024

snitish commented Jan 11, 2025 •

edited

Loading

rhshadrach commented Jan 11, 2025

wjandrea commented Jan 11, 2025 •

edited

Loading

rhshadrach commented Jan 11, 2025

wjandrea commented Jan 11, 2025

snitish commented Jan 11, 2025 •

edited

Loading

rhshadrach commented Jan 12, 2025

Dr-Irv commented Jan 13, 2025

wjandrea commented Jan 13, 2025 •

edited

Loading

rhshadrach commented Jan 20, 2025

wjandrea commented Jan 21, 2025

rhshadrach commented Jan 22, 2025

wjandrea commented Jan 22, 2025 •

edited

Loading

BUG: csv.QUOTE_NOTNULL doesn't seem to be supported by DataFrame.to_csv(), but there's no error and it's not documented #60423

BUG: csv.QUOTE_NOTNULL doesn't seem to be supported by DataFrame.to_csv(), but there's no error and it's not documented #60423

Comments

wjandrea commented Nov 26, 2024 • edited Loading

Pandas version checks

Reproducible Example

Issue Description

Expected Behavior

Installed Versions

asishm commented Nov 26, 2024

rhshadrach commented Dec 26, 2024

asishm commented Dec 27, 2024

rhshadrach commented Dec 27, 2024

snitish commented Jan 11, 2025 • edited Loading

rhshadrach commented Jan 11, 2025

wjandrea commented Jan 11, 2025 • edited Loading

rhshadrach commented Jan 11, 2025

wjandrea commented Jan 11, 2025

snitish commented Jan 11, 2025 • edited Loading

rhshadrach commented Jan 12, 2025

Dr-Irv commented Jan 13, 2025

wjandrea commented Jan 13, 2025 • edited Loading

rhshadrach commented Jan 20, 2025

wjandrea commented Jan 21, 2025

rhshadrach commented Jan 22, 2025

wjandrea commented Jan 22, 2025 • edited Loading

BUG: `csv.QUOTE_NOTNULL` doesn't seem to be supported by `DataFrame.to_csv()`, but there's no error and it's not documented #60423

BUG: `csv.QUOTE_NOTNULL` doesn't seem to be supported by `DataFrame.to_csv()`, but there's no error and it's not documented #60423

wjandrea commented Nov 26, 2024 •

edited

Loading

snitish commented Jan 11, 2025 •

edited

Loading

wjandrea commented Jan 11, 2025 •

edited

Loading

snitish commented Jan 11, 2025 •

edited

Loading

wjandrea commented Jan 13, 2025 •

edited

Loading

wjandrea commented Jan 22, 2025 •

edited

Loading