DEPR, DOC: Deprecate buffer_lines in read_csv #13360

Closed
6 changes: 6 additions & 0 deletions doc/source/io.rst
@@ -176,6 +176,12 @@ low_memory : boolean, default ``True``
Note that the entire file is read into a single DataFrame regardless,
use the ``chunksize`` or ``iterator`` parameter to return the data in chunks.
(Only valid with C parser)
buffer_lines : int, default None
DEPRECATED: this argument will be removed in a future version because its
value is not respected by the parser

If ``low_memory`` is ``True``, specify the number of rows to be read for
each chunk. (Only valid with C parser)
Member

Can't we leave out the actual explanation? The argument is not only deprecated but also does not work (IIUC), so the explanation does not serve much purpose IMO (apart from explaining exactly which feature has never worked and never will).

BTW, @gfyoung, for the rest strong +1 on your cleaning up of the keywords!

Member Author

@jorisvandenbossche : Will do. Thanks for the +1. I use this function a great deal in my own code, so I'm more than happy to improve it given how much it has done for me, even in such a "broken" state. 😄

compact_ints : boolean, default False
DEPRECATED: this argument will be removed in a future version

1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.18.2.txt
@@ -294,6 +294,7 @@ Deprecations
^^^^^^^^^^^^

- ``compact_ints`` and ``use_unsigned`` have been deprecated in ``pd.read_csv`` and will be removed in a future version (:issue:`13320`)
- ``buffer_lines`` has been deprecated in ``pd.read_csv`` and will be removed in a future version (:issue:`13360`)

.. _whatsnew_0182.performance:

11 changes: 9 additions & 2 deletions pandas/io/parsers.py
@@ -227,14 +227,19 @@
Note that the entire file is read into a single DataFrame regardless,
use the `chunksize` or `iterator` parameter to return the data in chunks.
(Only valid with C parser)
buffer_lines : int, default None
Contributor

Is this even used at all now?

Contributor

It seems it's not respected at all now, so we should just remove this argument (or raise if it's not None).

Member Author

Based on my PR description, nowhere. How significant an API change would either option be?

Contributor

Well, it doesn't do anything now, so I guess deprecation is fine. Why don't you reword it to say it currently has no effect?

Member Author

Fair enough. Done.
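
The two options weighed in this thread, raising when the argument is passed versus deprecating it with a warning, differ mainly in how they treat existing callers. A minimal sketch of that trade-off follows; `check_strict` and `check_deprecated` are hypothetical helpers written for illustration, not pandas functions, and the error message is invented.

```python
import warnings


def check_strict(buffer_lines=None):
    """Option 1: reject the argument outright (breaks existing callers)."""
    if buffer_lines is not None:
        raise ValueError("'buffer_lines' has no effect and is not accepted")
    return "parsed"


def check_deprecated(buffer_lines=None):
    """Option 2: accept the argument, ignore its value, and warn."""
    if buffer_lines is not None:
        warnings.warn(
            "The 'buffer_lines' argument has been deprecated "
            "and will be removed in a future version",
            FutureWarning,
            stacklevel=2,
        )
    return "parsed"


# Existing code that passes buffer_lines=2 fails under option 1 ...
try:
    check_strict(buffer_lines=2)
    strict_ok = True
except ValueError:
    strict_ok = False

# ... but keeps working under option 2, with a warning recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    outcome = check_deprecated(buffer_lines=2)

print(strict_ok, outcome, len(caught))
```

Under these assumptions, deprecation is the gentler change: callers keep getting a result and are told to drop the argument before it is removed.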

DEPRECATED: this argument will be removed in a future version because its
value is not respected by the parser

If low_memory is True, specify the number of rows to be read for each
chunk. (Only valid with C parser)
compact_ints : boolean, default False
DEPRECATED: this argument will be removed in a future version

If compact_ints is True, then for any column that is of integer dtype,
the parser will attempt to cast it as the smallest integer dtype possible,
either signed or unsigned depending on the specification from the
`use_unsigned` parameter.

use_unsigned : boolean, default False
DEPRECATED: this argument will be removed in a future version

@@ -448,6 +453,7 @@ def _read(filepath_or_buffer, kwds):
'float_precision',
])
_deprecated_args = set([
'buffer_lines',
'compact_ints',
'use_unsigned',
])
@@ -806,7 +812,8 @@ def _clean_options(self, options, engine):
         _validate_header_arg(options['header'])

         for arg in _deprecated_args:
-            if result[arg] != _c_parser_defaults[arg]:
+            parser_default = _c_parser_defaults[arg]
+            if result.get(arg, parser_default) != parser_default:
                 warnings.warn("The '{arg}' argument has been deprecated "
                               "and will be removed in a future version"
                               .format(arg=arg), FutureWarning, stacklevel=2)
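
The switch from `result[arg]` to `result.get(arg, parser_default)` can be illustrated with a small, self-contained sketch. It assumes (the diff does not say so explicitly) that an options dict may lack a C-only key such as `buffer_lines`, in which case a plain index would raise `KeyError`. The names `C_PARSER_DEFAULTS`, `DEPRECATED_ARGS`, and `warn_on_deprecated` are hypothetical stand-ins, not pandas API.

```python
import warnings

# Hypothetical stand-ins for pandas' _c_parser_defaults and _deprecated_args.
C_PARSER_DEFAULTS = {
    "buffer_lines": None,
    "compact_ints": False,
    "use_unsigned": False,
}
DEPRECATED_ARGS = {"buffer_lines", "compact_ints", "use_unsigned"}


def warn_on_deprecated(result):
    """Warn for each deprecated option set to a non-default value.

    Returns the sorted names that triggered a warning.
    """
    flagged = []
    for arg in sorted(DEPRECATED_ARGS):
        parser_default = C_PARSER_DEFAULTS[arg]
        # .get() treats a missing key as "left at its default", so an
        # options dict that lacks the key does not raise KeyError.
        if result.get(arg, parser_default) != parser_default:
            warnings.warn(
                "The '{arg}' argument has been deprecated "
                "and will be removed in a future version".format(arg=arg),
                FutureWarning,
                stacklevel=2,
            )
            flagged.append(arg)
    return flagged


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # 'buffer_lines' is deliberately absent here (the assumed case).
    flagged = warn_on_deprecated({"compact_ints": True, "use_unsigned": False})

print(flagged)
print([str(w.message) for w in caught])
```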
2 changes: 0 additions & 2 deletions pandas/io/tests/parser/test_parsers.py
@@ -72,14 +72,12 @@ def read_csv(self, *args, **kwds):
         kwds = kwds.copy()
         kwds['engine'] = self.engine
         kwds['low_memory'] = self.low_memory
-        kwds['buffer_lines'] = 2
         return read_csv(*args, **kwds)

     def read_table(self, *args, **kwds):
         kwds = kwds.copy()
         kwds['engine'] = self.engine
         kwds['low_memory'] = True
-        kwds['buffer_lines'] = 2
         return read_table(*args, **kwds)


5 changes: 5 additions & 0 deletions pandas/io/tests/parser/test_unsupported.py
@@ -124,6 +124,7 @@ def test_deprecated_args(self):

# deprecated arguments with non-default values
deprecated = {
'buffer_lines': True,
'compact_ints': True,
'use_unsigned': True,
}
@@ -132,6 +133,10 @@ def test_deprecated_args(self):

for engine in engines:
for arg, non_default_val in deprecated.items():
if engine == 'python' and arg == 'buffer_lines':
# unsupported --> exception is raised first
continue

with tm.assert_produces_warning(
FutureWarning, check_stacklevel=False):
kwargs = {arg: non_default_val}
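
The test loop above relies on pandas' `tm.assert_produces_warning`. As a rough, standard-library-only sketch of the same kind of check, `warnings.catch_warnings(record=True)` can capture what a call emits. `fake_read_csv` and `emits_future_warning` are hypothetical helpers written only to make the example runnable; the python-engine branch raising `ValueError` for `buffer_lines` is an assumption based on the comment "unsupported --> exception is raised first".

```python
import warnings

DEFAULTS = {"buffer_lines": None, "compact_ints": False, "use_unsigned": False}


def fake_read_csv(engine="c", **kwargs):
    """Hypothetical stand-in for read_csv: validates options only."""
    # Assumed behaviour: the python engine rejects this C-only option
    # before any deprecation warning can be issued.
    if engine == "python" and kwargs.get("buffer_lines") is not None:
        raise ValueError("'buffer_lines' is not supported with the python engine")
    for arg, default in DEFAULTS.items():
        if kwargs.get(arg, default) != default:
            warnings.warn(
                "The '{arg}' argument has been deprecated".format(arg=arg),
                FutureWarning,
                stacklevel=2,
            )


def emits_future_warning(func, **kwargs):
    """Return True if calling func(**kwargs) emits a FutureWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func(**kwargs)
    return any(issubclass(w.category, FutureWarning) for w in caught)


deprecated = {"buffer_lines": True, "compact_ints": True, "use_unsigned": True}
results = {}
for engine in ("c", "python"):
    for arg, non_default_val in deprecated.items():
        if engine == "python" and arg == "buffer_lines":
            continue  # the exception would be raised first
        results[(engine, arg)] = emits_future_warning(
            fake_read_csv, engine=engine, **{arg: non_default_val}
        )

print(all(results.values()), len(results))
```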