-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG/MAINT: Change default of inplace to False in pd.eval #16732
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,7 +3,6 @@ | |
"""Top level ``eval`` module. | ||
""" | ||
|
||
import warnings | ||
import tokenize | ||
from pandas.io.formats.printing import pprint_thing | ||
from pandas.core.computation import _NUMEXPR_INSTALLED | ||
|
@@ -148,7 +147,7 @@ def _check_for_locals(expr, stack_level, parser): | |
|
||
def eval(expr, parser='pandas', engine=None, truediv=True, | ||
local_dict=None, global_dict=None, resolvers=(), level=0, | ||
target=None, inplace=None): | ||
target=None, inplace=False): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Defer to @jreback, @jorisvandenbossche, but I am still of the opinion that API would be cleaner if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @chris-b1 : True, but "with a That's something I imagined would create unnecessary overhead. Also, if you go down this path, you should then check if there is item assignment in the first place if you want That's why I decided to allow the user to pass in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As I said before, I don't think we need to check the modifyability of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, sure, I have no problem leaving it as is. |
||
"""Evaluate a Python expression as a string using various backends. | ||
|
||
The following arithmetic operations are supported: ``+``, ``-``, ``*``, | ||
|
@@ -205,20 +204,40 @@ def eval(expr, parser='pandas', engine=None, truediv=True, | |
level : int, optional | ||
The number of prior stack frames to traverse and add to the current | ||
scope. Most users will **not** need to change this parameter. | ||
target : a target object for assignment, optional, default is None | ||
essentially this is a passed in resolver | ||
inplace : bool, default True | ||
If expression mutates, whether to modify object inplace or return | ||
copy with mutation. | ||
|
||
WARNING: inplace=None currently falls back to to True, but | ||
in a future version, will default to False. Use inplace=True | ||
explicitly rather than relying on the default. | ||
target : object, optional, default None | ||
This is the target object for assignment. It is used when there is | ||
variable assignment in the expression. If so, then `target` must | ||
support item assignment with string keys, and if a copy is being | ||
returned, it must also support `.copy()`. | ||
inplace : bool, default False | ||
If `target` is provided, and the expression mutates `target`, whether | ||
to modify `target` inplace. Otherwise, return a copy of `target` with | ||
the mutation. | ||
|
||
Returns | ||
------- | ||
ndarray, numeric scalar, DataFrame, Series | ||
|
||
Raises | ||
------ | ||
ValueError | ||
There are many instances where such an error can be raised: | ||
|
||
- `target=None`, but the expression is multiline. | ||
- The expression is multiline, but not all them have item assignment. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. expand what this means (for 2nd item) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, done. |
||
An example of such an arrangement is this: | ||
|
||
a = b + 1 | ||
a + 2 | ||
|
||
Here, there are expressions on different lines, making it multiline, | ||
but the last line has no variable assigned to the output of `a + 2`. | ||
- `inplace=True`, but the expression is missing item assignment. | ||
- Item assignment is provided, but the `target` does not support | ||
string item assignment. | ||
- Item assignment is provided and `inplace=False`, but the `target` | ||
does not support the `.copy()` method | ||
|
||
Notes | ||
----- | ||
The ``dtype`` of any objects involved in an arithmetic ``%`` operation are | ||
|
@@ -232,8 +251,9 @@ def eval(expr, parser='pandas', engine=None, truediv=True, | |
pandas.DataFrame.query | ||
pandas.DataFrame.eval | ||
""" | ||
inplace = validate_bool_kwarg(inplace, 'inplace') | ||
first_expr = True | ||
|
||
inplace = validate_bool_kwarg(inplace, "inplace") | ||
|
||
if isinstance(expr, string_types): | ||
_check_expression(expr) | ||
exprs = [e.strip() for e in expr.splitlines() if e.strip() != ''] | ||
|
@@ -245,7 +265,10 @@ def eval(expr, parser='pandas', engine=None, truediv=True, | |
raise ValueError("multi-line expressions are only valid in the " | ||
"context of data, use DataFrame.eval") | ||
|
||
ret = None | ||
first_expr = True | ||
target_modified = False | ||
|
||
for expr in exprs: | ||
expr = _convert_expression(expr) | ||
engine = _check_engine(engine) | ||
|
@@ -266,28 +289,33 @@ def eval(expr, parser='pandas', engine=None, truediv=True, | |
eng_inst = eng(parsed_expr) | ||
ret = eng_inst.evaluate() | ||
|
||
if parsed_expr.assigner is None and multi_line: | ||
raise ValueError("Multi-line expressions are only valid" | ||
" if all expressions contain an assignment") | ||
if parsed_expr.assigner is None: | ||
if multi_line: | ||
raise ValueError("Multi-line expressions are only valid" | ||
" if all expressions contain an assignment") | ||
elif inplace: | ||
raise ValueError("Cannot operate inplace " | ||
"if there is no assignment") | ||
|
||
# assign if needed | ||
if env.target is not None and parsed_expr.assigner is not None: | ||
if inplace is None: | ||
warnings.warn( | ||
"eval expressions containing an assignment currently" | ||
"default to operating inplace.\nThis will change in " | ||
"a future version of pandas, use inplace=True to " | ||
"avoid this warning.", | ||
FutureWarning, stacklevel=3) | ||
inplace = True | ||
target_modified = True | ||
|
||
# if returning a copy, copy only on the first assignment | ||
if not inplace and first_expr: | ||
target = env.target.copy() | ||
try: | ||
target = env.target.copy() | ||
except AttributeError: | ||
raise ValueError("Cannot return a copy of the target") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do we have a test for this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Absolutely. There is an entire test devoted to this! |
||
else: | ||
target = env.target | ||
|
||
target[parsed_expr.assigner] = ret | ||
# TypeError is most commonly raised (e.g. int, list), but you | ||
# get IndexError if you try to do this assignment on np.ndarray. | ||
try: | ||
target[parsed_expr.assigner] = ret | ||
except (TypeError, IndexError): | ||
raise ValueError("Cannot assign expression output to target") | ||
|
||
if not resolvers: | ||
resolvers = ({parsed_expr.assigner: ret},) | ||
|
@@ -304,7 +332,6 @@ def eval(expr, parser='pandas', engine=None, truediv=True, | |
ret = None | ||
first_expr = False | ||
|
||
if not inplace and inplace is not None: | ||
return target | ||
|
||
return ret | ||
# We want to exclude `inplace=None` as being False. | ||
if inplace is False: | ||
return target if target_modified else ret |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2224,21 +2224,18 @@ def query(self, expr, inplace=False, **kwargs): | |
else: | ||
return new_data | ||
|
||
def eval(self, expr, inplace=None, **kwargs): | ||
def eval(self, expr, inplace=False, **kwargs): | ||
"""Evaluate an expression in the context of the calling DataFrame | ||
instance. | ||
Parameters | ||
---------- | ||
expr : string | ||
The expression string to evaluate. | ||
inplace : bool | ||
If the expression contains an assignment, whether to return a new | ||
DataFrame or mutate the existing. | ||
WARNING: inplace=None currently falls back to to True, but | ||
in a future version, will default to False. Use inplace=True | ||
explicitly rather than relying on the default. | ||
inplace : bool, default False | ||
If the expression contains an assignment, whether to perform the | ||
operation inplace and mutate the existing DataFrame. Otherwise, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe make the "Otherwise, " -> "By default, " ? (to stress more that returning a new dataframe is the default) Or could also append something to the sentence. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure why we need to emphasize this. It's pretty clear that the default is |
||
a new DataFrame is returned. | ||
.. versionadded:: 0.18.0 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1311,14 +1311,6 @@ def assignment_not_inplace(self): | |
expected['c'] = expected['a'] + expected['b'] | ||
tm.assert_frame_equal(df, expected) | ||
|
||
# Default for inplace will change | ||
with tm.assert_produces_warnings(FutureWarning): | ||
df.eval('c = a + b') | ||
|
||
# but don't warn without assignment | ||
with tm.assert_produces_warnings(None): | ||
df.eval('a + b') | ||
|
||
def test_multi_line_expression(self): | ||
# GH 11149 | ||
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}) | ||
|
@@ -1388,14 +1380,52 @@ def test_assignment_in_query(self): | |
df.query('a = 1') | ||
assert_frame_equal(df, df_orig) | ||
|
||
def query_inplace(self): | ||
# GH 11149 | ||
def test_query_inplace(self): | ||
# see gh-11149 | ||
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}) | ||
expected = df.copy() | ||
expected = expected[expected['a'] == 2] | ||
df.query('a == 2', inplace=True) | ||
assert_frame_equal(expected, df) | ||
|
||
df = {} | ||
expected = {"a": 3} | ||
|
||
self.eval("a = 1 + 2", target=df, inplace=True) | ||
tm.assert_dict_equal(df, expected) | ||
|
||
@pytest.mark.parametrize("invalid_target", [1, "cat", [1, 2], | ||
np.array([]), (1, 3)]) | ||
def test_cannot_item_assign(self, invalid_target): | ||
msg = "Cannot assign expression output to target" | ||
expression = "a = 1 + 2" | ||
|
||
with tm.assert_raises_regex(ValueError, msg): | ||
self.eval(expression, target=invalid_target, inplace=True) | ||
|
||
if hasattr(invalid_target, "copy"): | ||
with tm.assert_raises_regex(ValueError, msg): | ||
self.eval(expression, target=invalid_target, inplace=False) | ||
|
||
@pytest.mark.parametrize("invalid_target", [1, "cat", (1, 3)]) | ||
def test_cannot_copy_item(self, invalid_target): | ||
msg = "Cannot return a copy of the target" | ||
expression = "a = 1 + 2" | ||
|
||
with tm.assert_raises_regex(ValueError, msg): | ||
self.eval(expression, target=invalid_target, inplace=False) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is a bit a pity that the copy check has to come first, because you now get a not super informative message when passing an invalid target (about not having copy, while you just can't pass it). (which would be the same if we wouldn't catch those, I know :-)) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. True, but I can also pass in objects that are constructed to support item assignment but fail on copy. It seems reasonable to put a custom error message for that case. |
||
|
||
@pytest.mark.parametrize("target", [1, "cat", [1, 2], | ||
np.array([]), (1, 3), {1: 2}]) | ||
def test_inplace_no_assignment(self, target): | ||
expression = "1 + 2" | ||
|
||
assert self.eval(expression, target=target, inplace=False) == 3 | ||
|
||
msg = "Cannot operate inplace if there is no assignment" | ||
with tm.assert_raises_regex(ValueError, msg): | ||
self.eval(expression, target=target, inplace=True) | ||
|
||
def test_basic_period_index_boolean_expression(self): | ||
df = mkdf(2, 2, data_gen_f=f, c_idx_type='p', r_idx_type='i') | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't -> doesn't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's plural ("arrays"), so "don't" is the correct conjugation there.