Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: ChainedAssignmentError for CoW not working for chained inplace methods when passing *args or **kwargs #56456

Open
Tracked by #48998
jorisvandenbossche opened this issue Dec 11, 2023 · 1 comment

Comments

@jorisvandenbossche
Copy link
Member

While working on #56402 (ensuring full test coverage for the warning we raise when you do chained inplace methods), we ran into another corner case where the refcount differs, and mixes up the refcount-based inference about whether we are being called in a "chained" context and thus have to raise the ChainedAssignment warning.
(a previous case we discovered was when the setitem call is coming from cython code: #51315)

Specifically, when passing *args or **kwargs to the inplace method call, it takes a different execution path in the Python interpreter compared to calling it with manually specified arguments. And starting from Python 3.11, this other execution path gives one reference less.

Example (with CoW, the df will never be updated with such chained method call):

>>> pd.options.mode.copy_on_write = True
>>> df = DataFrame({"a": [1, 2, np.nan], "b": 1})
>>> df["a"].fillna(0, inplace=True)
ChainedAssignmentError: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
When using the Copy-on-Write mode, such inplace method never works to update the original DataFrame or Series, because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' instead, to perform the operation inplace on the original object.

That warning works for all tested Python versions. However, when passing the args or kwargs, the warning doesn't work on Python >= 3.11:

>>> pd.options.mode.copy_on_write = True
>>> df = DataFrame({"a": [1, 2, np.nan], "b": 1})
>>> args = (0, )
>>> df["a"].fillna(*args, inplace=True)
# no warning

For now we decided to punt on this specific corner case, but creating this issue to we keep track of what we learned and have something to reference to when this might come up later.

@jorisvandenbossche
Copy link
Member Author

Some more details on the inner Python interpreter details about why this is happening: when calling the function with *args or **kwargs, i.e. with a variable number of arguments, the opcode CALL_FUNCTION_EX is being used.
When calling the function in a standard way with specified keywords, prior to Python 3.11 the opcode CALL_FUNCTION_KW was being used. But with Python 3.11, this has been optimized, and this opcode is replaced with KW_NAMES + CALL opcodes. And apparently this optimized implementation results in less references to the object on which the method is called.

Example disassembly (opcodes) for Python 3.10 vs 3.11 for the above fillna call Python 3.10 (both cases have ref count of 4):
In [7]: dis.dis(lambda: df['a'].fillna(1, inplace=True))
  1           0 LOAD_GLOBAL              0 (df)
              2 LOAD_CONST               1 ('a')
              4 BINARY_SUBSCR
              6 LOAD_ATTR                1 (fillna)
              8 LOAD_CONST               2 (1)
             10 LOAD_CONST               3 (True)
             12 LOAD_CONST               4 (('inplace',))
             14 CALL_FUNCTION_KW         2
             16 RETURN_VALUE

In [8]: dis.dis(lambda: df['a'].fillna(*args, inplace=True))
  1           0 LOAD_GLOBAL              0 (df)
              2 LOAD_CONST               1 ('a')
              4 BINARY_SUBSCR
              6 LOAD_ATTR                1 (fillna)
              8 LOAD_GLOBAL              2 (args)
             10 LOAD_CONST               2 ('inplace')
             12 LOAD_CONST               3 (True)
             14 BUILD_MAP                1
             16 CALL_FUNCTION_EX         1
             18 RETURN_VALUE

Python 3.11:

In [8]: dis.dis(lambda: df['a'].fillna(*args, inplace=True))
  1           0 RESUME                   0
              2 LOAD_GLOBAL              1 (NULL + df)
             14 LOAD_CONST               1 ('a')
             16 BINARY_SUBSCR
             26 LOAD_ATTR                1 (fillna)
             36 LOAD_GLOBAL              4 (args)
             48 LOAD_CONST               2 ('inplace')
             50 LOAD_CONST               3 (True)
             52 BUILD_MAP                1
             54 CALL_FUNCTION_EX         1
             56 RETURN_VALUE

In [9]: dis.dis(lambda: df['a'].fillna(1, inplace=True))
  1           0 RESUME                   0
              2 LOAD_GLOBAL              0 (df)
             14 LOAD_CONST               1 ('a')
             16 BINARY_SUBSCR
             26 LOAD_METHOD              1 (fillna)
             48 LOAD_CONST               2 (1)
             50 LOAD_CONST               3 (True)
             52 KW_NAMES                 4
             54 PRECALL                  2
             58 CALL                     2
             68 RETURN_VALUE

We actually noticed this different ref count for the standard case, and therefore use a different threshold for Python 3.11+:

REF_COUNT = 2 if PY311 else 3

But that means that for the args/kwargs use case, we incorrectly don't trigger a warning.

If we want, we probably would be able to inspect the stack to try to figure out whether the method was called with args/kwargs or not, if we are OK with paying the code complexity for this and the runtime overhead. Short-term we probably won't include this, but if it turns out to be a more common case, we can always try that in the future.
I experimented a little bit with this, so posting here the code for posterity.

def _is_using_args_kwargs(code_object, method):
    # simple case of df.method(..) without args/kwargs
    for instr in dis.get_instructions(code_object):
        if instr.opname == "LOAD_METHOD" and instr.argval == method:
            print("we have LOAD_METHOD (fillna)")
            return False

    # fallback for more complex cases
    instructions = list(dis.get_instructions(code_object))
    instr_call_ex = [i for i, instr in enumerate(instructions) if instr.opname == "CALL_FUNCTION_EX"]
    if not instr_call_ex:
        print("we have no CALL_FUNCTION_EX")
        return False

    method_with_ex = False
    for case in instr_call_ex:
        for i in range(case, 0, -1):
            if instructions[i].opname == "LOAD_ATTR":
                method_with_ex = instructions[i].argval == method
                if method_with_ex:
                    print("we have CALL_FUNCTION_EX with prepending LOAD_ATTR (fillna)")
                break
        if method_with_ex:
            return True
    
    return method_with_ex

and this works as

In [6]: import dis

In [7]: d = dis.Bytecode("df['a'].fillna(*args, inplace=True)")

In [10]: _is_using_args_kwargs(d.codeobj, method="fillna")
Out[10]: True

In [11]: d = dis.Bytecode("df['a'].fillna(0, inplace=True)")

In [12]: _is_using_args_kwargs(d.codeobj, method="fillna")
Out[12]: False

In [13]: d = dis.Bytecode("df['a'].fillna(0, inplace=True).reset_index(*args)")

In [14]: _is_using_args_kwargs(d.codeobj, method="fillna")
Out[14]: False

For the actual method, it could be integrated with something like:

--- a/pandas/core/generic.py
+++ b/pandas/core/generic.py
@@ -6,6 +6,7 @@ from copy import deepcopy
 import datetime as dt
 from functools import partial
 import gc
+import inspect
 from json import loads
 import operator
 import pickle
@@ -7304,11 +7306,20 @@ class NDFrame(PandasObject, indexing.IndexingMixin):
                 and self._is_view_after_cow_rules()
             ):
                 ctr = sys.getrefcount(self)
                 ref_count = REF_COUNT
                 if isinstance(self, ABCSeries) and _check_cacher(self):
                     # see https://github.com/pandas-dev/pandas/pull/56060#discussion_r1399245221
                     ref_count += 1
+                if PY311:
+                    calling_frame = inspect.currentframe().f_back
+                    if _is_using_args_kwargs(calling_frame.f_code):
+                        ref_count += 1
                 if ctr <= ref_count:
                     warnings.warn(
                         _chained_assignment_warning_method_msg,
                         FutureWarning,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant