-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REGR: ufunc with DataFrame input not passing all kwargs #40878
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment, otherwise lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mzeitlin11 for the PR. we want to make sure that the return type is unchanged.
pandas/tests/frame/test_ufunc.py
Outdated
func(df, 1, out=result) | ||
|
||
expected = np.array(expected).reshape(2, 2) | ||
tm.assert_numpy_array_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also test the return type/values of the operation itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mzeitlin11 lgtm
doc/source/whatsnew/v1.2.4.rst
Outdated
@@ -21,6 +21,7 @@ Fixed regressions | |||
- Fixed regression in :meth:`DataFrame.where` not returning a copy in the case of an all True condition (:issue:`39595`) | |||
- Fixed regression in :meth:`DataFrame.replace` raising ``IndexError`` when ``regex`` was a multi-key dictionary (:issue:`39338`) | |||
- Fixed regression in repr of floats in an ``object`` column not respecting ``float_format`` when printed in the console or outputted through :meth:`DataFrame.to_string`, :meth:`DataFrame.to_html`, and :meth:`DataFrame.to_latex` (:issue:`40024`) | |||
- Fixed regression in ``numpy`` ufuncs such as ``np.add`` not passing through all arguments for 2-dimensional input (:issue:`40662`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Fixed regression in ``numpy`` ufuncs such as ``np.add`` not passing through all arguments for 2-dimensional input (:issue:`40662`) | |
- Fixed regression in NumPy ufuncs such as ``np.add`` not passing through all arguments for 2-dimensional input (:issue:`40662`) |
and maybe replace 2-dimensional input with :class:DataFrame
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep will do
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
lgtm Ci failure is because of numpy dev issue |
@@ -357,7 +357,7 @@ def reconstruct(result): | |||
# * len(inputs) > 1 is doable when we know that we have | |||
# aligned blocks / dtypes. | |||
inputs = tuple(np.asarray(x) for x in inputs) | |||
result = getattr(ufunc, method)(*inputs) | |||
result = getattr(ufunc, method)(*inputs, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the else clause is reached for a DataFrame when not (len(inputs) > 1 or ufunc.nout > 1)
.
The result = mgr.apply(getattr(ufunc, method))
there for __call__
also fails to pass along **kwargs
Is it straightforward to add to the paramterised here to exercise that path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I think so...will let you know if I run into any issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test, but had to xfail because out
is not written to correctly. The written result becomes transposed with the block-wise apply, I'm guessing because of the following relationship:
arr = np.array([[1, 2], [3, 4]])
df = pd.DataFrame(arr)
print(df.values)
print(df._mgr.blocks[0].values)
gives
[[1 2]
[3 4]]
[[1 3]
[2 4]]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
result = func(df, arg, out=result_inplace) | ||
|
||
expected = np.array(expected).reshape(2, 2) | ||
tm.assert_numpy_array_equal(result_inplace, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have any documentation that out is actually a ndarray? this is a very strange result. At the very least document this, and let's open an issue. This should work with out=DataFrame, or simply raise (preferred)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep. I thought this was strange. #40662 (comment)
should we defer to 1.2.5 to allow more discussion? or put 1.2.4 release on hold for a day or two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah let's defer this. i dont' think this is the correct behavior and we should actually fix it (which may mean that we simply do this for 1.3)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moving to 1.2.5 (if we do it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should work with out=DataFrame, or simply raise (preferred)
This does raise when passing a DataFrame to out
(as numpy expects an array as out
argument), but what is being tested here is the return value of the ufunc in case of passing an array to out
. In that case, out
is also being returned, and since out
is an ndarray, the return value is also an ndarray.
(to me this seems correct behaviour, and thus I don't think this needs to hold up merging this for the release)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This "terrible" behaviour is long-standing (and IMO correct) behaviour on which users rely.
how is this correct in any way? this was a bug from the original impl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was not a bug, that's how numpy functions work: they coerce array-like input to arrays. So whether one of the arguments was a DataFrame vs an ndarray, did not have any effect on allowing an out
argument or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok fine. how about a follow to make this really clear (in docs) on master. I would however also deprecate / remove this as it not intuitive at all (e.g. we do not have an out argument anywhere else)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. we do not have an out argument anywhere else
This keyword is not in a pandas function or method, but in a NumPy function (which has out
in many places)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i know, but this is very confusing if someone is doing this (and doesnt; realize it).
pandas/core/arraylike.py
Outdated
@@ -367,7 +367,7 @@ def reconstruct(result): | |||
if method == "__call__": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if method == "__call__": | |
if method == "__call__" and kwargs.get("out", None) is None: |
(untested, but if out
is specified, we should never call mgr.apply
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe more simply check for any kwargs
, so only do this if kwargs
is empty (so and not kwargs
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I quickly pushed this edit myself, as it fixes the xfail case in the tests you added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this!
Thanks @mzeitlin11 ! |
@meeseeksdev backport 1.2.x |
…ssing all kwargs
…kwargs (#40895) Co-authored-by: Matthew Zeitlin <37011898+mzeitlin11@users.noreply.github.com>
…0878) Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Credit to @attack68 for the fix, just figured we should try to get this is in for 1.2.4 if possible since we know the fix.