[ArrayManager] TST: arithmetic test #39753

Merged
Changes from 5 commits
1 change: 1 addition & 0 deletions .github/workflows/ci.yml
@@ -153,3 +153,4 @@ jobs:
       run: |
         source activate pandas-dev
         pytest pandas/tests/frame/methods --array-manager
+        pytest pandas/tests/arithmetic/ --array-manager
4 changes: 3 additions & 1 deletion pandas/_testing/__init__.py
@@ -207,8 +207,10 @@ def box_expected(expected, box_cls, transpose=True):
         if transpose:
             # for vector operations, we need a DataFrame to be a single-row,
             # not a single-column, in order to operate against non-DataFrame
-            # vectors of the same length.
+            # vectors of the same length. But convert to two rows to avoid
+            # single-row special cases in datetime arithmetic
             expected = expected.T
+            expected = pd.concat([expected] * 2, ignore_index=True)
     elif box_cls is PeriodArray:
         # the PeriodArray constructor is not as flexible as period_array
         expected = period_array(expected)
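As a rough illustration of what the changed `transpose=True` branch does to the boxed data, the sketch below repeats the two steps from the hunk above (`.T`, then `pd.concat`) on a small frame. The variable names are illustrative only and are not part of `pandas._testing`.

```python
import pandas as pd

# A length-3 vector boxed as a single-column DataFrame (shape 3x1).
column_frame = pd.Series([1, 2, 3]).to_frame()

# Transposing gives a single row (shape 1x3), so the frame lines up
# against a non-DataFrame vector of length 3.
single_row = column_frame.T

# Repeating that row gives a 2x3 frame with a fresh 0..1 index, so the
# tests no longer exercise only the single-row case.
two_rows = pd.concat([single_row] * 2, ignore_index=True)

print(column_frame.shape, single_row.shape, two_rows.shape)
```

Both rows of `two_rows` hold the same values, so element-wise expectations written for one row apply to each row.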
40 changes: 15 additions & 25 deletions pandas/tests/arithmetic/test_datetime64.py
@@ -318,40 +318,40 @@ def test_dt64arr_timestamp_equality(self, box_with_array):
             box_with_array if box_with_array not in [pd.Index, pd.array] else np.ndarray
         )

-        ser = Series([Timestamp("2000-01-29 01:59:00"), "NaT"])
+        ser = Series([Timestamp("2000-01-29 01:59:00"), Timestamp("2000-01-30"), "NaT"])
         ser = tm.box_expected(ser, box_with_array)

         result = ser != ser
-        expected = tm.box_expected([False, True], xbox)
+        expected = tm.box_expected([False, False, True], xbox)
         tm.assert_equal(result, expected)

         warn = FutureWarning if box_with_array is pd.DataFrame else None
         with tm.assert_produces_warning(warn):
             # alignment for frame vs series comparisons deprecated
             result = ser != ser[0]
-            expected = tm.box_expected([False, True], xbox)
+            expected = tm.box_expected([False, True, True], xbox)
         tm.assert_equal(result, expected)

         with tm.assert_produces_warning(warn):
             # alignment for frame vs series comparisons deprecated
-            result = ser != ser[1]
-            expected = tm.box_expected([True, True], xbox)
+            result = ser != ser[2]
+            expected = tm.box_expected([True, True, True], xbox)
         tm.assert_equal(result, expected)

         result = ser == ser
-        expected = tm.box_expected([True, False], xbox)
+        expected = tm.box_expected([True, True, False], xbox)
         tm.assert_equal(result, expected)

         with tm.assert_produces_warning(warn):
             # alignment for frame vs series comparisons deprecated
             result = ser == ser[0]
-            expected = tm.box_expected([True, False], xbox)
+            expected = tm.box_expected([True, False, False], xbox)
         tm.assert_equal(result, expected)

         with tm.assert_produces_warning(warn):
             # alignment for frame vs series comparisons deprecated
-            result = ser == ser[1]
-            expected = tm.box_expected([False, False], xbox)
+            result = ser == ser[2]
+            expected = tm.box_expected([False, False, False], xbox)
         tm.assert_equal(result, expected)
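The updated expectations follow from `NaT` never comparing equal to anything, including itself. A minimal sketch on a plain `Series` (the test itself runs over several box types, which this sketch does not attempt to cover, and it uses `pd.NaT` rather than the string `"NaT"`):

```python
import pandas as pd

ser = pd.Series(
    [pd.Timestamp("2000-01-29 01:59:00"), pd.Timestamp("2000-01-30"), pd.NaT]
)

# Comparing the Series with itself: only the NaT position differs.
ne_self = (ser != ser).tolist()
eq_self = (ser == ser).tolist()

# Comparing against the first element: only position 0 matches.
ne_first = (ser != ser[0]).tolist()
eq_first = (ser == ser[0]).tolist()

# Comparing against the NaT element: nothing matches.
ne_nat = (ser != ser[2]).tolist()
eq_nat = (ser == ser[2]).tolist()

print(ne_self, eq_self)
print(ne_first, eq_first)
print(ne_nat, eq_nat)
```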


@@ -1010,10 +1010,7 @@ def test_dt64arr_sub_dt64object_array(self, box_with_array, tz_naive_fixture):
         obj = tm.box_expected(dti, box_with_array)
         expected = tm.box_expected(expected, box_with_array)

-        warn = None
-        if box_with_array is not pd.DataFrame or tz_naive_fixture is None:
-            warn = PerformanceWarning
-        with tm.assert_produces_warning(warn):
+        with tm.assert_produces_warning(PerformanceWarning):
             result = obj - obj.astype(object)
         tm.assert_equal(result, expected)
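For readers unfamiliar with the warning being asserted here, the following sketch records warnings with the standard library instead of the `tm.assert_produces_warning` helper. It uses a `DatetimeIndex` only and assumes the object-dtype fallback still warns in the installed pandas version.

```python
import warnings

import pandas as pd
from pandas.errors import PerformanceWarning

dti = pd.date_range("2016-01-01", periods=3)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The object-dtype operand cannot be handled by a vectorized
    # datetime64 subtraction, so pandas operates element by element.
    result = dti - dti.astype(object)

saw_performance_warning = any(
    issubclass(w.category, PerformanceWarning) for w in caught
)
print(saw_performance_warning)
```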

@@ -1276,7 +1273,7 @@ def test_dt64arr_add_sub_relativedelta_offsets(self, box_with_array):
             ]
         )
         vec = tm.box_expected(vec, box_with_array)
-        vec_items = vec.squeeze() if box_with_array is pd.DataFrame else vec
+        vec_items = vec.iloc[0] if box_with_array is pd.DataFrame else vec

         # DateOffset relativedelta fastpath
         relative_kwargs = [
@@ -1401,7 +1398,7 @@ def test_dt64arr_add_sub_DateOffsets(
             ]
         )
         vec = tm.box_expected(vec, box_with_array)
-        vec_items = vec.squeeze() if box_with_array is pd.DataFrame else vec
+        vec_items = vec.iloc[0] if box_with_array is pd.DataFrame else vec

         offset_cls = getattr(pd.offsets, cls_name)
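The switch from `squeeze()` to `iloc[0]` in the two hunks above goes with `box_expected` now producing two rows. A small sketch of the difference, with illustrative variable names:

```python
import pandas as pd

values = pd.date_range("2000-01-01", periods=3)

single_row = pd.Series(values).to_frame().T                 # shape (1, 3)
two_rows = pd.concat([single_row] * 2, ignore_index=True)   # shape (2, 3)

# With one row, squeeze() collapses the frame to a Series of its 3 items.
squeezed_single = single_row.squeeze()

# With two rows there is no length-1 axis, so squeeze() returns the
# DataFrame unchanged and no longer yields the individual items.
squeezed_double = two_rows.squeeze()

# iloc[0] returns the first row as a Series regardless of the row count.
first_row = two_rows.iloc[0]

print(type(squeezed_single).__name__, type(squeezed_double).__name__)
print(type(first_row).__name__, len(first_row))
```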

@@ -1515,10 +1512,7 @@ def test_dt64arr_add_sub_offset_array(
         if box_other:
             other = tm.box_expected(other, box_with_array)

-        warn = PerformanceWarning
-        if box_with_array is pd.DataFrame and tz is not None:
-            warn = None
-        with tm.assert_produces_warning(warn):
+        with tm.assert_produces_warning(PerformanceWarning):
             res = op(dtarr, other)

         tm.assert_equal(res, expected)
@@ -2459,18 +2453,14 @@ def test_dti_addsub_object_arraylike(
         expected = DatetimeIndex(["2017-01-31", "2017-01-06"], tz=tz_naive_fixture)
         expected = tm.box_expected(expected, xbox)

-        warn = PerformanceWarning
-        if box_with_array is pd.DataFrame and tz is not None:
-            warn = None
-
-        with tm.assert_produces_warning(warn):
+        with tm.assert_produces_warning(PerformanceWarning):
             result = dtarr + other
         tm.assert_equal(result, expected)

         expected = DatetimeIndex(["2016-12-31", "2016-12-29"], tz=tz_naive_fixture)
         expected = tm.box_expected(expected, xbox)

-        with tm.assert_produces_warning(warn):
+        with tm.assert_produces_warning(PerformanceWarning):
             result = dtarr - other
         tm.assert_equal(result, expected)
8 changes: 5 additions & 3 deletions pandas/tests/arithmetic/test_numeric.py
@@ -532,13 +532,15 @@ def test_df_div_zero_series_does_not_commute(self):
     # ------------------------------------------------------------------
     # Mod By Zero

-    def test_df_mod_zero_df(self):
+    def test_df_mod_zero_df(self, using_array_manager):
         # GH#3590, modulo as ints
         df = pd.DataFrame({"first": [3, 4, 5, 8], "second": [0, 0, 0, 3]})

         # this is technically wrong, as the integer portion is coerced to float
-        # ###
-        first = Series([0, 0, 0, 0], dtype="float64")
+        first = Series([0, 0, 0, 0])
+        if not using_array_manager:
+            # BlockManager doesn't preserve dtype per column if possible
+            first = first.astype("float64")
         second = Series([np.nan, np.nan, np.nan, 0])
         expected = pd.DataFrame({"first": first, "second": second})
         result = df % df
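To make the dtype difference in this test concrete, the sketch below runs the same modulo once on the whole frame and once per column. The per-column form only approximates what a column-wise manager would do; it is not the ArrayManager code path itself, and the dtype of the whole-frame `"first"` column is deliberately not asserted because it depends on the manager in use.

```python
import pandas as pd

df = pd.DataFrame({"first": [3, 4, 5, 8], "second": [0, 0, 0, 3]})

# Whole-frame modulo. The test above expects the "first" column to come
# back as float64 under BlockManager and as int64 under ArrayManager.
frame_result = df % df

# Column-at-a-time modulo. Each column is handled independently, so a
# column without zero divisors keeps its integer dtype, while a column
# with zero divisors is upcast to float to hold NaN.
first_result = df["first"] % df["first"]
second_result = df["second"] % df["second"]

print(frame_result.dtypes)
print(first_result.dtype, second_result.dtype)
```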
19 changes: 17 additions & 2 deletions pandas/tests/arithmetic/test_timedelta64.py
@@ -1748,7 +1748,9 @@ def test_tdarr_div_length_mismatch(self, box_with_array):
     # ------------------------------------------------------------------
     # __floordiv__, __rfloordiv__

-    def test_td64arr_floordiv_td64arr_with_nat(self, box_with_array):
+    def test_td64arr_floordiv_td64arr_with_nat(
+        self, box_with_array, using_array_manager
+    ):
         # GH#35529
         box = box_with_array
         xbox = np.ndarray if box is pd.array else box
@@ -1761,6 +1763,8 @@ def test_td64arr_floordiv_td64arr_with_nat(self, box_with_array):

         expected = np.array([1.0, 1.0, np.nan], dtype=np.float64)
         expected = tm.box_expected(expected, xbox)
+        if box is DataFrame and using_array_manager:
+            expected[[0, 1]] = expected[[0, 1]].astype("int64")
Member:
Why is expected different, and is one behavior more desirable than the other?

Member Author:
ArrayManager performs the op column-wise, and thus can preserve dtypes (floordiv gives ints). I will add a comment about it along the lines of https://github.com/pandas-dev/pandas/pull/39753/files#r576350947

Member:
Thanks.
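The dtype point raised in this thread (floordiv of timedeltas gives ints unless a missing value forces floats) can be sketched on individual `Series`, which stand in here for the columns that a column-wise manager would process one at a time. This is an illustration under that assumption, not the ArrayManager implementation.

```python
import pandas as pd

right = pd.Series(["1 day", "2 days"], dtype="timedelta64[ns]")

# No missing values: timedelta // timedelta yields integers.
left = pd.Series(["2 days", "4 days"], dtype="timedelta64[ns]")
no_nat = left // right

# A NaT in the column forces a float result so that NaN can be stored.
left_nat = pd.Series(["2 days", "NaT"], dtype="timedelta64[ns]")
with_nat = left_nat // right

print(no_nat.dtype, with_nat.dtype)
```

When the same operation is evaluated on one 2D block, a single `NaT` anywhere makes the whole result float, which is why the BlockManager expectation in the test is all float64.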


         result = left // right

@@ -2040,7 +2044,9 @@ def test_td64arr_rmul_numeric_array(self, box_with_array, vector, any_real_dtype
         [np.array([20, 30, 40]), pd.Index([20, 30, 40]), Series([20, 30, 40])],
         ids=lambda x: type(x).__name__,
     )
-    def test_td64arr_div_numeric_array(self, box_with_array, vector, any_real_dtype):
+    def test_td64arr_div_numeric_array(
+        self, box_with_array, vector, any_real_dtype, using_array_manager
+    ):
         # GH#4521
         # divide/multiply by integers
         xbox = get_upcast_box(box_with_array, vector)
@@ -2075,6 +2081,15 @@ def test_td64arr_div_numeric_array(self, box_with_array, vector, any_real_dtype)
                 expected = [tdser[n] / vector[n] for n in range(len(tdser))]
             expected = pd.Index(expected)  # do dtype inference
             expected = tm.box_expected(expected, xbox)
+
+            if using_array_manager and box_with_array is pd.DataFrame:
+                # TODO the behaviour is buggy here (third column with all-NaT
+                # as result doesn't get preserved as timedelta64 dtype).
+                # Reported at https://github.com/pandas-dev/pandas/issues/39750
+                # Changing the expected instead of xfailing to continue to test
+                # the correct behaviour for the other columns
+                expected[2] = Series([pd.NaT, pd.NaT], dtype=object)
Member:
This is sub-optimal. Can you add a TODO to try to retain the correct dtype?

Member Author:
I don't fully understand your comment. Of course it's sub-optimal, as it's a bug (but an existing one, and I opened an issue about it).

Member:
> but an existing one

Now I'm confused; we only get here if using_array_manager.

Member:
I.e. if the expected isn't actually what we want, it is better to xfail.

Member Author:
> Now I'm confused; we only get here if using_array_manager.

Yes, I "uncovered" the bug by adding array manager tests, but the bug itself is not related to ArrayManager; it's a bug in TimedeltaArray (and you can run into it without ArrayManager as well, it is just not covered by any test).

But indeed, an xfail might be more appropriate, since I am now asserting the buggy behaviour. The problem is that I would then no longer test the other parts of this test (the other columns), which are actually working fine.
So I would prefer to keep it this way (but will update the comment to make it clearer that this is buggy).

Contributor:
OK, can you open an issue to address this in BM (and ideally add an INFO / TODO for future reference)? I am getting increasingly worried that things are changed, but we have no idea where except for PR comments, which are not very useful going forward.

Member Author:
As mentioned above, and already included in the comment as well, I already opened an issue (it's #39750).

Member:
> and you can run into it without ArrayManager as well, just not covered by any test

OK. Can you add a non-ArrayManager test that will hit it?

Member Author (@jorisvandenbossche, Feb 17, 2021):
I added additional examples that don't involve ArrayManager to the issue (#39750).


             tm.assert_equal(result, expected)

         with pytest.raises(TypeError, match=pattern):