Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST/REF: refactor the arithmetic tests for IntegerArray #34454

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented May 29, 2020

This is an attempt to make the arithmetic tests for the masked arrays more understandable and maintainable (doing it here for IntegerArray as a start, but the idea would be to do the same for BooleanArray, and later FloatingArray as well, and at that point also share some of those tests).

The problem with the current arrays/integer/test_arithmetic.py is that is quite complex to see what is going on (which I experienced when making a version for FloatingArray in #34307): for example, there is a huge _check_op method that has all the logic to created the expected result, but so this also has all the special cases of all ops combined, making it very difficult to see what is going on or to know what is now exactly tested for a certain op.
The reason for this structure is that those tests originally came from the base extension tests in tests/extension/, where the tests needed to be very generic. However, we already moved out those tests for IntgerArray, since they were all customized (nothing was still being inherited from the base class), but so we also don't need to maintain the original class structure then.

The logic how I constructed the new tests:

  • I first test each op with an explicitly constructed expected result. Here, we test the IntegerArray-specific special cases (like things as 1 ** pd.NA, or division resulting in a float numpy array, ..). Explicitly writing it down makes it a lot easier to read and to see what is tested (no complex if op == ...: .., constructs, and no complex correcting of expected results based on ndarray)
  • Those first tests are all with EAs. In a next set of tests, I added tests for array+scalar ops, array+ndarray ops, ... But here, I just rely on creating the expected result with the EA itself (since EA+EA ops were already tested before)
  • Similar tests for frame+scalar, series+scalar, series+array with the EA dtypes -> rely on the actual EA op for the expected result instead of having _check_op handle that as well.

cc @dsaxton @jreback

@jorisvandenbossche jorisvandenbossche added Refactor Internal refactoring of code Testing pandas testing functions or related to the test suite NA - MaskedArrays Related to pd.NA and nullable extension arrays labels May 29, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.1 milestone May 29, 2020
@dsaxton
Copy link
Member

dsaxton commented May 29, 2020

Nice @jorisvandenbossche, this looks much easier to follow / understand

@jorisvandenbossche
Copy link
Member Author

All good here?
cc @TomAugspurger @jbrockmendel

@jreback
Copy link
Contributor

jreback commented Jun 3, 2020

yep looks good, pls rebase and merge on green.

@TomAugspurger
Copy link
Contributor

Have people seen these test failures recently:

________________________ test_on_offset_implementations ________________________

[gw0] linux -- Python 3.7.7 /home/travis/miniconda3/envs/pandas-dev/bin/python

    @given(gen_random_datetime, gen_yqm_offset)

>   def test_on_offset_implementations(dt, offset):

pandas/tests/tseries/offsets/test_offsets_properties.py:89: 

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

pandas/tests/tseries/offsets/test_offsets_properties.py:94: in test_on_offset_implementations

    compare = (dt + offset) - offset

pandas/_libs/tslibs/offsets.pyx:440: in pandas._libs.tslibs.offsets.BaseOffset.__add__

    return other.__add__(self)

pandas/_libs/tslibs/offsets.pyx:442: in pandas._libs.tslibs.offsets.BaseOffset.__add__

    return self.apply(other)

pandas/_libs/tslibs/offsets.pyx:136: in pandas._libs.tslibs.offsets.apply_wraps.wrapper

    result = result.tz_localize(tz)

pandas/_libs/tslibs/timestamps.pyx:1272: in pandas._libs.tslibs.timestamps.Timestamp.tz_localize

    value = tz_localize_to_utc(np.array([self.value], dtype='i8'), tz,

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   raise pytz.NonExistentTimeError(stamp)

E   pytz.exceptions.NonExistentTimeError: 1905-06-01 00:00:00

pandas/_libs/tslibs/tzconversion.pyx:276: NonExistentTimeError

---------------------------------- Hypothesis ----------------------------------

Falsifying example: test_on_offset_implementations(

    dt=datetime.datetime(1900, 1, 1, 0, 0, tzinfo=tzfile('/usr/share/zoneinfo/Asia/Singapore')),

    offset=<65 * MonthBegins>,

)

And the arm build timed out.

@jbrockmendel
Copy link
Member

Have people seen these test failures recently: [...] test_on_offset_implementations

Yes. I've been ignoring them until i can get around to them

@jorisvandenbossche
Copy link
Member Author

@TomAugspurger open an issue for that?

@jorisvandenbossche jorisvandenbossche merged commit 6258397 into pandas-dev:master Jun 6, 2020
@jorisvandenbossche jorisvandenbossche deleted the masked-arithmetic-tests branch June 6, 2020 08:29
@jorisvandenbossche
Copy link
Member Author

Will do a follow-up doing the same for BooleanArray, and then deduplicating some of them

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NA - MaskedArrays Related to pd.NA and nullable extension arrays Refactor Internal refactoring of code Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants