Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST: find solution for extension/tests_numpy.py (PandasArray base extension tests) #40021

Open
jorisvandenbossche opened this issue Feb 24, 2021 · 1 comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Testing pandas testing functions or related to the test suite

Comments

@jorisvandenbossche
Copy link
Member

We currently run the base extension tests also for the PandasArray (to have coverage of this array class), but because we actually don't allow to store this in a Series/DataFrame, the tests do some patching to workaround this:

def allow_in_pandas(monkeypatch):
"""
A monkeypatch to tells pandas to let us in.
By default, passing a PandasArray to an index / series / frame
constructor will unbox that PandasArray to an ndarray, and treat
it as a non-EA column. We don't want people using EAs without
reason.
The mechanism for this is a check against ABCPandasArray
in each constructor.
But, for testing, we need to allow them in pandas. So we patch
the _typ of PandasArray, so that we evade the ABCPandasArray
check.
"""
with monkeypatch.context() as m:
m.setattr(PandasArray, "_typ", "extension")
yield

However, this regularly gives problems, see eg #39991 (comment) or #39722 (comment), where for some reason those PandasArrays do / don't get preserved in a DataFrame/Series due to some other code changes that impact this, and the tests are not written to deal with it.
(one example: extract_array will convert a pandas Series with numpy dtype into a PandasArray, when the patch is enabled. This can then cause the result to include the PandasArray EA, while the expected result is not created that way)

So we should try to find a solution for this (as it gets quite annoying to deal with it just to get those tests passing). But I don't directly know what the best solution would be.
Some ideas:

  • Simply remove tests/extension/test_numpy.py and thus don't run the base extension interface tests for PandasArray. We still have tests/arrays/test_numpy.py for some other tests. The easiest solution, but would also remove test coverage.
  • In some way only run the tests in tests/extension/test_numpy.py that only deal with the array, and not with the array stored in DataFrame/Series.

cc @TomAugspurger @jbrockmendel

@jorisvandenbossche jorisvandenbossche added Testing pandas testing functions or related to the test suite ExtensionArray Extending pandas with custom dtypes or arrays. labels Feb 24, 2021
@jbrockmendel
Copy link
Member

In some way only run the tests in tests/extension/test_numpy.py that only deal with the array, and not with the array stored in DataFrame/Series.

This seems reasonable to me. Unless a) we have some plan to start allowing PandasArray in Blocks or b) we think some significant portion of the PandasArray APi is only tested in the Series/DataFrame tests (in which case we should likely refactor the tests to include more direct testing)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Testing pandas testing functions or related to the test suite
Projects
None yet
Development

No branches or pull requests

2 participants