Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split test_categorical into subpackage (#18497) #18508

Merged
merged 7 commits into from
Dec 8, 2017

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Nov 26, 2017

Speaking to the methodology here, I created separate files mirroring what was provided in the other test packages. Within each file, there is one class for standard categorical tests and one for "block" tests, owing back to how this file is currently structured.

Because there were setup functions for only a very limited number of test cases, rather than creating those as separate classes within each module I lumped all of them into one test_generic.py file.

@jreback
Copy link
Contributor

jreback commented Nov 27, 2017

you need to add an entry in setup.py FYI as we don't pick up the test sub-dirs automatically (maybe we should)

@jreback jreback added Categorical Categorical Data Type Testing pandas testing functions or related to the test suite labels Nov 27, 2017
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments

right=False, labels=cat_labels)
self.cat = df

def test_basic(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should all move to series or frame / test_constructor

def test_groupby_sort(self):

# http://stackoverflow.com/questions/23814368/sorting-pandas-categorical-labels-after-groupby
# This should result in a properly sorted Series so that the plot
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to groupby


subf = self.factor[[0, 1, 2]]
tm.assert_numpy_array_equal(subf._codes,
np.array([0, 1, 1], dtype=np.int8))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move getitem / setitem to test_indexing

class TestCategoricalBlockOps(object):

def test_comparisons(self):
tests_data = [(list("abc"), list("cba"), list("bbb")),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you parametrize this

pytest.raises(TypeError, lambda: a < cat)
pytest.raises(TypeError, lambda: a < cat_rev)

# unequal comparison should raise for unordered cats
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

break from here in another test function

Copy link
Member Author

@WillAyd WillAyd Nov 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking of splitting at line 105 into a method called test_unequal_comparison_raises_type_error and subsequently at line 133 into a function called test_nan_equality - do you think the latter should stay in this file or get moved to test_missing.py?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the rest could still be here, that's more of a comparison test

res = cat_rev > "b"
tm.assert_numpy_array_equal(res, exp)

def test_print(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to test_repr

@codecov
Copy link

codecov bot commented Nov 27, 2017

Codecov Report

Merging #18508 into master will decrease coverage by 0.04%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18508      +/-   ##
==========================================
- Coverage   91.35%   91.31%   -0.05%     
==========================================
  Files         163      163              
  Lines       49801    49796       -5     
==========================================
- Hits        45496    45469      -27     
- Misses       4305     4327      +22
Flag Coverage Δ
#multiple 89.11% <ø> (-0.03%) ⬇️
#single 40.79% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.44% <0%> (-1.82%) ⬇️
pandas/core/frame.py 97.81% <0%> (-0.1%) ⬇️
pandas/tseries/offsets.py 96.94% <0%> (-0.09%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 674fb96...574e6c3. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 27, 2017

Codecov Report

Merging #18508 into master will decrease coverage by 0.03%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18508      +/-   ##
==========================================
- Coverage   91.59%   91.55%   -0.04%     
==========================================
  Files         153      155       +2     
  Lines       51257    51255       -2     
==========================================
- Hits        46949    46929      -20     
- Misses       4308     4326      +18
Flag Coverage Δ
#multiple 89.42% <ø> (-0.02%) ⬇️
#single 40.67% <ø> (-0.12%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.44% <0%> (-3.08%) ⬇️
pandas/core/config_init.py 98.34% <0%> (-0.12%) ⬇️
pandas/core/frame.py 97.81% <0%> (-0.1%) ⬇️
pandas/plotting/_timeseries.py 60.73% <0%> (-0.1%) ⬇️
pandas/tseries/offsets.py 96.9% <0%> (-0.05%) ⬇️
pandas/core/indexes/datetimes.py 95.68% <0%> (ø) ⬆️
pandas/plotting/_compat.py 62% <0%> (ø) ⬆️
pandas/io/parquet.py 65.38% <0%> (ø) ⬆️
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9629fef...5e9234a. Read the comment docs.

pytest.raises(TypeError, lambda: cat_rev > a)

# The following work via '__array_priority__ = 1000'
# works only on numpy >= 1.7.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is always true now (you can remove the LooseVersion check), we are always on >= 1.9

class TestCategoricalBlockDtypes(object):

def test_dtypes(self):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could move this to reshape/test_concat.py. generaly philosphy is to put the same types of testing together and not segregate by dtype.

lambda x: x.astype('object').astype(Categorical)]:
pytest.raises(TypeError, lambda: invalid(s))

def test_numeric_like_ops(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can move to test_operators (below)



class TestCategoricalBlockGroubpy(TestCategoricalBlock):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, meant this to move to pandas/tests/groupby/test_categorical.py

tm.assert_numpy_array_equal(result, np.array([5], dtype='int8'))

def test_set_categories(self):
cat = Categorical(["a", "b", "c", "a"], ordered=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_categories testing goes in test_api

df = DataFrame(Series(cat))

def test_categorical_frame(self):
# normal DataFrame
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mvoe to tests/frame/test_constructor

labels=labels)

def test_assignment_to_dataframe(self):
# assignment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to tests/frame/test_indexing

tm.assert_almost_equal(results, list(sorted(set(ok_for_cat))))

@pytest.mark.parametrize(
"dtype",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move the drop_duplicates to pandas/tests/series/tests_analytics

tm.assert_index_equal(s.cat.categories, Index(["a"]))

def test_sequence_like(self):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to pandas/tests/frame/test_api

class TestBlockCategoricalAPI(object):

def test_reshaping(self):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to pandas/tests/reshape/test_reshape

@WillAyd
Copy link
Member Author

WillAyd commented Nov 30, 2017

Do you think it makes sense to move the block constructors into the series and frame construction tests as well or would you rather keep them here?

@jreback
Copy link
Contributor

jreback commented Dec 1, 2017

yes i would ideally like to live any testing to series/frame as appropriate
could add a series/test_categorical (and frame too)

we already segregate some testing by type somewhat (test_timestanp/timedelta) and not solely by function (in series/frame)

@pep8speaks
Copy link

pep8speaks commented Dec 3, 2017

Hello @WillAyd! Thanks for updating the PR.

Line 1604:9: E741 ambiguous variable name 'l'
Line 1609:9: E741 ambiguous variable name 'l'

Line 234:9: E741 ambiguous variable name 'l'
Line 240:9: E741 ambiguous variable name 'l'
Line 261:9: E741 ambiguous variable name 'l'

Comment last updated on December 04, 2017 at 13:41 Hours UTC

@WillAyd
Copy link
Member Author

WillAyd commented Dec 3, 2017

This was a fairly involved change as the tests were scattered all over the place. To aid in tracking purposes, the attached file lists out every function that was in the original test_categories.py file and where it was moved to. Please note that in some cases I changed the function name for clarity.

In addition to the parameterization that was added to the test_comparisons function I ended up refactoring three new functions into place, which you'll see on the second tab of the attached. Let me know what you think.
categorical_func_moves.xlsx

@jreback
Copy link
Contributor

jreback commented Dec 3, 2017

usually a quick check is to run the tests before and after and make sure we have the same number

@WillAyd
Copy link
Member Author

WillAyd commented Dec 3, 2017

When I compared the number of tests to upstream/master I got 4 more tests, which is what I expected (one from the parameterized test and the three new funcs)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great. really some small comments. ping when green.

_max = cat.max()
assert _min == "a"
assert _max == "d"
cat = Categorical(["a", "b", "c", "d"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put a blank line in between cases

_max = cat.max(numeric_only=True)
assert _max == "b"

cat = Categorical([np.nan, 1, 2, np.nan], categories=[5, 4, 3, 2, 1],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. like this is good

res = s.mode()
exp = Categorical([5], categories=[5, 4, 3, 2, 1], ordered=True)
tm.assert_categorical_equal(res, exp)
s = Categorical([1, 1, 1, 4, 5, 5, 5], categories=[5, 4, 3, 2, 1],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this function repeated the same pattern of tests I went ahead and parametrized instead of changing the whitespace


class TestCategoricalAPI(object):

def test_searchsorted(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could move this to categorical/test_analytics (searchsorted)

exp = np.array([0, 1, 2, 0, 2], dtype='int8')
tm.assert_numpy_array_equal(c.codes, exp)

def test_unique(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to categorical/test_analytics (here and down thru the rest of this class)

out = cat.remove_unused_categories()
assert out.get_values().tolist() == val.tolist()

def test_codes_immutable(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function & recode can move to a separate class, maybe TestPrivateAPI (same file)

diff = cat.memory_usage(deep=True) - sys.getsizeof(cat)
assert abs(diff) < 100

def test_deprecated_labels(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't move this deprecated_labels its ok here) & test_deprecated_from_array

msg = "the 'axis' parameter is not supported"
tm.assert_raises_regex(ValueError, msg, np.repeat, cat, 2, axis=1)

def test_astype_categorical(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can move astype_categorical to the categorical/test_dtypes

@@ -1504,6 +1504,57 @@ def test_basic(self, left, right):
index=['X', 'Y', 'Z'])
assert_series_equal(result, expected)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call this test_merge_categorical

@WillAyd
Copy link
Member Author

WillAyd commented Dec 4, 2017

Edits accounted for. Note that 5 new tests are showing up since the last commit, due to the parametrization of test_mode in test_analytics.py.

In case it is of use I updated the attachment with before/after locations of every func.

categorical_func_moves.xlsx

@WillAyd
Copy link
Member Author

WillAyd commented Dec 8, 2017

Checked the AppVeyor error but it is not coming from a module that was changed as part of this, so I believe it is unrelated. Let me know any outstanding thoughts on the change

@jreback jreback added this to the 0.22.0 milestone Dec 8, 2017
@jreback jreback merged commit 3395742 into pandas-dev:master Dec 8, 2017
@jreback
Copy link
Contributor

jreback commented Dec 8, 2017

thanks @WillAyd nice patch!

@WillAyd WillAyd deleted the split-cat-tests branch December 12, 2017 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TST: split test_categorical.py into sub test-files
3 participants