Split test_categorical into subpackage (#18497) #18508

WillAyd · 2017-11-26T23:54:12Z

closes TST: split test_categorical.py into sub test-files #18497
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Speaking to the methodology here, I created separate files mirroring what was provided in the other test packages. Within each file, there is one class for standard categorical tests and one for "block" tests, reflecting how the original file is currently structured.

Because only a very limited number of test cases had setup functions, I lumped all of them into one test_generic.py file rather than creating separate classes within each module.

jreback · 2017-11-27T00:06:02Z

you need to add an entry in setup.py FYI as we don't pick up the test sub-dirs automatically (maybe we should)
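
For readers unfamiliar with the point above: a new test sub-directory is only installed if it is named in the package list that setup.py passes to setuptools. The sketch below is not the pandas setup.py. It is a minimal stdlib illustration of what "picking up the test sub-dirs automatically" could look like, using a hypothetical helper `discover_packages` and a throwaway directory tree. setuptools ships a `find_packages` helper that serves a similar purpose.

```python
import os
import tempfile


def discover_packages(root, top="pandas"):
    """Return dotted names of every directory under root/top holding an __init__.py.

    Hypothetical helper for illustration only.
    """
    found = []
    base = os.path.join(root, top)
    for dirpath, dirnames, filenames in os.walk(base):
        if "__init__.py" not in filenames:
            dirnames[:] = []  # not a package, so do not descend any further
            continue
        rel = os.path.relpath(dirpath, root)
        found.append(rel.replace(os.sep, "."))
    return sorted(found)


def make_demo_tree(root):
    """Create a tiny throwaway layout that mimics the new test subpackage."""
    for parts in (("pandas",), ("pandas", "tests"), ("pandas", "tests", "categorical")):
        path = os.path.join(root, *parts)
        os.makedirs(path, exist_ok=True)
        open(os.path.join(path, "__init__.py"), "w").close()


with tempfile.TemporaryDirectory() as tmp:
    make_demo_tree(tmp)
    packages = discover_packages(tmp)

print(packages)
```

With an explicit list, by contrast, a dotted entry for the new sub-directory has to be added by hand, which is the entry being asked for here.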

jreback

some comments

jreback · 2017-11-27T00:26:56Z

pandas/tests/categorical/test_generic.py

+                                   right=False, labels=cat_labels)
+        self.cat = df
+
+    def test_basic(self):


these should all move to series or frame / test_constructor

jreback · 2017-11-27T00:27:35Z

pandas/tests/categorical/test_generic.py

+    def test_groupby_sort(self):
+
+        # http://stackoverflow.com/questions/23814368/sorting-pandas-categorical-labels-after-groupby
+        # This should result in a properly sorted Series so that the plot


move to groupby

jreback · 2017-11-27T00:28:10Z

pandas/tests/categorical/test_generic.py

+
+        subf = self.factor[[0, 1, 2]]
+        tm.assert_numpy_array_equal(subf._codes,
+                                    np.array([0, 1, 1], dtype=np.int8))


move getitem / setitem to test_indexing

jreback · 2017-11-27T00:28:44Z

pandas/tests/categorical/test_operators.py

+class TestCategoricalBlockOps(object):
+
+    def test_comparisons(self):
+        tests_data = [(list("abc"), list("cba"), list("bbb")),


can you parametrize this
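
A rough sketch of what parametrizing this loop might look like, assuming `pytest.mark.parametrize` replaces the `for ... in tests_data` iteration. Only the first data tuple comes from the diff above. The second tuple and the simplified test body are assumptions made for illustration and are not the code that was merged.

```python
import numpy as np
import pandas as pd
import pytest


@pytest.mark.parametrize("data, reverse, base", [
    (list("abc"), list("cba"), list("bbb")),  # taken from the diff above
    ([1, 2, 3], [3, 2, 1], [2, 2, 2]),        # illustrative extra case (assumption)
])
def test_comparisons(data, reverse, base):
    # Both categoricals share the same ordered categories, given by `reverse`,
    # so the first element of `data` ranks highest and the last ranks lowest.
    cat_rev = pd.Categorical(data, categories=reverse, ordered=True)
    cat_rev_base = pd.Categorical(base, categories=reverse, ordered=True)

    result = cat_rev > cat_rev_base
    expected = np.array([True, False, False])
    np.testing.assert_array_equal(result, expected)
```

Each tuple is then collected and reported by pytest as its own test case, so one failing data set no longer hides the others.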

jreback · 2017-11-27T00:28:59Z

pandas/tests/categorical/test_operators.py

+            pytest.raises(TypeError, lambda: a < cat)
+            pytest.raises(TypeError, lambda: a < cat_rev)
+
+        # unequal comparison should raise for unordered cats


break from here in another test function

I'm thinking of splitting at line 105 into a method called test_unequal_comparison_raises_type_error and subsequently at line 133 into a function called test_nan_equality - do you think the latter should stay in this file or get moved to test_missing.py?

the rest could still be here, that's more of a comparison test
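
To make the proposed split concrete, here is a minimal sketch of a stand-alone test carrying the name suggested above. The body is an assumption written for illustration, not the test from the PR: it only shows that ordering comparisons on an unordered Categorical raise `TypeError` while equality comparisons still work.

```python
import pandas as pd
import pytest


def test_unequal_comparison_raises_type_error():
    # Unordered categoricals support == and != but not <, >, <=, >=.
    cat = pd.Categorical([1, 2, 3], ordered=False)

    with pytest.raises(TypeError):
        cat > cat

    with pytest.raises(TypeError):
        cat < cat

    # Equality remains well defined for unordered data.
    assert (cat == cat).all()
```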

jreback · 2017-11-27T00:29:27Z

pandas/tests/categorical/test_generic.py

+        res = cat_rev > "b"
+        tm.assert_numpy_array_equal(res, exp)
+
+    def test_print(self):


move to test_repr

codecov · 2017-11-27T01:38:20Z

Codecov Report

Merging #18508 into master will decrease coverage by 0.04%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #18508      +/-   ##
==========================================
- Coverage   91.35%   91.31%   -0.05%     
==========================================
  Files         163      163              
  Lines       49801    49796       -5     
==========================================
- Hits        45496    45469      -27     
- Misses       4305     4327      +22

Flag	Coverage Δ
#multiple	`89.11% <ø> (-0.03%)`	⬇️
#single	`40.79% <ø> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/plotting/_converter.py	`63.44% <0%> (-1.82%)`	⬇️
pandas/core/frame.py	`97.81% <0%> (-0.1%)`	⬇️
pandas/tseries/offsets.py	`96.94% <0%> (-0.09%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 674fb96...574e6c3.

codecov · 2017-11-27T01:38:33Z

Codecov Report

Merging #18508 into master will decrease coverage by 0.03%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #18508      +/-   ##
==========================================
- Coverage   91.59%   91.55%   -0.04%     
==========================================
  Files         153      155       +2     
  Lines       51257    51255       -2     
==========================================
- Hits        46949    46929      -20     
- Misses       4308     4326      +18

Flag	Coverage Δ
#multiple	`89.42% <ø> (-0.02%)`	⬇️
#single	`40.67% <ø> (-0.12%)`	⬇️

Impacted Files	Coverage Δ
pandas/tseries/converter.py	`0% <0%> (-100%)`	⬇️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/plotting/_converter.py	`63.44% <0%> (-3.08%)`	⬇️
pandas/core/config_init.py	`98.34% <0%> (-0.12%)`	⬇️
pandas/core/frame.py	`97.81% <0%> (-0.1%)`	⬇️
pandas/plotting/_timeseries.py	`60.73% <0%> (-0.1%)`	⬇️
pandas/tseries/offsets.py	`96.9% <0%> (-0.05%)`	⬇️
pandas/core/indexes/datetimes.py	`95.68% <0%> (ø)`	⬆️
pandas/plotting/_compat.py	`62% <0%> (ø)`	⬆️
pandas/io/parquet.py	`65.38% <0%> (ø)`	⬆️
... and 9 more

Last update 9629fef...5e9234a.

jreback · 2017-11-27T01:43:40Z

pandas/tests/categorical/test_generic.py

+        pytest.raises(TypeError, lambda: cat_rev > a)
+
+        # The following work via '__array_priority__ = 1000'
+        # works only on numpy >= 1.7.1


this is always true now (you can remove the LooseVersion check), we are always on >= 1.9

jreback · 2017-11-27T01:44:21Z

pandas/tests/categorical/test_dtypes.py

+class TestCategoricalBlockDtypes(object):
+
+    def test_dtypes(self):
+


could move this to reshape/test_concat.py. general philosophy is to put the same types of testing together and not segregate by dtype.

jreback · 2017-11-27T01:45:33Z

pandas/tests/categorical/test_generic.py

+                        lambda x: x.astype('object').astype(Categorical)]:
+            pytest.raises(TypeError, lambda: invalid(s))
+
+    def test_numeric_like_ops(self):


can move to test_operators (below)

jreback · 2017-11-27T01:46:05Z

pandas/tests/categorical/test_groupby.py

+
+
+class TestCategoricalBlockGroubpy(TestCategoricalBlock):
+


oh, meant this to move to pandas/tests/groupby/test_categorical.py

jreback · 2017-11-27T01:50:02Z

pandas/tests/categorical/test_indexing.py

+        tm.assert_numpy_array_equal(result, np.array([5], dtype='int8'))
+
+    def test_set_categories(self):
+        cat = Categorical(["a", "b", "c", "a"], ordered=True)


set_categories testing goes in test_api

jreback · 2017-11-27T01:56:06Z

pandas/tests/categorical/test_api.py

+        df = DataFrame(Series(cat))
+
+    def test_categorical_frame(self):
+        # normal DataFrame


move to tests/frame/test_constructor

jreback · 2017-11-27T01:56:19Z

pandas/tests/categorical/test_api.py

+                                 labels=labels)
+
+    def test_assignment_to_dataframe(self):
+        # assignment


move to tests/frame/test_indexing

jreback · 2017-11-27T01:57:17Z

pandas/tests/categorical/test_api.py

+        tm.assert_almost_equal(results, list(sorted(set(ok_for_cat))))
+
+    @pytest.mark.parametrize(
+        "dtype",


move the drop_duplicates tests to pandas/tests/series/test_analytics

jreback · 2017-11-27T01:58:10Z

pandas/tests/categorical/test_api.py

+        tm.assert_index_equal(s.cat.categories, Index(["a"]))
+
+    def test_sequence_like(self):
+


move to pandas/tests/frame/test_api

jreback · 2017-11-27T01:58:45Z

pandas/tests/categorical/test_api.py

+class TestBlockCategoricalAPI(object):
+
+    def test_reshaping(self):
+


move to pandas/tests/reshape/test_reshape

WillAyd · 2017-11-30T22:52:42Z

Do you think it makes sense to move the block constructors into the series and frame construction tests as well or would you rather keep them here?

jreback · 2017-12-01T00:00:24Z

yes i would ideally like to move any testing to series/frame as appropriate
could add a series/test_categorical (and frame too)

we already segregate some testing by type somewhat (test_timestamp/timedelta) and not solely by function (in series/frame)

pep8speaks · 2017-12-03T22:17:33Z

Hello @WillAyd! Thanks for updating the PR.

In the file pandas/tests/frame/test_constructors.py, following are the PEP8 issues :

Line 1604:9: E741 ambiguous variable name 'l'
Line 1609:9: E741 ambiguous variable name 'l'

In the file pandas/tests/series/test_dtypes.py, following are the PEP8 issues :

Line 234:9: E741 ambiguous variable name 'l'
Line 240:9: E741 ambiguous variable name 'l'
Line 261:9: E741 ambiguous variable name 'l'

Comment last updated on December 04, 2017 at 13:41 Hours UTC

WillAyd · 2017-12-03T22:24:37Z

This was a fairly involved change as the tests were scattered all over the place. To aid tracking, the attached file lists every function that was in the original test_categorical.py file and where it was moved to. Please note that in some cases I changed the function name for clarity.

In addition to the parametrization that was added to the test_comparisons function, I ended up refactoring three new functions into place, which you'll see on the second tab of the attachment. Let me know what you think.
categorical_func_moves.xlsx

jreback · 2017-12-03T22:28:07Z

usually a quick check is to run the tests before and after and make sure we have the same number

WillAyd · 2017-12-03T22:30:04Z

When I compared the number of tests to upstream/master I got 4 more tests, which is what I expected (one from the parameterized test and the three new funcs)
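
One way to sanity-check a move like this without running anything is to compare test function names statically. The sketch below is an assumption-laden illustration, not what was used in this PR: `list_test_functions` is a hypothetical helper, and the class names in the sample sources are made up. A static count does not expand parametrized cases, so it can differ from the number pytest reports when collecting (for example with `pytest --collect-only`).

```python
import ast


def list_test_functions(source):
    """Return the sorted names of all functions and methods starting with 'test_'."""
    tree = ast.parse(source)
    return sorted(
        node.name
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test_")
    )


# Made-up stand-ins for the original module and two of the new modules.
BEFORE = '''
class TestCategorical(object):
    def test_getitem(self): pass
    def test_print(self): pass
'''

AFTER_INDEXING = '''
class TestCategoricalIndexing(object):
    def test_getitem(self): pass
'''

AFTER_REPR = '''
class TestCategoricalRepr(object):
    def test_print(self): pass
'''

before = list_test_functions(BEFORE)
after = list_test_functions(AFTER_INDEXING) + list_test_functions(AFTER_REPR)
missing = sorted(set(before) - set(after))

print(len(before), len(after), missing)
```

An empty `missing` list means every original test name still exists somewhere after the split. Renamed functions would show up as missing and need to be matched by hand.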

jreback

looks great. really some small comments. ping when green.

jreback · 2017-12-03T22:34:07Z

pandas/tests/categorical/test_analytics.py

+        _max = cat.max()
+        assert _min == "a"
+        assert _max == "d"
+        cat = Categorical(["a", "b", "c", "d"],


can you put a blank line in between cases

jreback · 2017-12-03T22:34:17Z

pandas/tests/categorical/test_analytics.py

+        _max = cat.max(numeric_only=True)
+        assert _max == "b"
+
+        cat = Categorical([np.nan, 1, 2, np.nan], categories=[5, 4, 3, 2, 1],


e.g. like this is good

jreback · 2017-12-03T22:34:25Z

pandas/tests/categorical/test_analytics.py

+        res = s.mode()
+        exp = Categorical([5], categories=[5, 4, 3, 2, 1], ordered=True)
+        tm.assert_categorical_equal(res, exp)
+        s = Categorical([1, 1, 1, 4, 5, 5, 5], categories=[5, 4, 3, 2, 1],


Since this function repeated the same pattern of tests I went ahead and parametrized instead of changing the whitespace

jreback · 2017-12-03T22:35:21Z

pandas/tests/categorical/test_api.py

+
+class TestCategoricalAPI(object):
+
+    def test_searchsorted(self):


could move this to categorical/test_analytics (searchsorted)

jreback · 2017-12-03T22:36:28Z

pandas/tests/categorical/test_api.py

+        exp = np.array([0, 1, 2, 0, 2], dtype='int8')
+        tm.assert_numpy_array_equal(c.codes, exp)
+
+    def test_unique(self):


move to categorical/test_analytics (here and down thru the rest of this class)

jreback · 2017-12-03T22:37:17Z

pandas/tests/categorical/test_api.py

+        out = cat.remove_unused_categories()
+        assert out.get_values().tolist() == val.tolist()
+
+    def test_codes_immutable(self):


this function & recode can move to a separate class, maybe TestPrivateAPI (same file)

jreback · 2017-12-03T22:38:48Z

pandas/tests/categorical/test_api.py

+            diff = cat.memory_usage(deep=True) - sys.getsizeof(cat)
+            assert abs(diff) < 100
+
+    def test_deprecated_labels(self):


don't move this (test_deprecated_labels is ok here), and the same goes for test_deprecated_from_array

jreback · 2017-12-03T22:40:10Z

pandas/tests/categorical/test_api.py

+        msg = "the 'axis' parameter is not supported"
+        tm.assert_raises_regex(ValueError, msg, np.repeat, cat, 2, axis=1)
+
+    def test_astype_categorical(self):


can move astype_categorical to the categorical/test_dtypes

jreback · 2017-12-03T22:44:21Z

pandas/tests/reshape/test_merge.py

@@ -1504,6 +1504,57 @@ def test_basic(self, left, right):
                          index=['X', 'Y', 'Z'])
        assert_series_equal(result, expected)



call this test_merge_categorical

WillAyd · 2017-12-04T13:44:59Z

Edits accounted for. Note that 5 new tests are showing up since the last commit, due to the parametrization of test_mode in test_analytics.py.

In case it is of use I updated the attachment with before/after locations of every func.

categorical_func_moves.xlsx

WillAyd · 2017-12-08T00:20:09Z

Checked the AppVeyor error, but it is not coming from a module that was changed as part of this, so I believe it is unrelated. Let me know if there are any outstanding thoughts on the change.

jreback · 2017-12-08T00:25:42Z

thanks @WillAyd nice patch!

WillAyd force-pushed the split-cat-tests branch from c8886dc to d262a8b Compare November 27, 2017 00:11

jreback added the Categorical (Categorical Data Type) and Testing (pandas testing functions or related to the test suite) labels Nov 27, 2017

jreback requested changes Nov 27, 2017


WillAyd force-pushed the split-cat-tests branch from 574e6c3 to b8bfcff Compare December 3, 2017 22:17

jreback requested changes Dec 3, 2017


WillAyd force-pushed the split-cat-tests branch from b8bfcff to e77f198 Compare December 4, 2017 13:41

WillAyd added 7 commits December 7, 2017 13:35

Split test_categorical into subpackage (pandas-dev#18497)

f8d59b8

First round of comments (will squash later)

323cde9

Reorganized categorical tests

80c829e

Added lost tests, renamed funcs

5e4d2d8

Final refactor of categorical tests

041b710

Parametrized test_mode, cleaned up other funcs

afe014b

Rebase to account for c3c04e2

5e9234a

WillAyd force-pushed the split-cat-tests branch from e77f198 to 5e9234a Compare December 7, 2017 18:47

jreback added this to the 0.22.0 milestone Dec 8, 2017

jreback approved these changes Dec 8, 2017


jreback merged commit 3395742 into pandas-dev:master Dec 8, 2017

WillAyd deleted the split-cat-tests branch December 12, 2017 15:41
