BUG: dropna incorrect with categoricals in pivot_table #21252

jreback · 2018-05-29T23:50:34Z

jreback · 2018-05-29T23:51:05Z

would appreciate a look @WillAyd @jschendel @TomAugspurger @jorisvandenbossche

codecov · 2018-05-30T01:13:46Z

Codecov Report

Merging #21252 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #21252      +/-   ##
==========================================
- Coverage   91.85%   91.85%   -0.01%     
==========================================
  Files         153      153              
  Lines       49564    49561       -3     
==========================================
- Hits        45527    45524       -3     
  Misses       4037     4037

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.25% <100%> (-0.01%)` | ⬇️ |
| #single | `41.87% <25%> (ø)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/reshape/pivot.py | `97.03% <100%> (+0.05%)` | ⬆️ |
| pandas/io/formats/style.py | `96.03% <0%> (-0.09%)` | ⬇️ |
| pandas/core/reshape/melt.py | `97.34% <0%> (ø)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c460710...683fd9e.

jschendel · 2018-05-30T07:25:49Z

pandas/tests/reshape/test_pivot.py

+                name='A'))
+
+        tm.assert_frame_equal(result, expected)
+


Can you also add tests where columns/values and index/columns/values are specified for pivot_table? It looks like these fail with a similar setup on 0.23.0.

Using the same definition of df as you used in your test, columns/values is incorrect:

    In [3]: pd.__version__
    Out[3]: '0.23.0'

    In [4]: df.pivot_table(columns='A', values='B')
    Out[4]:
    A  NaN  low
    B  2.0  3.0

Similarly index/columns/values is incorrect:

    In [5]: df['AA'] = df['A']

    In [6]: df.pivot_table(index='A', columns='AA', values='B')
    Out[6]:
    AA   NaN  low
    A
    NaN  2.0  NaN
    low  NaN  3.0

@jschendel I tested that those are working correctly with this PR, but since I wanted to get this in for the release, I already merged. It would indeed be good to add those as additional test cases.

Opened #21370 to keep track of this
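
The two extra cases requested above could be covered along these lines. This is only a sketch: the definition of `df` is not shown in this thread, so the frame below is an assumption chosen to be consistent with the outputs quoted above, and the variable names `by_columns` and `by_both` are illustrative.

```python
import numpy as np
import pandas as pd

# Assumed test frame (not taken from the PR): an ordered categorical
# column with one missing value, plus a numeric column to aggregate.
df = pd.DataFrame({
    "A": pd.Categorical([np.nan, "low", "high", "low", "high"],
                        categories=["low", "high"], ordered=True),
    "B": range(5),
})

# columns/values: rows with a missing category should be dropped,
# not shown under a NaN label.
by_columns = df.pivot_table(columns="A", values="B")

# index/columns/values: the same categorical used on both axes.
df["AA"] = df["A"]
by_both = df.pivot_table(index="A", columns="AA", values="B")
```

With this assumed frame, the mean of `B` should come out as 2.0 for `low` and 3.0 for `high`, and no `NaN` label should appear on either axis.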

topper-123 · 2018-05-30T19:51:39Z

I've tried this PR on #21151. This does not solve that issue, though the bugs in the two issues seem very similar.

jreback · 2018-05-31T10:04:04Z

updated, if you'd have another look

jorisvandenbossche · 2018-05-31T10:56:59Z

pandas/core/reshape/pivot.py

-    grouped = data.groupby(keys, observed=dropna)
+    # group by the cartesian product of the grouper
+    # if we have a categorical
+    grouped = data.groupby(keys, observed=False)


I have the feeling this workaround would not be needed if the bug in groupby were solved? (#21151)
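
For readers unfamiliar with the `observed` keyword changed in the diff above, here is a minimal sketch of what it controls in `groupby`. The frame and variable names are hypothetical, and this does not reproduce the bug in #21151; it only shows the documented difference between the two settings.

```python
import pandas as pd

# Hypothetical frame: "high" is a declared category but never occurs.
df = pd.DataFrame({
    "A": pd.Categorical(["low", "low"], categories=["low", "high"]),
    "B": [1, 2],
})

# observed=False groups by every declared category (the cartesian product
# when there are several categorical keys), so "high" appears with size 0.
all_categories = df.groupby("A", observed=False).size()

# observed=True keeps only the categories that actually occur in the data.
seen_only = df.groupby("A", observed=True).size()
```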

jreback · 2018-06-06T13:51:49Z

This is complicated.
I don't think we will get to this for 0.23.1.

jorisvandenbossche · 2018-06-06T14:09:41Z

Then let's revert the changes to pivot for now? (not the addition of observed keyword to groupby! I mean only the internal use of it inside pivot)

jreback · 2018-06-07T10:57:23Z

I think it's prob ok to merge this particular commit (even if we subsequently change it for #21151) or leave this all for 0.23.2

closes pandas-dev#21133

jorisvandenbossche · 2018-06-07T22:05:49Z

> I think it's prob ok to merge this particular commit

Ah, yes, that's actually basically the same as reverting the change to pivot.
Merging then.


jreback added the Bug, Groupby, Categorical (Categorical Data Type), and Needs Backport labels May 29, 2018

jreback added this to the 0.23.1 milestone May 29, 2018

jschendel reviewed May 30, 2018


jreback force-pushed the pivot branch from b89e9cf to 5c70f0d Compare May 31, 2018 10:02

jorisvandenbossche reviewed May 31, 2018


BUG: dropna incorrect with categoricals in pivot_table

985013b

closes pandas-dev#21133

jreback force-pushed the pivot branch from 5c70f0d to 985013b Compare June 7, 2018 10:59

jorisvandenbossche added 2 commits June 8, 2018 00:00

Merge branch 'master' into pivot

b4885e6

correct whatsnew message

683fd9e

jorisvandenbossche merged commit abfac97 into pandas-dev:master Jun 7, 2018

jorisvandenbossche mentioned this pull request Jun 7, 2018

TST: add additional test cases for pivot_table with categorical data #21370

Closed

This was referenced Jun 8, 2018

BUG: Incorrect values shown by pivot_table() #21378

Closed

TST : Adding new test case for pivot_table() with Categorical data #21381

Closed

daminisatya pushed a commit to daminisatya/pandas that referenced this pull request Jun 8, 2018

BUG: dropna incorrect with categoricals in pivot_table (pandas-dev#21252)

f3f7eb9

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Jun 12, 2018

BUG: dropna incorrect with categoricals in pivot_table (pandas-dev#21252)

c5850c1

(cherry picked from commit abfac97)

TomAugspurger pushed a commit that referenced this pull request Jun 12, 2018

BUG: dropna incorrect with categoricals in pivot_table (#21252)

c2f2159

(cherry picked from commit abfac97)

TomAugspurger removed the Needs Backport label Jun 12, 2018

david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018

BUG: dropna incorrect with categoricals in pivot_table (pandas-dev#21252)

398a963

jschendel mentioned this pull request Jun 21, 2018

crosstab gives wrong result if a categorical Series contains NaNs #21565

Closed
