-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: dropna incorrect with categoricals in pivot_table #21252
Conversation
would appreciate a look @WillAyd @jschendel @TomAugspurger @jorisvandenbossche |
Codecov Report
@@ Coverage Diff @@
## master #21252 +/- ##
==========================================
- Coverage 91.85% 91.85% -0.01%
==========================================
Files 153 153
Lines 49564 49561 -3
==========================================
- Hits 45527 45524 -3
Misses 4037 4037
Continue to review full report at Codecov.
|
name='A')) | ||
|
||
tm.assert_frame_equal(result, expected) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add tests where columns
/values
and index
/columns
/values
are specified for pivot_table
? It looks like these fail with a similar setup on 0.23.0.
Using the same definition of df
as you used in your test, columns
/values
is incorrect:
In [3]: pd.__version__
Out[3]: '0.23.0'
In [4]: df.pivot_table(columns='A', values='B')
Out[4]:
A NaN low
B 2.0 3.0
Similarly index
/columns
/values
is incorrect:
In [5]: df['AA'] = df['A']
In [6]: df.pivot_table(index='A', columns='AA', values='B')
Out[6]:
AA NaN low
A
NaN 2.0 NaN
low NaN 3.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jschendel I tested that those are working correctly with this PR, but given I wanted to get this in for the release I already merged. But it's indeed true it would be good to add those as additional test case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #21370 to keep track of this
I've tried this PR on #21151. This does not solve that issue, though the bugs in the two issues seem very similar. |
updated, if you'd have another look |
grouped = data.groupby(keys, observed=dropna) | ||
# group by the cartesian product of the grouper | ||
# if we have a categorical | ||
grouped = data.groupby(keys, observed=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the feeling this workaround would not be needed if the bug in groupby would be solved? (#21151)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback ?
this is complicated |
Then let's revert the changes to |
I think its prob ok to merge this particular commit (even if susequently change for #21151) or leave this all for 0.23.2 |
Ah, yes, that's actually basically the same as reverting the change to pivot. |
) (cherry picked from commit abfac97)
(cherry picked from commit abfac97)
closes #21133