BUG: dropna incorrect with categoricals in pivot_table #21252

Merged
merged 3 commits into from
Jun 7, 2018

jreback
Contributor

@jreback jreback commented May 29, 2018

closes #21133

@jreback jreback added this to the 0.23.1 milestone May 29, 2018
@jreback
Contributor Author

jreback commented May 29, 2018

would appreciate a look @WillAyd @jschendel @TomAugspurger @jorisvandenbossche

@codecov

codecov bot commented May 30, 2018

Codecov Report

Merging #21252 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #21252      +/-   ##
==========================================
- Coverage   91.85%   91.85%   -0.01%     
==========================================
  Files         153      153              
  Lines       49564    49561       -3     
==========================================
- Hits        45527    45524       -3     
  Misses       4037     4037

Flag         Coverage Δ
#multiple    90.25% <100%> (-0.01%) ⬇️
#single      41.87% <25%> (ø) ⬆️

Impacted Files                  Coverage Δ
pandas/core/reshape/pivot.py    97.03% <100%> (+0.05%) ⬆️
pandas/io/formats/style.py      96.03% <0%> (-0.09%) ⬇️
pandas/core/reshape/melt.py     97.34% <0%> (ø) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c460710...683fd9e. Read the comment docs.

name='A'))

tm.assert_frame_equal(result, expected)

Member

@jschendel jschendel May 30, 2018

Can you also add tests where columns/values and index/columns/values are specified for pivot_table? It looks like these fail with a similar setup on 0.23.0.

Using the same definition of df as you used in your test, columns/values is incorrect:

In [3]: pd.__version__
Out[3]: '0.23.0'

In [4]: df.pivot_table(columns='A', values='B')
Out[4]:
A  NaN  low
B  2.0  3.0

Similarly index/columns/values is incorrect:

In [5]: df['AA'] = df['A']

In [6]: df.pivot_table(index='A', columns='AA', values='B')
Out[6]:
AA   NaN  low
A
NaN  2.0  NaN
low  NaN  3.0
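
The two extra cases requested above could be covered along these lines. This is only a sketch: the thread does not show the definition of `df`, so the frame below is an assumption, chosen so that the group means come out as 2.0 and 3.0 like the values in the output above. The variable names `by_columns` and `by_both` are illustrative as well.

```python
import numpy as np
import pandas as pd

# Assumed frame (not shown in the thread): a categorical key with one
# missing entry and two observed categories.
df = pd.DataFrame({
    "A": pd.Categorical([np.nan, "low", "high", "low", "high"],
                        categories=["low", "high"], ordered=True),
    "B": range(5),
})
df["AA"] = df["A"]

# columns/values: a single row labelled "B", one column per category of A.
by_columns = df.pivot_table(columns="A", values="B")

# index/columns/values: A and AA always agree, so the means sit on the
# diagonal and the off-diagonal cells have no data.
by_both = df.pivot_table(index="A", columns="AA", values="B")

print(by_columns)
print(by_both)
```

With the fix in place, the labels should be `low` and `high`. The `NaN` label in the 0.23.0 output above is the symptom being reported.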

Member

@jschendel I tested that those work correctly with this PR, but since I wanted to get this in for the release, I already merged. It would indeed be good to add those as additional test cases.

Member

Opened #21370 to keep track of this

@topper-123
Contributor

I've tried this PR on #21151. This does not solve that issue, though the bugs in the two issues seem very similar.

@jreback
Contributor Author

jreback commented May 31, 2018

updated, if you'd have another look

- grouped = data.groupby(keys, observed=dropna)
+ # group by the cartesian product of the grouper
+ # if we have a categorical
+ grouped = data.groupby(keys, observed=False)
Member

I have the feeling this workaround would not be needed if the bug in groupby would be solved? (#21151)
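
For readers unfamiliar with the `observed` keyword that this line relies on, here is a small sketch of what it changes for a categorical grouper. The frame and the `unused` category are made up for illustration and do not come from the PR.

```python
import pandas as pd

# Hypothetical data: the category "unused" is declared but never appears.
df = pd.DataFrame({
    "A": pd.Categorical(["low", "low", "high"],
                        categories=["low", "high", "unused"]),
    "B": [1, 2, 3],
})

# observed=False groups over every declared category, so "unused" is
# reported with a size of 0.
all_groups = df.groupby("A", observed=False).size()

# observed=True keeps only the categories that occur in the data.
seen_groups = df.groupby("A", observed=True).size()

print(all_groups)
print(seen_groups)
```

With several categorical keys, `observed=False` expands to the cartesian product of their categories, which is what the code comment in the snippet under review refers to.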

Member

@jreback
Contributor Author

jreback commented Jun 6, 2018

this is complicated
I don’t think I will get to this for 0.23.1

@jorisvandenbossche
Member

jorisvandenbossche commented Jun 6, 2018

Then let's revert the changes to pivot for now? (not the addition of observed keyword to groupby! I mean only the internal use of it inside pivot)

@jreback
Contributor Author

jreback commented Jun 7, 2018

I think it's prob ok to merge this particular commit (even if we subsequently change it for #21151), or leave this all for 0.23.2

@jorisvandenbossche
Member

> I think its prob ok to merge this particular commit

Ah, yes, that's actually basically the same as reverting the change to pivot.
Merging then.

Labels
Bug Categorical Categorical Data Type Groupby
Development

Successfully merging this pull request may close these issues.

Error with pivot_table and categorical data when add dropna args in version 0.23
5 participants