BUG: avoid unnecessary casting when unstacking index with unused levels #18460
7dd32be to 99a5dce
Codecov Report
@@ Coverage Diff @@
## master #18460 +/- ##
=========================================
Coverage ? 91.55%
=========================================
Files ? 147
Lines ? 48827
Branches ? 0
=========================================
Hits ? 44703
Misses ? 4124
Partials ? 0
related to #17886 or is that separate?
new_names = [self.value_columns.name, self.removed_name]
new_labels = [propagator]

new_labels.append(np.tile(np.arange(stride) - self.lift, width))
can you add a comment on what is going on here (e.g. the unused-levels business)?
(see below)
exp_data[idces] = data
cols = pd.MultiIndex.from_product([[0, 1], col_level])
expected = pd.DataFrame(exp_data.reshape(3, 6),
                        index=idx_level, columns=cols)
can we have an exact expected frame and assert_frame_equal (maybe more code, but it really locks it down to the exact result).
The frame is an exact copy, but assert_frame_equal fails (two lines below) because of #18455. So until that is fixed, I guess I can only add a check on the dtypes.
So until that is fixed, I guess I can only add a check on the dtypes.
(shall I?)
I think it's worthwhile to fix #18455 first, actually.
comments
99a5dce to 26a9968
IIRC this was pretty good. Please rebase and let's see where this is.
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -318,6 +318,7 @@ Sparse
Reshaping
^^^^^^^^^

- Bug in :func:`DataFrame.unstack` which casts int to float if ``columns`` is a ``MultiIndex`` with unused levels (:issue:`17845`)
move to 0.23
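The behaviour this PR targets (unstacking an index with unused levels) can be reproduced with a small frame. What follows is a minimal sketch, assuming a pandas version that already includes the fix; the frame and the names `idx`, `df`, and `result` are illustrative choices, not taken from this thread.

```python
import pandas as pd

# Build a MultiIndex, then slice it so that level item 'D' is no longer used.
# Slicing keeps the full level, so 'D' remains in idx.levels[1] as an unused item.
idx = pd.MultiIndex.from_product([["a"], ["A", "B", "C", "D"]])[:-1]

# An all-integer frame indexed by the sliced MultiIndex.
df = pd.DataFrame([[1, 0]] * 3, index=idx)

# Move the innermost index level into the columns.
result = df.unstack()

# With the fix, the columns are expected to keep an integer dtype;
# the linked issue reports a cast to float in this situation.
print(result.dtypes)

# The unused item shows up when comparing the level before and after pruning.
print(len(idx.levels[1]), len(idx.remove_unused_levels().levels[1]))
```

Checking `result.dtypes` directly is the kind of dtype assertion discussed earlier in the review as a fallback when a full frame comparison is not possible.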
26a9968 to 2dc6cab
looks good. style comments.
pandas/core/reshape/reshape.py
Outdated
new_names = [self.value_columns.name, self.removed_name]
new_labels = [propagator]

new_labels.append(np.tile(np.arange(stride) - self.lift, width))
# The two indices differ iff the unstacked level had unused items.
iff -> if
(if and only if - but OK)
pandas/core/reshape/reshape.py
Outdated
else:
    # Otherwise, we just use each level item exactly once:
    repeater = np.arange(stride) - self.lift
# The entire level is then just a repetition of the single chunk:
blank line here
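The `repeater` logic quoted above can be sketched in plain Python, without NumPy. This is only an illustration under stated assumptions: `build_repeater` and `tile` are hypothetical helpers, the value of `stride` is assumed to be the number of used items plus `lift`, and the remapping branch for unused items is a guess at the idea, since the thread only shows the `else` branch.

```python
def build_repeater(full_level, used_level, lift):
    """Return the codes for one chunk of the new column level.

    Hypothetical helper: `full_level` is the level as stored (possibly with
    unused items), `used_level` holds only the items actually used, and
    `lift` is 1 when a slot for missing values is needed, else 0.
    """
    stride = len(used_level) + lift  # assumed definition of stride
    if len(full_level) != len(used_level):
        # Assumed branch: the level has unused items, so map each used item
        # back to its position in the full level.
        position = {item: i for i, item in enumerate(full_level)}
        repeater = [position[item] for item in used_level]
        if lift:
            repeater.insert(0, -1)  # -1 marks the missing-value slot
    else:
        # Mirrors `np.arange(stride) - self.lift` from the diff:
        # each level item is used exactly once.
        repeater = [i - lift for i in range(stride)]
    return repeater


def tile(chunk, width):
    """List equivalent of np.tile(chunk, width) for a 1-D chunk."""
    return chunk * width


# Level ['A', 'B', 'C', 'D'] where only 'A' and 'C' are used, two value columns.
chunk = build_repeater(["A", "B", "C", "D"], ["A", "C"], lift=0)
codes = tile(chunk, width=2)
print(codes)
```

Keeping the codes pointed at the full level means the level itself does not need to be rebuilt, which is one plausible reading of why the comment says the two indices differ only when unused items exist.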
pandas/tests/frame/test_reshape.py
Outdated
@@ -560,6 +560,73 @@ def test_unstack_dtypes(self):
        assert left.shape == (3, 2)
        tm.assert_frame_equal(left, right)

    def test_unstack_unused_levels(self):
        # GH 17845: sliced columns of int DataFrame
can you give one more line of explanation here?
pandas/tests/frame/test_reshape.py
Outdated
result = df.unstack()
expected = pd.DataFrame(np.concatenate([block * 2, block * 2 - 1],
                                       axis=1),
                        columns=idx)
you are not comparing here?
pandas/tests/frame/test_reshape.py
Outdated
expected.columns = MultiIndex.from_product([expected.columns, ['I']],
                                           names=[None, 'C'])
expected.index = expected.index.droplevel('C')
assert_frame_equal(result, expected)
use tm.assert_frame_equal for consistency
the following test uses assert_frame_equal, but if that's a leftover, OK
@pytest.mark.parametrize("cols", [['A', 'C'], slice(None)])
def test_unstack_unused_level(self, cols):
    # GH 18562 : unused labels on the unstacked level
I believe you had 2 cases for #18562; does this cover both?
yes, they are the two "cols" values
2dc6cab to a629b82
@jreback ping
thanks @toobaz
FYI, this caused a 25-30% slowdown in this ASV: http://pandas.pydata.org/speed/pandas/#reshape.SparseIndex.time_unstack
I guess it's the
can u create an issue to track this
closes #17845
closes #18562
git diff upstream/master -u -- "*.py" | flake8 --diff