Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: GH17464 MultiIndex now raises an error when levels aren't unique, tests changed #17971

Merged
merged 46 commits into from
Dec 3, 2017

Conversation

cmazzullo
Copy link
Contributor

@codecov
Copy link

codecov bot commented Oct 25, 2017

Codecov Report

Merging #17971 into master will decrease coverage by 0.01%.
The diff coverage is 50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17971      +/-   ##
==========================================
- Coverage   91.23%   91.22%   -0.02%     
==========================================
  Files         163      163              
  Lines       50113    50115       +2     
==========================================
- Hits        45723    45715       -8     
- Misses       4390     4400      +10
Flag Coverage Δ
#multiple 89.03% <50%> (-0.01%) ⬇️
#single 40.31% <50%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/multi.py 96.32% <50%> (-0.08%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e1dabf3...f957e51. Read the comment docs.

@codecov
Copy link

codecov bot commented Oct 25, 2017

Codecov Report

Merging #17971 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17971      +/-   ##
==========================================
+ Coverage   91.44%   91.45%   +<.01%     
==========================================
  Files         157      157              
  Lines       51378    51441      +63     
==========================================
+ Hits        46985    47044      +59     
- Misses       4393     4397       +4
Flag Coverage Δ
#multiple 89.32% <100%> (+0.02%) ⬆️
#single 40.6% <50%> (-0.08%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/multi.py 96.28% <100%> (-0.16%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/indexes/interval.py 93.05% <0%> (-0.17%) ⬇️
pandas/core/frame.py 97.81% <0%> (-0.1%) ⬇️
pandas/core/indexes/base.py 96.45% <0%> (-0.03%) ⬇️
pandas/core/indexes/datetimes.py 95.69% <0%> (-0.03%) ⬇️
pandas/core/groupby.py 92.04% <0%> (ø) ⬆️
pandas/core/internals.py 94.47% <0%> (ø) ⬆️
pandas/core/series.py 94.8% <0%> (+0.04%) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1eedcf6...703ff1e. Read the comment docs.

- Bug in :func:`Series.rename` when called with a callable, incorrectly alters the name of the ``Series``, rather than the name of the ``Index``. (:issue:`17407`)
- Bug in :func:`String.str_get` raises ``IndexError`` instead of inserting NaNs when using a negative index. (:issue:`17704`)
- Bug in :func:`Series.rename` when called with a `callable`, incorrectly alters the name of the `Series`, rather than the name of the `Index`. (:issue:`17407`)
- Bug in :func:`String.str_get` raises `index out of range` error instead of inserting NaNs when using a negative index. (:issue:`17704`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you changed these two entries? The previous versions look correct to me (double backticks instead of single, "callable" unticked, etc.).

- Bug in :func:`String.str_get` raises ``IndexError`` instead of inserting NaNs when using a negative index. (:issue:`17704`)
- Bug in :func:`Series.rename` when called with a `callable`, incorrectly alters the name of the `Series`, rather than the name of the `Index`. (:issue:`17407`)
- Bug in :func:`String.str_get` raises `index out of range` error instead of inserting NaNs when using a negative index. (:issue:`17704`)
- When created with duplicate labels, ``MultiIndex`` now raises a `ValueError`. (:issue:`17464`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValueError should have two backticks: ``ValueError``

@@ -1573,7 +1573,8 @@ def test_is_(self):
# shouldn't change
assert mi2.is_(mi)
mi4 = mi3.view()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add an explicit test for the issue, e.g that you are raising (use the example from the original issue).

@@ -1573,7 +1573,8 @@ def test_is_(self):
# shouldn't change
assert mi2.is_(mi)
mi4 = mi3.view()
mi4.set_levels([[1 for _ in range(10)], lrange(10)], inplace=True)
# GH 17464 - Remove duplicate MultiIndex levels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need these comments here, rather they go on a new test (see above)

ind.set_levels([['A', 'B', 'A', 'A', 'B'], [2, 1, 3, -2, 5]],
inplace=True)

# GH 17464 - Remove duplicate MultiIndex levels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok changing this, but remove the comment, and add the original as another example of where things should raise (in the test you are adding above)

@@ -978,6 +978,7 @@ Indexing
- Bug in :meth:`DataFrame.first_valid_index` and :meth:`DataFrame.last_valid_index` when no valid entry (:issue:`17400`)
- Bug in :func:`Series.rename` when called with a callable, incorrectly alters the name of the ``Series``, rather than the name of the ``Index``. (:issue:`17407`)
- Bug in :func:`String.str_get` raises ``IndexError`` instead of inserting NaNs when using a negative index. (:issue:`17704`)
- When created with duplicate labels, ``MultiIndex`` now raises a ``ValueError``. (:issue:`17464`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to 0.22.0 breaking changes

@cmazzullo
Copy link
Contributor Author

Sure thing

@jreback
Copy link
Contributor

jreback commented Oct 28, 2017

looks like you need to rebase against master, seems you have pulled in other commits

git pull --rebase origin/master -i

then force push

@cmazzullo
Copy link
Contributor Author

My mistake, latest push should clear that up

@jreback
Copy link
Contributor

jreback commented Oct 28, 2017

I rebased you. I still need to look at this for review.

@jreback jreback added this to the 0.22.0 milestone Oct 29, 2017
@jreback
Copy link
Contributor

jreback commented Oct 29, 2017

can you run the asv for multi and see if any perf changes with this? (prob no, but just checking)

@cmazzullo
Copy link
Contributor Author

asv shows nasty perf changes though I think it may be a problem with my setup. I'll see if I can work it out. Here's the output for good measure.

       before           after         ratio
     [4af9a8b6]       [cdea7fec]
+     16.9±0.03ms      40.2±0.02ms     2.38  categoricals.Categoricals.time_constructor_all_nan
+      23.7±0.2ms         27.4±2ms     1.16  frame_methods.Dropna.time_dropna_axis0_all
+     9.42±0.05μs       10.9±0.2μs     1.15  indexing.Int64Indexing.time_iloc_scalar
+      38.3±0.1ms       43.5±0.8ms     1.14  stat_ops.stats_rolling_mean.time_rolling_median
+         724±5ns          821±8ns     1.13  period.Properties.time_year
+      43.0±0.2ms       48.8±0.2ms     1.13  timedelta.ToTimedelta.time_convert_coerce
+       427±0.4μs          474±3μs     1.11  groupby.GroupBySuite.time_std('int', 100)
+        1.61±0ms      1.77±0.02ms     1.10  strings.StringMethods.time_contains_many_noregex
-         856±8ns          747±4ns     0.87  period.Properties.time_is_leap_year
-        45.9±1ms       38.3±0.2ms     0.83  gil.nogil_take1d_int64.time_nogil_take1d_int64

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

@jreback
Copy link
Contributor

jreback commented Nov 25, 2017

can you rebase

@cmazzullo
Copy link
Contributor Author

Rebased

@cmazzullo
Copy link
Contributor Author

cmazzullo commented Nov 29, 2017

Looks like some tests from test_groupby were duplicated in test_functional, without the modifications. I updated the duplicated function in test_functional to get rid of the error - let me know if you think it would be better to remove all the duped tests instead.

@@ -81,6 +81,9 @@ Other API Changes
- Inserting missing values into indexes will work for all types of indexes and automatically insert the correct type of missing value (``NaN``, ``NaT``, etc.) regardless of the type passed in (:issue:`18295`)
- Restricted ``DateOffset`` keyword arguments. Previously, ``DateOffset`` subclasses allowed arbitrary keyword arguments which could lead to unexpected behavior. Now, only valid arguments will be accepted. (:issue:`17176`, :issue:`18226`).
- :func:`DataFrame.from_items` provides a more informative error message when passed scalar values (:issue:`17312`)
- :class:`Timestamp` will no longer silently ignore unused or invalid `tz` or `tzinfo` arguments (:issue:`17690`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like some duplication here

@@ -198,6 +198,10 @@ def _verify_integrity(self, labels=None, levels=None):
" level (%d). NOTE: this index is in an"
" inconsistent state" % (i, label.max(),
len(level)))
if not level.is_unique:
raise ValueError("Level values must be unique: {0}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use kwargs rather than positional here

@@ -67,8 +67,10 @@ def test_frame_describe_multikey(self):
'C': 1, 'D': 1}, axis=1)
result = groupedT.describe()
expected = self.tsframe.describe().T
expected.index = pd.MultiIndex([[0, 0, 1, 1], expected.index],
[range(4), range(len(expected.index))])
# GH 17464 - Remove duplicate MultiIndex levels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need this comment here

@@ -385,6 +385,83 @@ def test_attr_wrapper(self):
# make sure raises error
pytest.raises(AttributeError, getattr, grouped, 'foo')

def test_series_describe_multikey(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests were moved, revert this file

# Make sure that a MultiIndex with duplicate levels throws a ValueError
with pytest.raises(ValueError):
ind = pd.MultiIndex([['A'] * 10, range(10)], [[0] * 10, range(10)])
# And that using set_levels with duplicate levels fails
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line

@cmazzullo
Copy link
Contributor Author

@jreback Okay, addressed those issues

@jreback jreback merged commit 8172565 into pandas-dev:master Dec 3, 2017
@jreback
Copy link
Contributor

jreback commented Dec 3, 2017

thanks @cmazzullo

@cmazzullo cmazzullo deleted the gh-17464 branch December 4, 2017 03:27
@cmazzullo
Copy link
Contributor Author

No problem

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ERR: invalid MultiIndex construction
3 participants