Regression: "ValueError: cannot unstack dimensions that do not have a MultiIndex" when unstacking a MultiIndex #5384

Closed
dranjan opened this issue May 27, 2021 · 5 comments · Fixed by #5385

Comments

@dranjan
Contributor

dranjan commented May 27, 2021

I'm not sure if this is a bug or I'm not using xarray correctly, but I used to be able to do this without crashing. The new behavior seems to have been introduced some time between 0.16.2 and 0.18.2.

What happened:

Traceback (most recent call last):
  File "scripts/repro.py", line 12, in <module>
    ds = ds.unstack(['c'])
  File "/home/darsh/src/notebooks/build/venv/lib/python3.8/site-packages/xarray/core/dataset.py", line 4024, in unstack
    raise ValueError(
ValueError: cannot unstack dimensions that do not have a MultiIndex: ['c']

What you expected to happen:

The code runs without the ValueError exception.

Minimal Complete Verifiable Example:

from xarray import DataArray, Dataset


a = DataArray([0], dims=['a'])
b = a.stack(b=('a',)).reset_index('b')
c = b.stack({'c': ['b']})

ds = Dataset({'d': DataArray(c.data, dims=['c'])}, coords=c.coords)
print('\nBefore:')
print(ds)

ds = ds.unstack(['c'])
print('\nAfter:')
print(ds)

Anything else we need to know?:

Here's the full output from the example on 0.18.2:


Before:
<xarray.Dataset>
Dimensions:  (c: 1)
Coordinates:
  * c        (c) MultiIndex
  - b        (c) int64 0
    a        (c) int64 0
Data variables:
    d        (c) int64 0
Traceback (most recent call last):
  File "scripts/repro.py", line 12, in <module>
    ds = ds.unstack(['c'])
  File "/home/darsh/src/notebooks/build/venv/lib/python3.8/site-packages/xarray/core/dataset.py", line 4024, in unstack
    raise ValueError(
ValueError: cannot unstack dimensions that do not have a MultiIndex: ['c']

What confuses me is that the c dimension is shown as a MultiIndex, but it still complains that it doesn't have a MultiIndex. Directly unstacking ds.d rather than the dataset itself also fails with the same exception.

Oddly, it seems to work if I assign the coordinates after constructing the dataset:

diff --git a/scripts/repro.py b/scripts/repro.py
index ed2ae7c..d5bd6a3 100644
--- a/scripts/repro.py
+++ b/scripts/repro.py
@@ -5,7 +5,7 @@ a = DataArray([0], dims=['a'])
 b = a.stack(b=('a',)).reset_index('b')
 c = b.stack({'c': ['b']})
 
-ds = Dataset({'d': DataArray(c.data, dims=['c'])}, coords=c.coords)
+ds = Dataset({'d': DataArray(c.data, dims=['c'])}).assign_coords(c.coords)
 print('\nBefore:')
 print(ds)
 

With that workaround, or by downgrading to 0.16.2, the example doesn't crash:


Before:
<xarray.Dataset>
Dimensions:  (c: 1)
Coordinates:
  * c        (c) MultiIndex
  - b        (c) int64 0
    a        (c) int64 0
Data variables:
    d        (c) int64 0

After:
<xarray.Dataset>
Dimensions:  (b: 1)
Coordinates:
    a        (b) int64 0
  * b        (b) int64 0
Data variables:
    d        (b) int64 0

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.0 (default, Feb 25 2021, 22:10:10)
[GCC 8.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-73-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 0.18.2
pandas: 1.2.4
numpy: 1.20.3
scipy: 1.6.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.05.0
distributed: None
matplotlib: 3.4.2
cartopy: None
seaborn: None
numbagg: None
pint: 0.17
setuptools: 39.0.1
pip: 21.1.1
conda: None
pytest: 6.2.4
IPython: 7.23.1
sphinx: None
None

@max-sixty
Collaborator

max-sixty commented May 27, 2021

This does look like a bug, specifically affecting MultiIndexes containing only one Index.

The issue seems to be that self.get_index('c') returns a normal index:

   4020             non_multi_dims = [
   4021                 d for d in dims if not isinstance(self.get_index(d), pd.MultiIndex)
   4022             ]
   4023             if non_multi_dims:
-> 4024                 raise ValueError(
   4025                     "cannot unstack dimensions that do not "
   4026                     f"have a MultiIndex: {non_multi_dims}"
   4027                 )
   4028
   4029         result = self.copy(deep=False)
   4030         for dim in dims:

ipdb> self.get_index('c')
Index([(0,)], dtype='object')   # <- single index

ipdb> self
<xarray.Dataset>
Dimensions:  (c: 1)
Coordinates:
  * c        (c) MultiIndex   # <- multi index
  - b        (c) int64 0
    a        (c) int64 0
Data variables:
    d        (c) int64 0

I'm not sure how common it is for MultiIndexes to have a single index, but we should be general over any number.
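The distinction the check relies on can be sketched with pandas alone. This is a minimal illustration, not the xarray code: `non_multi_dims` here is a hypothetical stand-alone helper that mirrors the list comprehension in the listing above, and `tupleize_cols=False` is used only to build an object-dtype index of tuples by hand. It says nothing about how xarray ended up with such an index.

```python
import pandas as pd

# A genuine MultiIndex with a single level, as produced by stacking one dimension.
single_level = pd.MultiIndex.from_arrays([[0]], names=["b"])

# An object-dtype Index holding 1-tuples, like the one shown in the debugger.
# tupleize_cols=False stops pandas from turning the tuples back into a MultiIndex.
flat = pd.Index([(0,)], tupleize_cols=False)


def non_multi_dims(indexes):
    """Return the names whose index is not a pandas MultiIndex (hypothetical helper)."""
    return [name for name, idx in indexes.items() if not isinstance(idx, pd.MultiIndex)]


print(non_multi_dims({"c": single_level}))  # []
print(non_multi_dims({"c": flat}))  # ['c']
```

Both objects hold the same tuple `(0,)`, but only the first passes the `isinstance` check. That matches the symptom here: the repr reports a MultiIndex while `get_index` hands back a plain index.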

We'd definitely take a fix for this.

@dranjan
Contributor Author

dranjan commented May 27, 2021

> I'm not sure how common it is for MultiIndexes to have a single index, but we should be general over any number.

That makes sense, and it actually pretty much sums up how I encountered this. My code here is a reduction of a function I wrote that was supposed to work with a fairly general array and subset of its dimensions, and I happened to call it with a one-element dimension list.
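For illustration, a helper of that general shape might look like the sketch below. The function name `roundtrip_stack`, the temporary dimension name, and the sample array are all assumptions made for this example and are not the original code. The round trip on its own is not claimed to trigger the regression, which in the report above involved passing `coords` through the `Dataset` constructor. It only shows why a one-element dimension list is a natural input.

```python
import numpy as np
import xarray as xr


def roundtrip_stack(da, dims, new_dim="stacked"):
    """Stack `dims` into `new_dim`, unstack again, and restore the dimension order.

    Hypothetical helper: any per-element work would go between stack and unstack.
    """
    stacked = da.stack({new_dim: list(dims)})
    return stacked.unstack(new_dim).transpose(*da.dims)


da = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=["x", "y"],
    coords={"x": [10, 20], "y": [1, 2, 3]},
)

# The same call should work whether one dimension or several are stacked.
for dims in (["x"], ["x", "y"]):
    assert roundtrip_stack(da, dims).equals(da)
```

Looping over dimension lists of different lengths, including length one, is a cheap way to cover the single-level case in a test.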

> We'd definitely take a fix for this.

In principle, I'd be happy to help, but I haven't gone into the xarray internals at all yet, nor do I have the dev environment set up (in particular, I don't have conda), so it would probably take me a while.

@benbovy
Member

benbovy commented May 27, 2021

This was introduced in #5102. I'm looking into it.

@benbovy
Member

benbovy commented May 27, 2021

This should be fixed in #5385.

Side note: the example you gave here, i.e.,

ds = Dataset({'d': DataArray(c.data, dims=['c'])}, coords=c.coords)
ds = ds.unstack(['c'])

should probably be deprecated after the index refactoring in Xarray (currently WIP), which aims to decouple the concepts of coordinates and indexes. More specifically, indexes should no longer be passed implicitly via the coords argument of the Dataset/DataArray constructors, but explicitly via the indexes argument (see https://github.com/pydata/xarray/blob/master/design_notes/flexible_indexes_notes.md#22-explicit-vs-implicit-index-creation).

@dranjan
Contributor Author

dranjan commented May 27, 2021

Interesting! Thanks for the heads-up, @benbovy. I'll keep my eye on that.


3 participants