-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiindex scalar coords, fixes #1408 #1412
Conversation
if a single element is selected from a MultiIndex.
@@ -256,23 +258,6 @@ def _replace_maybe_drop_dims(self, variable, name=__default): | |||
if set(v.dims) <= allowed_dims) | |||
return self._replace(variable, coords, name) | |||
|
|||
def _replace_indexes(self, indexes): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this method and use Dataset._replace_indexes
instead, to reduce duplicates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember well, we used duplicates to avoid using _to_temp_dataset
and _from_temp_dataset
in __getitem__
. But now that _replace_indexes
has more logic implemented, maybe it is a good idea to reduce duplicates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Yes, in DataArray.__getitem__
and also in DataAarray.isel
, _to_temp_dataset
and _from_temp_dataset
are now being used.
@@ -362,6 +362,22 @@ def _maybe_unstack(self, obj): | |||
del obj.coords[dim] | |||
return obj | |||
|
|||
def _maybe_stack(self, applied): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method becomes necessary, because now we cannot do xr.concat([ds.isel(yx=i)] for i in range(*)], dim='yx')
because ds.isel(yx=i)
does not have yx
anymore.
I don't like this change. It breaks an important invariant, which is that indexing a Variable returns another Variable. I do agree with indexing along a MultiIndex dimension should unpacking the tuple for coordinates, but only for coordinates. So this needs to be somewhere in the Consider indexing
We want to change the indexing behavior to this:
But we don't want to change what happens to the DataArray itself -- it should still be a scalar object array. I tested this example on your PR branch, and it actually crashes with |
…ement is selected from MultiIndex. Instead, added _maybe_split function in Dataset
@shoyer
I totally agree with you. In the last commit, I moved the unpacking functionality into Adding functions is easy but simplifying them are difficult... If anyone show a direction, I will try the improvement. |
A possible direction to reduce the This would simplify many things, although I haven't thought about about all other possible issues it would create (perfomance, etc.). Also, Here is another case related to this PR. From the example in the linked issue, the current behavior is In [9]: ds.isel(yx=[0, 1])
Out[9]:
<xarray.Dataset>
Dimensions: (yx: 2)
Coordinates:
* yx (yx) MultiIndex
- y (yx) object 'a' 'a'
- x (yx) int64 1 2
Data variables:
foo (yx) int64 1 2 Do we want to also change the behavior to this? In [10]: ds.isel(yx=[0, 1])
Out[10]:
<xarray.Dataset>
Dimensions: (x: 2)
Coordinates:
* x (x) int64 1 2
y object 'a'
Data variables:
foo (x) int64 1 2 To me it looks like it is a bit too magical, but just wondering what you think... |
Agreed, this also seems too magical to me. |
I also agree. It seems too magical. But I slightly changed my mind. The current structure is illustrated as follows, The I am wondering if we could have the following class structure things become simpler In this picture, MultiIndex can have I understand it is different from Any thoughts are welcome. |
@fujiisoup Yes, the solution of writing a MultiIndex wrapper for xarray looks much cleaner to me. I like the look of this proposal! (Those diagrams are also very helpful) I guess this could be implemented as a |
I also agree that a MultiIndex wrapper would be to be the way to go. I think I understand the diagrams, but what remains a bit unclear to me is how this could be implemented. In particular, how would this wrapper work with Currently, Would This is maybe slightly off topic, but more generally I'm also wondering how this would co-exist with potentially other kinds of multi-level indexes (see this comment). |
@benbovy Thanks. |
Replaced by a new PR #1426 . |
git diff upstream/master | flake8 --diff
whats-new.rst
for all changes andapi.rst
for new APITo fix #1408,
This modification works, but actually I do not fully satisfied yet.
There are
if
statements in many places.The major changes I made are
variable.__getitem__
now returns an OrderedDict if a single element is selected from MultiIndex.indexing.remap_level_indexers
also returnsselected_dims
which is a map from the original dimension to the selected dims which will be a scalar coordinate.Change 1 keeps level-coordinates even after
ds.isel(yx=0)
.Change 2 enables to track which levels are selected, then the selected levels are changed to a scalar coordinate.
I guess much smarter solution should exist.
I would be happy if anyone gives me a comment.