Slicing with lists in multiple axes #433

Closed
jhamman opened this issue Jul 17, 2015 · 6 comments

@jhamman
Member

jhamman commented Jul 17, 2015

From the dask docs:

Dask.array supports most of the NumPy slicing syntax.
...
It does not currently support the following:

Slicing one dask.array with another x[x > 0]
Slicing with lists in multiple axes x[[1, 2, 3], [3, 2, 1]]

Both of these are straightforward to add though. If you have a use case then raise an issue.

Here's that issue.

My use case is for point-wise indexing in xray: pydata/xarray#475

A simple use case using dask arrays:

x = da.ones((10, 100), chunks=(10, 10))
points = x[[1, 2, 3], [3, 2, 1]]

currently raises this error:

NotImplementedError: Don't yet support nd fancy indexing

cc @shoyer

@mrocklin
Member

Am I correct in assuming that in this case you'd like the shape of the output to be (3,) and not (3, 3)?
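
For reference, the two shapes in question can be reproduced in plain NumPy. This is only an illustrative sketch: the array `x` and its values are made up here, and `np.ix_` is used to stand in for the outer (3, 3) interpretation.

```python
import numpy as np

# Illustrative array (not from the issue): value at [i, j] is 100*i + j
x = np.arange(1000).reshape(10, 100)

# Pointwise (vectorized) indexing: pairs (1, 3), (2, 2), (3, 1) -> shape (3,)
points = x[[1, 2, 3], [3, 2, 1]]

# Outer (orthogonal) indexing: every row against every column -> shape (3, 3)
outer = x[np.ix_([1, 2, 3], [3, 2, 1])]

print(points.shape, outer.shape)
print(points)
```

The pointwise result is the diagonal of the outer result, which is why the two readings of the same syntax are easy to confuse.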

@mrocklin
Member

And if so is it important to you that this be the default array slicing syntax or are you ok with some other custom method?

@jhamman
Member Author

jhamman commented Jul 17, 2015

Correct. I'm looking for the default numpy slicing behavior; how we get there is up for discussion. My preference would be to use the numpy slicing syntax, but I could use a take method of some kind if that is substantially easier.
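
To make the "take method" alternative concrete, here is a minimal NumPy-only sketch. The name `take_points` is hypothetical and is not an existing NumPy or dask function; it only shows that a function-style API can return the same values as the slicing syntax.

```python
import numpy as np

def take_points(x, rows, cols):
    """Hypothetical helper: pointwise selection from a 2-D array."""
    # Convert (row, col) pairs to positions in the flattened array
    flat = np.ravel_multi_index((rows, cols), x.shape)
    # np.take with the default axis=None indexes the flattened array
    return np.take(x, flat)

x = np.arange(1000).reshape(10, 100)
via_take = take_points(x, [1, 2, 3], [3, 2, 1])
via_slicing = x[[1, 2, 3], [3, 2, 1]]
print(via_take, via_slicing)
```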

@mrocklin
Member

They're both the same level of difficulty to accomplish. I mostly want to avoid locking dask into one behavior or the other for as long as possible.

@shoyer
Member

shoyer commented Jul 17, 2015

NumPy has been discussing adding vindex and oindex attributes for explicit vectorized vs. outer/orthogonal indexing (see @seberg's NEP and PR).

I suggest that dask should follow NumPy's lead here, and consider implementing both attributes -- even if it will only support a limited subset of vectorized indexing. Outer indexing is easier to reason about and optimize (dask already has all the necessary functionality), and it will be useful to have explicit syntax both to support better fusing of getitem calls and to make things simpler for downstream libraries like xray (which uses outer indexing in __getitem__).

Dask certainly should not try to exactly replicate NumPy's current indexing behavior, which sometimes but not always reorders axes with array indices, e.g.,

In [14]: x = np.zeros((5, 6, 7, 8))

In [15]: x[:, [0, 1], [0, 1]].shape
Out[15]: (5, 2, 8)

In [16]: x[:, [0, 1], :, [0, 1]].shape
Out[16]: (2, 5, 7)
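
The same reordering can be checked in a standalone script, alongside what outer indexing would give. This is a sketch only: `np.ix_` is used as a stand-in for the proposed outer/orthogonal behavior, since the `oindex` attribute discussed above is a proposal rather than something assumed to exist here.

```python
import numpy as np

x = np.zeros((5, 6, 7, 8))

# Adjacent array indices: the broadcast dimension stays where the indexed axes were
adjacent = x[:, [0, 1], [0, 1]]

# Array indices separated by a slice: the broadcast dimension moves to the front
separated = x[:, [0, 1], :, [0, 1]]

# Outer indexing via np.ix_: each axis is indexed independently and keeps its position
outer = x[np.ix_(np.arange(5), [0, 1], np.arange(7), [0, 1])]

print(adjacent.shape, separated.shape, outer.shape)
```

With outer indexing, the output always has one dimension per input axis, in the original order, which is what makes it easier to reason about.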

mrocklin added a commit to mrocklin/dask that referenced this issue Jul 17, 2015
This is equivalent to numpy slicing with multiple input lists.

We could use a better name.  cc @shoyer @jhamman

Example
-------

>>> x = np.arange(56).reshape((7, 8))
>>> x
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55]])

>>> d = from_array(x, chunks=(3, 4))
>>> result = isel(d, [0, 1, 6, 0], [0, 1, 0, 7])
>>> result.compute()
array([ 0,  9, 48,  7])

Fixes dask#433
@mrocklin
Member

OK, I've implemented this (I think) in #439. It could use a better name. Happy to use vindex if that's best. Feedback welcome.
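
One way to picture pointwise selection over a chunked array is the NumPy-only sketch below. It is not the code from #439: the function names, the dict-of-blocks layout, and the grouping strategy are all assumptions made for illustration. Each point is assigned to the block that holds it, looked up with block-local indices, and written back in the original order.

```python
import numpy as np

def chunk_bounds(chunks):
    """Start offsets of each chunk along one axis, plus the final end offset."""
    return np.concatenate([[0], np.cumsum(chunks)])

def pointwise_select_chunked(blocks, row_chunks, col_chunks, rows, cols):
    """Hypothetical sketch: pointwise selection from a 2-D array stored as blocks.

    blocks maps (i, j) block coordinates to NumPy arrays.
    """
    rows, cols = np.asarray(rows), np.asarray(cols)
    rb, cb = chunk_bounds(row_chunks), chunk_bounds(col_chunks)
    # Block coordinate that holds each requested point
    bi = np.searchsorted(rb, rows, side="right") - 1
    bj = np.searchsorted(cb, cols, side="right") - 1
    out = np.empty(len(rows), dtype=next(iter(blocks.values())).dtype)
    for i, j in set(zip(bi.tolist(), bj.tolist())):
        mask = (bi == i) & (bj == j)
        # Index the block with block-local coordinates, keep original order
        out[mask] = blocks[(i, j)][rows[mask] - rb[i], cols[mask] - cb[j]]
    return out

# Same data and chunking as the commit example: shape (7, 8), chunks of (3, 4)
x = np.arange(56).reshape((7, 8))
row_chunks, col_chunks = (3, 3, 1), (4, 4)
rb, cb = chunk_bounds(row_chunks), chunk_bounds(col_chunks)
blocks = {
    (i, j): x[rb[i]:rb[i + 1], cb[j]:cb[j + 1]]
    for i in range(len(row_chunks))
    for j in range(len(col_chunks))
}

result = pointwise_select_chunked(blocks, row_chunks, col_chunks,
                                  [0, 1, 6, 0], [0, 1, 0, 7])
print(result)
```

Only the blocks that actually contain a requested point are touched, which is the property a lazy, chunked implementation would want to preserve.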

mrocklin added a commit to mrocklin/dask that referenced this issue Jul 28, 2015
phofl added a commit to phofl/dask that referenced this issue Dec 23, 2024
Co-authored-by: crusaderky <crusaderky@gmail.com>