-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add support for integer array indexing #900
Conversation
cc @shoyer |
Co-authored-by: Evgeni Burovski <evgeny.burovskiy@gmail.com>
Is there a reason for the spec to be limited to 1D arrays/tuple of 1D arrays? ie I do this all the time:
which prints:
Do any of the above libraries fail this test? |
One other thing I don't immediately see in this addition is whether this relates go |
I would really like to see this restriction relaxed, which significantly limits the utility of this feature. Dask could absolutely support this via existing functionality (e.g., broadcast_arrays and ravel before indexing, followed by reshape). Not every indexing operation can be supported efficiently in the fully distributed case, but it is far better to support indexing with multi-dimensional arrays in the standard with compatibility code in dask, than to force users to do indexing with flattened 1D arrays, which can have severe negative performance implications and makes it even harder to write efficient distributed code. |
Is the only reason Dask doesn't have it that no one has done the work? You still have a Dask commit bit, right? Any sense of whether a PR there will meet resistance? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to ignore the "I feel it is a bit verbose", obviously. Mostly saw the same things Stephan did, I think a few points should be highlighted more, because it is easy to miss the interesting bits if you know indexing well enough.
Notes/Questions:
- Do we need to spell out whether tuple subclasses should be supported or are undefined?
- Should we note the fact that if the result is a copy or not depends on library specific rules? (if a library supports view semantics.)
- I think it makes sense to prescribe an
IndexError
on non-integer arrays. (Again, de-factor prescribed be early text probably.)
Efficient fully general indexing doesn't fit easily into Dask's data model, because Dask needs to understand the structure of the distributed computation without evaluating data. So if you index a dask array by another chunked dask array, there is no way to avoid the need to do all-to-all transfer between chunks, basically duplicating the data on every distributed worker. That said, the same thing is true for I do probably still have a commit bit for Dask, but it's been many years since I contributed anything to the project. |
We discussed this PR during the workgroup meeting on Feb 20, 2025, and arrived at the following consensus:
Based on the above, I've updated the text accordingly. The main updates were items 6-7. Otherwise, the text was given thumbs of approval during the workgroup meeting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo optional nits.
Co-authored-by: Evgeni Burovski <evgeny.burovskiy@gmail.com>
As we discussed this PR during the workgroup meeting and no further comments have been made, I'll go ahead and merge. Any further adjustments can be made as errata to the 2024 specification revision. Thanks all! |
This PR:
Notes
References