Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v3] First step to generalizes ndarray and bytes #1826

Merged
merged 53 commits into from
May 16, 2024

Conversation

madsbk
Copy link
Contributor

@madsbk madsbk commented May 1, 2024

This PR introduces two new classes NDBuffer and Buffer to represent the data argument between components.

Currently, we use numpy.ndarray and bytes to pass around data between components. As discussed in #1751, it would be good with a generalization of the data containers to enable other memory types.

As a first step, NDBuffer and Buffer are backed by numpy arrays. In follow-up PRs, we can discuss/implement alternative backends such as cupy arrays or pytorch tensors. The important point here is to agree on an abstraction that can facilitate a broad range of array/data types.

cc. @akshaysubr, @jakirkham

@madsbk madsbk marked this pull request as ready for review May 6, 2024 09:01
@jhamman jhamman added the V3 label May 7, 2024
src/zarr/buffer.py Outdated Show resolved Hide resolved
src/zarr/buffer.py Outdated Show resolved Hide resolved
src/zarr/buffer.py Outdated Show resolved Hide resolved
@normanrz normanrz mentioned this pull request May 14, 2024
6 tasks
@madsbk
Copy link
Contributor Author

madsbk commented May 15, 2024

This PR is ready for a new round of reviews. As @normanrz suggested, I have removed the inheritance between the two buffer classes Buffer and NDBuffer.

AsyncArray.getitem() and AsyncArray.setitem() now takes a factory function to create a buffer from a ndarray-like object. We still need a mechanism to tell codecs and stores what type of buffer they should return but let’s do that in a follow-up PR.

@madsbk madsbk marked this pull request as ready for review May 15, 2024 09:36
@madsbk madsbk requested review from normanrz, akshaysubr and d-v-b May 15, 2024 09:36
src/zarr/buffer.py Outdated Show resolved Hide resolved
src/zarr/buffer.py Outdated Show resolved Hide resolved
"""
return cls.from_ndarray_like(np.frombuffer(bytes_like, dtype="b"))

def as_ndarray_like(self) -> NDArrayLike:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to have an as_memoryview method that could be used for passing to Blosc, Gzip etc.? Would that also work without copying?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An as_memoryview would work like as_numpy_array. It might involve copying data depending on what the underlying ndarray-like object the buffer represent. E.g., if self._data is a cupy array, we have to copy from device to host memory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just found the method name as_ndarray_like confusing because it actually returns a wrapped 1d byte array

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, Buffer is now backed by ArrayLike: 3854bec

For now, both ArrayLike and NDArrayLike are alias of np.ndarray.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet!

Co-authored-by: Norman Rzepka <code@normanrz.com>
@normanrz normanrz merged commit ceee364 into zarr-developers:v3 May 16, 2024
18 checks passed
@madsbk madsbk deleted the buffer branch May 21, 2024 08:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants