Remove Zarr chunk manifest writing functionality #426

Merged
merged 7 commits into from
Feb 5, 2025
1 change: 0 additions & 1 deletion docs/api.rst
@@ -29,7 +29,6 @@ Serialization
:toctree: generated/

VirtualiZarrDatasetAccessor.to_kerchunk
VirtualiZarrDatasetAccessor.to_zarr
VirtualiZarrDatasetAccessor.to_icechunk

Information
3 changes: 3 additions & 0 deletions docs/releases.rst
@@ -12,6 +12,9 @@ New Features
Breaking changes
~~~~~~~~~~~~~~~~

- Reading and writing Zarr chunk manifest formats are no longer supported.
(:issue:`359`), (:pull:`426`). By `Raphael Hagen <https://github.com/norlandrhagen>`_.

Deprecations
~~~~~~~~~~~~

23 changes: 0 additions & 23 deletions docs/usage.md
@@ -456,30 +456,7 @@ session.commit("Appended second dataset")

See the [Icechunk documentation](https://icechunk.io/icechunk-python/virtual/#creating-a-virtual-dataset-with-virtualizarr) for more details.

### Writing as Zarr

Alternatively, we can write these references out as an actual Zarr store, at least one that is compliant with the [proposed "Chunk Manifest" ZEP](https://github.com/zarr-developers/zarr-specs/issues/287). To do this we simply use the {py:meth}`vds.virtualize.to_zarr <virtualizarr.VirtualiZarrDatasetAccessor.to_zarr>` accessor method.

```python
combined_vds.virtualize.to_zarr('combined.zarr')
```

The result is a zarr v3 store on disk which contains the chunk manifest information written out as `manifest.json` files, so the store looks like this:

```
combined/zarr.json <- group metadata
combined/air/zarr.json <- array metadata
combined/air/manifest.json <- array manifest
...
```

The advantage of this format is that any zarr v3 reader that understands the chunk manifest ZEP could read from this store, no matter what language it is written in (e.g. via `zarr-python`, `zarr-js`, or rust). This reading would also not require `fsspec`.

```{note}
Currently there are not yet any zarr v3 readers which understand the chunk manifest ZEP, so until then this feature cannot be used for data processing.

This store can however be read by {py:func}`~virtualizarr.open_virtual_dataset`, by passing `filetype="zarr_v3"`.
```

## Opening Kerchunk references as virtual datasets

16 changes: 0 additions & 16 deletions virtualizarr/accessor.py
@@ -7,7 +7,6 @@
from virtualizarr.manifests import ManifestArray
from virtualizarr.types.kerchunk import KerchunkStoreRefs
from virtualizarr.writers.kerchunk import dataset_to_kerchunk_refs
from virtualizarr.writers.zarr import dataset_to_zarr

if TYPE_CHECKING:
    from icechunk import IcechunkStore  # type: ignore[import-not-found]
@@ -24,21 +23,6 @@ class VirtualiZarrDatasetAccessor:
    def __init__(self, ds: Dataset):
        self.ds: Dataset = ds

    def to_zarr(self, storepath: str) -> None:
        """
        Serialize all virtualized arrays in this xarray dataset as a Zarr store.

        Currently requires all variables to be backed by ManifestArray objects.

        Not very useful until some implementation of a Zarr reader can actually read these manifest.json files.
        See https://github.com/zarr-developers/zarr-specs/issues/287

        Parameters
        ----------
        storepath : str
        """
        dataset_to_zarr(self.ds, storepath)

    def to_icechunk(
        self,
        store: "IcechunkStore",
6 changes: 1 addition & 5 deletions virtualizarr/backend.py
@@ -17,15 +17,13 @@
    KerchunkVirtualBackend,
    NetCDF3VirtualBackend,
    TIFFVirtualBackend,
    ZarrV3VirtualBackend,
)
from virtualizarr.readers.common import VirtualBackend
from virtualizarr.utils import _FsspecFSFromFilepath, check_for_collisions

# TODO add entrypoint to allow external libraries to add to this mapping
VIRTUAL_BACKENDS = {
    "kerchunk": KerchunkVirtualBackend,
    "zarr_v3": ZarrV3VirtualBackend,
    "dmrpp": DMRPPVirtualBackend,
    "hdf5": HDFVirtualBackend,
    "netcdf4": HDFVirtualBackend,  # note this is the same as for hdf5
@@ -51,9 +49,7 @@ class FileType(AutoName):
    grib = auto()
    tiff = auto()
    fits = auto()
    zarr = auto()
    dmrpp = auto()
    zarr_v3 = auto()
    kerchunk = auto()


@@ -130,7 +126,7 @@ def open_virtual_dataset(
        File path to open as a set of virtualized zarr arrays.
    filetype : FileType or str, default None
        Type of file to be opened. Used to determine which kerchunk file format backend to use.
        Can be one of {'netCDF3', 'netCDF4', 'HDF', 'TIFF', 'GRIB', 'FITS', 'dmrpp', 'zarr_v3', 'kerchunk'}.
        Can be one of {'netCDF3', 'netCDF4', 'HDF', 'TIFF', 'GRIB', 'FITS', 'dmrpp', 'kerchunk'}.
        If not provided will attempt to automatically infer the correct filetype from header bytes.
    group : str, default is None
        Path to the HDF5/netCDF4 group in the given file to open. Given as a str, supported by filetypes “netcdf4”, “hdf5”, and "dmrpp".
2 changes: 0 additions & 2 deletions virtualizarr/readers/__init__.py
@@ -5,7 +5,6 @@
from virtualizarr.readers.kerchunk import KerchunkVirtualBackend
from virtualizarr.readers.netcdf3 import NetCDF3VirtualBackend
from virtualizarr.readers.tiff import TIFFVirtualBackend
from virtualizarr.readers.zarr_v3 import ZarrV3VirtualBackend

__all__ = [
"DMRPPVirtualBackend",
Expand All @@ -15,5 +14,4 @@
"KerchunkVirtualBackend",
"NetCDF3VirtualBackend",
"TIFFVirtualBackend",
"ZarrV3VirtualBackend",
]
161 changes: 0 additions & 161 deletions virtualizarr/readers/zarr_v3.py

This file was deleted.

1 change: 0 additions & 1 deletion virtualizarr/tests/test_backend.py
@@ -79,7 +79,6 @@ def test_FileType():
    assert "grib" == FileType("grib").name
    assert "tiff" == FileType("tiff").name
    assert "fits" == FileType("fits").name
    assert "zarr" == FileType("zarr").name
    with pytest.raises(ValueError):
        FileType(None)
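
A note for readers checking the user-facing effect of this change: with `zarr` and `zarr_v3` dropped from `FileType`, passing either string as a file type should now fail at enum lookup with a `ValueError`, the same way `FileType(None)` does in the test. The sketch below is a self-contained illustration only. `AutoName` is an assumed stand-in (its definition is not shown in this PR), only the members visible in the `backend.py` hunk are listed, and `is_supported_filetype` is a hypothetical helper that is not part of VirtualiZarr.

```python
from enum import Enum, auto


class AutoName(Enum):
    # Assumed stand-in for VirtualiZarr's AutoName: each member's value
    # is simply its own name, so FileType("grib") resolves by value.
    def _generate_next_value_(name, start, count, last_values):
        return name


class FileType(AutoName):
    # Only the members visible in the diff, after the removal of
    # `zarr` and `zarr_v3`; the real enum has more entries.
    grib = auto()
    tiff = auto()
    fits = auto()
    dmrpp = auto()
    kerchunk = auto()


def is_supported_filetype(filetype: str) -> bool:
    """Hypothetical helper: True if the string maps to a FileType member."""
    try:
        FileType(filetype)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for candidate in ("kerchunk", "zarr", "zarr_v3"):
        print(candidate, is_supported_filetype(candidate))
```

Code that previously passed `filetype="zarr_v3"` to `open_virtual_dataset`, or called `vds.virtualize.to_zarr(...)`, would need to move to one of the remaining serialization paths listed in `docs/api.rst` (`to_kerchunk` or `to_icechunk`).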

62 changes: 0 additions & 62 deletions virtualizarr/tests/test_writers/test_zarr.py

This file was deleted.

21 changes: 0 additions & 21 deletions virtualizarr/vendor/zarr/LICENSE.txt

This file was deleted.

Empty file.