Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zarr driver doesn't behave as documented regarding _CRS attribute #6436

Closed
adair-kovac opened this issue Sep 27, 2022 · 1 comment
Closed

Comments

@adair-kovac
Copy link

I've been trying for days to figure out how to add the _CRS info to an existing zarr archive so it'll "just work" with gdal. I've made significant progress but am still stuck with 2 issues that seem to contradict the documentation for getting SRS to work with the zarr driver:

Expected behavior and actual behavior.

  1. The driver reads crs info from rioxarray attrs, not (or not only?) from the _CRS attr

If I use rioxarray to reproject the data so it uses a standard EPSG that I can provide as a "url" in the _CRS attribute, it actually doesn't require me to add the _CRS attribute at all. It appears to read the attribute provided by rioxarray just fine. This isn't a problem, but unless I missed something in the documentation, it's surprising, especially since I previously ran across this thread (and it threw a wrench in my plan to validate whether I was using the _CRS attribute correctly).

  1. The driver ignores a valid custom projection

If I use the actual projection, gdal_translate creates a tiff that has no CRS. This is whether I rely on the rioxarray attribute or add a zarr _CRS attribute as documented and consolidate metadata to make sure it's readable (verified with gdalmdiminfo). As long as I write the zarr store with xarray, the data does still scale correctly once I import it into a GIS tool and provide the CRS info manually. However, I need to use a library that assumes gdal can get the CRS information on read so this isn't "good enough" yet.

Steps to reproduce the problem.

Load the dataset:

import xarray as xr
import s3fs
import rioxarray
from pyproj import CRS

fs = s3fs.S3FileSystem(anon=True)
urls = ['s3://hrrrzarr/sfc/20220926/20220926_12z_anl.zarr/surface/TMP', 's3://hrrrzarr/sfc/20220926/20220926_12z_anl.zarr/surface/TMP/surface']

ds = xr.open_mfdataset([s3fs.S3Map(url, s3=fs) for url in urls], engine="zarr")
ds = ds.rename(projection_x_coordinate="x", projection_y_coordinate="y")

crs = CRS.from_json_dict({'a': 6371229,
   'b': 6371229,
   'proj': 'lcc',
   'lon_0': 262.5,
   'lat_0': 38.5,
   'lat_1': 38.5,
   'lat_2': 38.5})
ds = ds.rio.write_crs(crs, inplace=True)
ds["TMP"] = ds["TMP"].astype("float64")

Problem 1)

Reproject, write, and transfer into geotiff - this gets the CRS without the _CRS attribute being set.

reprojected_ds = ds.rio.reproject("EPSG:4326")
reprojected_ds.to_zarr("reprojected_saved_from_xarray.zarr")
gdal_translate 'ZARR:"reprojected_saved_from_xarray.zarr"' surface_temp_reprojected.tif

Problem 2)

Save the data to a local zarr store

ds_copy = ds.copy(deep=True) # doesn't actually seem to copy dask array as needed
ds_copy["TMP"] = (("y","x"), ds.TMP.values)
ds_copy.to_zarr("original_saved_from_xarray.zarr")

Add the _CRS attribute (note that it doesn't matter whether I do this or not):

import cartopy.crs as ccrs

projection = ccrs.LambertConformal(central_longitude=262.5, 
                                   central_latitude=38.5, 
                                   standard_parallels=(38.5, 38.5),
                                    globe=ccrs.Globe(semimajor_axis=6371229,
                                                     semiminor_axis=6371229))

wkt = projection.to_wkt()
store = zarr.DirectoryStore("original_saved_from_xarray.zarr")
temp = zarr.group(store)
temp.attrs["_CRS"] = {"wkt": wkt}
zarr.consolidate_metadata(store)

This creates a geotiff that needs to have the CRS manually added to be usable:

gdal_translate 'ZARR:"original_saved_from_xarray.zarr"' original_proj_temp.tif

There's presumably nothing wrong with the wkt I'm generating since I printed and copy-pasted it into QGIS to be able to manually add the CRS.

Additional note

As I've been working on this, it's been unclear to me how the user is envisioned to create a zarr store that can be read by the gdal driver, and I haven't found example code either. I started by just using the zarr library and giving it a numpy array, thinking I could construct it that way, but eventually realized I needed to use xarray to write the zarr so it would get both the coordinates and metadata about the dimensions. IMO if xarray (or even rioxarray, for CRS metadata) is the expected source (when gdal is used for read and not write), the documentation should make that clear.

Operating system

macOS 12.5.1

GDAL version and provenance

python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:06:49)
[Clang 12.0.1 ]
gdal: 3.5.2
rioxarray: 0.12.2
xarray: 2022.6.0
zarr: 2.12.0
cartopy: 0.21.0
pyproj: 3.4.0

@rouault
Copy link
Member

rouault commented Sep 27, 2022

It doesn't appear to me to be GDALs defect. If you inspect the 2 ZARR datasets produced by your above snippet, none of them have a _CRS attribute in the TMP/.zattrs file.

For n°1, I'm not sure which components are involved in writing the CRS info, probably not the GDAL ZARR driver itself in the reprojected_ds.to_zarr() stage, so it writes using the netCDF CF conventions.

For n°2, the issue is that you attach the _CRS to the root group, not to the array.
The following fixes it

store = zarr.DirectoryStore("original_saved_from_xarray.zarr")
root = zarr.group(store=store)
root['/TMP'].attrs["_CRS"] = {"wkt": wkt}
zarr.consolidate_metadata(store)

As I've been working on this, it's been unclear to me how the user is envisioned to create a zarr store that can be read by the gdal driver,

GDAL doesn't prescribe which tools the user will use. It could be GDAL itself. If you gdal_translate a GeoTIFF or netCDF file into a Zarr one, the write side of the GDAL Zarr driver will properly set the _CRS attribute that it can read back.

@rouault rouault closed this as completed Sep 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants