You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I've been trying for days to figure out how to add the _CRS info to an existing zarr archive so it'll "just work" with gdal. I've made significant progress but am still stuck with 2 issues that seem to contradict the documentation for getting SRS to work with the zarr driver:
Expected behavior and actual behavior.
The driver reads crs info from rioxarray attrs, not (or not only?) from the _CRS attr
If I use rioxarray to reproject the data so it uses a standard EPSG that I can provide as a "url" in the _CRS attribute, it actually doesn't require me to add the _CRS attribute at all. It appears to read the attribute provided by rioxarray just fine. This isn't a problem, but unless I missed something in the documentation, it's surprising, especially since I previously ran across this thread (and it threw a wrench in my plan to validate whether I was using the _CRS attribute correctly).
The driver ignores a valid custom projection
If I use the actual projection, gdal_translate creates a tiff that has no CRS. This is whether I rely on the rioxarray attribute or add a zarr _CRS attribute as documented and consolidate metadata to make sure it's readable (verified with gdalmdiminfo). As long as I write the zarr store with xarray, the data does still scale correctly once I import it into a GIS tool and provide the CRS info manually. However, I need to use a library that assumes gdal can get the CRS information on read so this isn't "good enough" yet.
There's presumably nothing wrong with the wkt I'm generating since I printed and copy-pasted it into QGIS to be able to manually add the CRS.
Additional note
As I've been working on this, it's been unclear to me how the user is envisioned to create a zarr store that can be read by the gdal driver, and I haven't found example code either. I started by just using the zarr library and giving it a numpy array, thinking I could construct it that way, but eventually realized I needed to use xarray to write the zarr so it would get both the coordinates and metadata about the dimensions. IMO if xarray (or even rioxarray, for CRS metadata) is the expected source (when gdal is used for read and not write), the documentation should make that clear.
It doesn't appear to me to be GDALs defect. If you inspect the 2 ZARR datasets produced by your above snippet, none of them have a _CRS attribute in the TMP/.zattrs file.
For n°1, I'm not sure which components are involved in writing the CRS info, probably not the GDAL ZARR driver itself in the reprojected_ds.to_zarr() stage, so it writes using the netCDF CF conventions.
For n°2, the issue is that you attach the _CRS to the root group, not to the array.
The following fixes it
As I've been working on this, it's been unclear to me how the user is envisioned to create a zarr store that can be read by the gdal driver,
GDAL doesn't prescribe which tools the user will use. It could be GDAL itself. If you gdal_translate a GeoTIFF or netCDF file into a Zarr one, the write side of the GDAL Zarr driver will properly set the _CRS attribute that it can read back.
I've been trying for days to figure out how to add the _CRS info to an existing zarr archive so it'll "just work" with gdal. I've made significant progress but am still stuck with 2 issues that seem to contradict the documentation for getting SRS to work with the zarr driver:
Expected behavior and actual behavior.
If I use rioxarray to reproject the data so it uses a standard EPSG that I can provide as a "url" in the _CRS attribute, it actually doesn't require me to add the _CRS attribute at all. It appears to read the attribute provided by rioxarray just fine. This isn't a problem, but unless I missed something in the documentation, it's surprising, especially since I previously ran across this thread (and it threw a wrench in my plan to validate whether I was using the _CRS attribute correctly).
If I use the actual projection, gdal_translate creates a tiff that has no CRS. This is whether I rely on the rioxarray attribute or add a zarr _CRS attribute as documented and consolidate metadata to make sure it's readable (verified with gdalmdiminfo). As long as I write the zarr store with xarray, the data does still scale correctly once I import it into a GIS tool and provide the CRS info manually. However, I need to use a library that assumes gdal can get the CRS information on read so this isn't "good enough" yet.
Steps to reproduce the problem.
Load the dataset:
Problem 1)
Reproject, write, and transfer into geotiff - this gets the CRS without the _CRS attribute being set.
Problem 2)
Save the data to a local zarr store
Add the _CRS attribute (note that it doesn't matter whether I do this or not):
This creates a geotiff that needs to have the CRS manually added to be usable:
There's presumably nothing wrong with the wkt I'm generating since I printed and copy-pasted it into QGIS to be able to manually add the CRS.
Additional note
As I've been working on this, it's been unclear to me how the user is envisioned to create a zarr store that can be read by the gdal driver, and I haven't found example code either. I started by just using the zarr library and giving it a numpy array, thinking I could construct it that way, but eventually realized I needed to use xarray to write the zarr so it would get both the coordinates and metadata about the dimensions. IMO if xarray (or even rioxarray, for CRS metadata) is the expected source (when gdal is used for read and not write), the documentation should make that clear.
Operating system
macOS 12.5.1
GDAL version and provenance
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:06:49)
[Clang 12.0.1 ]
gdal: 3.5.2
rioxarray: 0.12.2
xarray: 2022.6.0
zarr: 2.12.0
cartopy: 0.21.0
pyproj: 3.4.0
The text was updated successfully, but these errors were encountered: