Maintenance PR #137

Merged · 13 commits · Dec 19, 2023
18 changes: 9 additions & 9 deletions .github/workflows/main.yml
@@ -14,7 +14,7 @@ jobs:
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: 3.8
python-version: "3.10"

- uses: actions/cache@v3
id: wheels_cache
@@ -61,7 +61,7 @@ jobs:
id: conda_cache
with:
path: /tmp/test_env
key: ${{ runner.os }}-test-env-py38-${{ hashFiles('tests/test-env-py38.yml') }}
key: ${{ runner.os }}-test-env-py310-${{ hashFiles('tests/test-env-py310.yml') }}

- uses: conda-incubator/setup-miniconda@v2
if: steps.conda_cache.outputs.cache-hit != 'true'
@@ -85,7 +85,7 @@ jobs:
shell: bash -l {0}
if: steps.conda_cache.outputs.cache-hit != 'true'
run: |
mamba env create -f tests/test-env-py38.yml -p /tmp/test_env
mamba env create -f tests/test-env-py310.yml -p /tmp/test_env

- name: Check Python Env
shell: bash -l {0}
@@ -148,7 +148,7 @@ jobs:
id: conda_cache
with:
path: /tmp/test_env
key: ${{ runner.os }}-test-env-py38-${{ hashFiles('tests/test-env-py38.yml') }}
key: ${{ runner.os }}-test-env-py310-${{ hashFiles('tests/test-env-py310.yml') }}

- name: Update PATH
shell: bash
@@ -173,7 +173,7 @@ jobs:
id: conda_cache
with:
path: /tmp/test_env
key: ${{ runner.os }}-test-env-py38-${{ hashFiles('tests/test-env-py38.yml') }}
key: ${{ runner.os }}-test-env-py310-${{ hashFiles('tests/test-env-py310.yml') }}

- name: Update PATH
shell: bash
@@ -202,7 +202,7 @@ jobs:
id: conda_cache
with:
path: /tmp/test_env
key: ${{ runner.os }}-test-env-py38-${{ hashFiles('tests/test-env-py38.yml') }}
key: ${{ runner.os }}-test-env-py310-${{ hashFiles('tests/test-env-py310.yml') }}

- name: Update PATH
shell: bash
@@ -258,7 +258,7 @@ jobs:
with:
path: /tmp/test_env

key: ${{ runner.os }}-test-env-py38-${{ hashFiles('tests/test-env-py38.yml') }}
key: ${{ runner.os }}-test-env-py310-${{ hashFiles('tests/test-env-py310.yml') }}

- name: Update PATH
shell: bash
@@ -316,7 +316,7 @@ jobs:
id: conda_cache
with:
path: /tmp/test_env
key: ${{ runner.os }}-test-env-py38-${{ hashFiles('tests/test-env-py38.yml') }}
key: ${{ runner.os }}-test-env-py310-${{ hashFiles('tests/test-env-py310.yml') }}

- name: Update PATH
shell: bash
@@ -406,7 +406,7 @@ jobs:
id: conda_cache
with:
path: /tmp/test_env
key: ${{ runner.os }}-test-env-py38-${{ hashFiles('tests/test-env-py38.yml') }}
key: ${{ runner.os }}-test-env-py310-${{ hashFiles('tests/test-env-py310.yml') }}

- name: Update PATH
shell: bash
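The workflow hunks above re-key the conda environment cache from `py38` to `py310`, hashing the environment spec so the cache invalidates whenever the spec file changes. A stdlib sketch of the same idea (function and names are illustrative, not part of this PR):

```python
import hashlib
from pathlib import Path

def cache_key(runner_os: str, env_file: str, content: bytes) -> str:
    """Mimic `${{ runner.os }}-test-env-py310-${{ hashFiles(...) }}`:
    the key changes whenever the environment spec content changes."""
    digest = hashlib.sha256(content).hexdigest()[:16]
    tag = Path(env_file).stem  # e.g. "test-env-py310"
    return f"{runner_os}-{tag}-{digest}"

old = cache_key("Linux", "tests/test-env-py310.yml", b"python=3.10\n")
new = cache_key("Linux", "tests/test-env-py310.yml", b"python=3.10\nnumpy\n")
print(old != new)  # True: edited spec busts the cache
```

The actual `hashFiles` expression uses SHA-256 over the file contents as well, though the exact key format here is only a sketch.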
6 changes: 3 additions & 3 deletions binder/environment.yml
@@ -8,8 +8,8 @@ dependencies:

# odc-stac dependencies
- odc-geo >=0.3.2
- rasterio >=1.3.3
- pystac >=1.0.0,<1.7.0 # Change after fixing #106
- rasterio >=1.3.9
- pystac >=1.9.0 # more flexible handling of extension versions
- toolz
- xarray
# for reading with rasterio from s3
@@ -23,7 +23,7 @@ dependencies:
- jupytext
- jupyter-server-proxy
- ipykernel
- matplotlib
- matplotlib-base
- ipympl
- dask

4 changes: 4 additions & 0 deletions docs/api.rst
@@ -19,6 +19,10 @@ odc.stac
load
configure_rio
configure_s3_access
parse_item
parse_items
extract_collection_metadata
output_geobox

odc.stac.bench
**************
4 changes: 2 additions & 2 deletions notebooks/stac-load-S2-deafrica.py
@@ -26,7 +26,7 @@
# %%
from pystac_client import Client

from odc.stac import stac_load, configure_rio
from odc.stac import configure_rio, stac_load

# %% [markdown]
# ## Set Collection Configuration
@@ -124,7 +124,7 @@
)

# Search the STAC catalog for all items matching the query
items = list(query.get_items())
items = list(query.items())
print(f"Found: {len(items):d} datasets")

# %% [markdown]
39 changes: 11 additions & 28 deletions notebooks/stac-load-S2-ms.py
@@ -13,12 +13,12 @@
# name: python3
# ---

# %% [markdown] tags=[]
# %% [markdown]
# # Access Sentinel 2 Data on Planetary Computer
#
# [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/opendatacube/odc-stac/develop?labpath=notebooks%2Fstac-load-S2-ms.ipynb)

# %% [markdown] tags=[]
# %% [markdown]
# ## Setup Instructions
#
# This notebook is meant to run on Planetary Computer lab hub.
@@ -34,24 +34,6 @@

from odc.stac import configure_rio, stac_load

# %% [markdown]
# ## Configuration
#
# For now we need to manually supply band `dtype` and `nodata` information for
# each band in the collection. Use band named `*` as a wildcard.

# %%
cfg = {
"sentinel-2-l2a": {
"assets": {
"*": {"data_type": "uint16", "nodata": 0},
"SCL": {"data_type": "uint8", "nodata": 0},
"visual": {"data_type": "uint8", "nodata": 0},
},
},
"*": {"warnings": "ignore"},
}

# %% [markdown]
# ## Start Dask Client
#
@@ -79,7 +79,7 @@
query={"s2:mgrs_tile": dict(eq="06VVN")},
)

items = list(query.get_items())
items = list(query.items())
print(f"Found: {len(items):d} datasets")

# %% [markdown]
@@ -89,15 +89,12 @@
# won't be loaded. We are "loading" data with Dask, which means that at this
# point no reads will be happening just yet.
#
# If you were to skip `warnings: ignore` in the configuration, you'll see a
# warning about `rededge` common name being used on several bands. Basically we
# can only work with common names that uniquely identify some band. In this
# case EO extension defines common name `rededge` for bands 5, 6 and 7.
# We have to supply `dtype=` and `nodata=` because items in this collection are missing [raster extension](https://github.com/stac-extensions/raster) metadata.

# %%
resolution = 10
SHRINK = 4
if client.cluster.workers[0].memory_limit < dask.utils.parse_bytes("4G"):
if client.cluster.workers[0].memory_manager.memory_limit < dask.utils.parse_bytes("4G"):
SHRINK = 8 # running on Binder with 2Gb RAM

if SHRINK > 1:
@@ -106,9 +106,11 @@
xx = stac_load(
items,
chunks={"x": 2048, "y": 2048},
stac_cfg=cfg,
patch_url=pc.sign,
resolution=resolution,
# force dtype and nodata
dtype="uint16",
nodata=0,
)

print(f"Bands: {','.join(list(xx.data_vars))}")
@@ -128,8 +128,10 @@
bands=["red", "green", "blue", "nir", "SCL"],
resolution=resolution,
chunks={"x": 2048, "y": 2048},
stac_cfg=cfg,
patch_url=pc.sign,
# force dtype and nodata
dtype="uint16",
nodata=0,
)

print(f"Bands: {','.join(list(xx.data_vars))}")
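In this notebook the PR replaces the `stac_cfg` dict (which supplied per-band `data_type`/`nodata` with a `*` wildcard) by explicit `dtype=` and `nodata=` arguments to `stac_load`, since the collection's items lack raster-extension metadata. The wildcard-with-override lookup the old config relied on is just a two-level dict fallback; a sketch (helper name is illustrative):

```python
def resolve_band_cfg(cfg: dict, band: str) -> dict:
    """Look up a band's config, falling back to the '*' wildcard entry."""
    return cfg.get(band, cfg.get("*", {}))

# shape of the removed notebook config
assets = {
    "*": {"data_type": "uint16", "nodata": 0},
    "SCL": {"data_type": "uint8", "nodata": 0},
}
print(resolve_band_cfg(assets, "SCL"))  # explicit entry wins
print(resolve_band_cfg(assets, "B04"))  # falls back to the wildcard
```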
6 changes: 3 additions & 3 deletions notebooks/stac-load-e84-aws.py
@@ -26,7 +26,7 @@
import folium.plugins
import geopandas as gpd
import shapely.geometry
from IPython.display import HTML, display
from IPython.display import display
from pystac_client import Client

from odc.stac import configure_rio, stac_load
@@ -71,11 +71,11 @@ def convert_bounds(bbox, invert_y=False):
collections=["sentinel-2-l2a"], datetime="2021-09-16", limit=100, bbox=bbox
)

items = list(query.get_items())
items = list(query.items())
print(f"Found: {len(items):d} datasets")

# Convert STAC items into a GeoJSON FeatureCollection
stac_json = query.get_all_items_as_dict()
stac_json = query.item_collection_as_dict()

# %% [markdown]
# ## Review Query Result
12 changes: 11 additions & 1 deletion odc/stac/__init__.py
@@ -1,7 +1,13 @@
"""STAC Item -> ODC Dataset[eo3]."""
from ._version import __version__ # isort:skip this has to be 1st import
from ._load import load
from ._mdtools import ConversionConfig
from ._mdtools import (
ConversionConfig,
parse_item,
parse_items,
extract_collection_metadata,
output_geobox,
)
from ._model import (
RasterBandMetadata,
RasterCollectionMetadata,
@@ -23,6 +29,10 @@
"stac_load",
"configure_rio",
"configure_s3_access",
"parse_item",
"parse_items",
"extract_collection_metadata",
"output_geobox",
"__version__",
)

7 changes: 4 additions & 3 deletions odc/stac/_load.py
@@ -141,12 +141,13 @@ def __call__(
band_key = f"{name}-{tk}"
md_key = f"md-{name}-{tk}"
shape_in_blocks = tuple(len(ch) for ch in chunks)

for idx, item in enumerate(self.items):
band = item.get(name, None)
if band is not None:
dsk[md_key, idx] = band

for ti, yi, xi in np.ndindex(shape_in_blocks):
for ti, yi, xi in np.ndindex(shape_in_blocks): # type: ignore
srcs = []
for _ti in tchunk_range[ti]:
srcs.append(
@@ -596,10 +597,10 @@ def _with_debug_info(ds: xr.Dataset, **kw) -> xr.Dataset:
return _with_debug_info(_mk_dataset(gbox, tss, load_cfg, _loader))

def _task_stream(bands: List[str]) -> Iterator[_LoadChunkTask]:
_shape = (len(_grouped_idx), *gbt.shape)
_shape: Tuple[int, int, int] = (len(_grouped_idx), *gbt.shape.yx)
for band_name in bands:
cfg = load_cfg[band_name]
for ti, yi, xi in np.ndindex(_shape):
for ti, yi, xi in np.ndindex(_shape): # type: ignore
tyx_idx = (ti, yi, xi)
srcs = [(idx, band_name) for idx in tyx_bins.get(tyx_idx, [])]
yield _LoadChunkTask(band_name, srcs, cfg, gbt, tyx_idx)
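Both `_load.py` hunks iterate every `(time, y, x)` chunk index with `np.ndindex` over a 3-tuple shape (the `# type: ignore` comments only quiet the type checker). `itertools.product` gives the same row-major ordering in pure Python, which is how the traversal can be sketched without numpy:

```python
from itertools import product

def ndindex(shape):
    """Row-major iteration over all indices of `shape`, like np.ndindex."""
    return product(*(range(n) for n in shape))

# 2 time groups over a 1x2 grid of spatial tiles
tasks = [(ti, yi, xi) for ti, yi, xi in ndindex((2, 1, 2))]
print(tasks)  # [(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 0, 1)]
```

One load task is emitted per band per index, which is why the shape fix above (`gbt.shape.yx` instead of `gbt.shape`) matters: the tuple fed to the index iterator must be exactly three ints.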
4 changes: 3 additions & 1 deletion odc/stac/_mdtools.py
@@ -83,7 +83,9 @@
# image/* and these media-type are considered to be raster
NON_IMAGE_RASTER_MEDIA_TYPES = {
"application/x-hdf",
"application/x-hdf5",
"application/hdf",
"application/hdf5",
"application/x-netcdf",
"application/netcdf",
"application/x-zarr",
@@ -214,7 +216,7 @@ def is_raster_data(asset: pystac.asset.Asset, check_proj: bool = False) -> bool:
if check_proj:
if (
asset.owner is not None
and has_proj_ext(asset.owner)
and has_proj_ext(asset.owner) # type: ignore
and not has_proj_data(asset)
):
return False
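The `_mdtools.py` hunk adds `application/x-hdf5` and `application/hdf5` to the set of non-`image/*` media types still treated as raster data. A minimal sketch of that predicate, simplified relative to the real `is_raster_data` (which also checks roles and projection metadata):

```python
NON_IMAGE_RASTER_MEDIA_TYPES = {
    "application/x-hdf",
    "application/x-hdf5",   # added in this PR
    "application/hdf",
    "application/hdf5",     # added in this PR
    "application/x-netcdf",
    "application/netcdf",
    "application/x-zarr",
}

def looks_like_raster(media_type: str) -> bool:
    """image/* or a known raster container format counts as raster."""
    base = media_type.split(";")[0].strip().lower()
    return base.startswith("image/") or base in NON_IMAGE_RASTER_MEDIA_TYPES

print(looks_like_raster("application/x-hdf5"))         # True after this PR
print(looks_like_raster("image/tiff; application=geotiff"))  # True
print(looks_like_raster("application/json"))           # False
```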
6 changes: 5 additions & 1 deletion odc/stac/_model.py
@@ -349,7 +349,11 @@ def safe_geometry(

N = 100 # minimum number of points along perimeter we desire
min_sample_distance = math.sqrt(self.geometry.area) * 4 / N
return self.geometry.to_crs(crs, min_sample_distance).dropna()
return self.geometry.to_crs(
crs,
min_sample_distance,
check_and_fix=True,
).dropna()

def resolve_bands(
self, bands: BandQuery = None
15 changes: 14 additions & 1 deletion odc/stac/_reader.py
@@ -184,14 +184,27 @@ def rio_read(

try:
return _rio_read(src, cfg, dst_geobox, dst)
except rasterio.errors.RasterioIOError as e:
except (
rasterio.errors.RasterioIOError,
rasterio.errors.RasterBlockError,
rasterio.errors.WarpOperationError,
rasterio.errors.WindowEvaluationError,
) as e:
if cfg.fail_on_error:
log.error(
"Aborting load due to failure while reading: %s:%d",
src.uri,
src.band,
)
raise e
except rasterio.errors.RasterioError as e:
if cfg.fail_on_error:
log.error(
"Aborting load due to some rasterio error: %s:%d",
src.uri,
src.band,
)
raise e

# Failed to read, but asked to continue
log.warning("Ignoring read failure while reading: %s:%d", src.uri, src.band)
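`rio_read` now catches a tuple of specific rasterio errors first, then the broader `RasterioError` base class, logging and re-raising only when `fail_on_error` is set and otherwise falling through to the "ignore and continue" path. The control flow in isolation, with stand-in exception classes rather than real rasterio ones:

```python
import logging

log = logging.getLogger("reader")

class ReadError(Exception): ...      # stands in for rasterio.errors.RasterioError
class IOReadError(ReadError): ...    # stands in for RasterioIOError and friends

def read(src, fail_on_error: bool):
    try:
        return src()
    except IOReadError:              # known, specific failure modes
        if fail_on_error:
            log.error("Aborting load due to failure while reading")
            raise
    except ReadError:                # any other rasterio-style error
        if fail_on_error:
            log.error("Aborting load due to some rasterio error")
            raise
    # Failed to read, but asked to continue
    log.warning("Ignoring read failure")
    return None

def bad():
    raise IOReadError()

print(read(bad, fail_on_error=False))  # None
```

Splitting the handlers keeps the log messages distinct while giving both paths the same re-raise/continue behaviour.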
1 change: 1 addition & 0 deletions odc/stac/_rio.py
@@ -35,6 +35,7 @@
"AWS_S3_ENDPOINT",
"AWS_NO_SIGN_REQUEST",
"AWS_REQUEST_PAYER",
"AWS_WEB_IDENTITY_TOKEN_FILE",
"AZURE_STORAGE_ACCOUNT",
"AZURE_NO_SIGN_REQUEST",
"OSS_ENDPOINT",
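The `_rio.py` hunk adds `AWS_WEB_IDENTITY_TOKEN_FILE` to the list of GDAL/cloud environment variables captured when propagating rasterio configuration to workers, so IAM-roles-for-service-accounts style auth survives the trip. Capturing such a whitelist is a filtered dict comprehension; a sketch with an abbreviated variable list:

```python
import os

CAPTURE_VARS = (
    "AWS_S3_ENDPOINT",
    "AWS_NO_SIGN_REQUEST",
    "AWS_REQUEST_PAYER",
    "AWS_WEB_IDENTITY_TOKEN_FILE",  # newly captured in this PR
)

def capture_env(env=None):
    """Snapshot only the whitelisted variables that are actually set."""
    if env is None:
        env = os.environ
    return {k: env[k] for k in CAPTURE_VARS if k in env}

snapshot = capture_env({"AWS_WEB_IDENTITY_TOKEN_FILE": "/var/run/token",
                        "HOME": "/root"})
print(snapshot)  # {'AWS_WEB_IDENTITY_TOKEN_FILE': '/var/run/token'}
```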
4 changes: 3 additions & 1 deletion odc/stac/testing/stac.py
@@ -164,7 +164,9 @@ def _to_raster_band(src: RasterSource) -> RasterBand:
asset_name,
pystac.asset.Asset(b.uri, media_type="image/tiff", roles=["data"]),
)
RasterExtension(xx.assets[asset_name]).apply(list(map(_to_raster_band, bands)))
RasterExtension.ext(xx.assets[asset_name]).apply(
list(map(_to_raster_band, bands))
)

for asset_name, asset in xx.assets.items():
bb = item.bands[(asset_name, 1)]