Minor changes to virtual docs #293

Merged (3 commits, Oct 17, 2024)
19 changes: 15 additions & 4 deletions docs/docs/icechunk-python/virtual.md
@@ -24,7 +24,7 @@ We are going to create a virtual dataset pointing to all of the [OISST](https://
Before we get started, we also need to install `fsspec` and `s3fs` for working with data on S3.

```shell
pip install fsspec s3fs
```

First, we need to find all of the files we are interested in. We will do this with `fsspec`, using a `glob` expression to find every netCDF file in the August 2024 folder of the bucket:
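
The `fsspec` call itself is in the collapsed portion of the diff. As a runnable local illustration of the same `glob` idea (using the stdlib `glob` module so it works without S3 access; the folder layout below is a made-up stand-in for the bucket structure):

```python
import glob
import os
import tempfile

# Create a few fake netCDF files under a YYYYMM folder, mimicking the
# bucket layout described above (these names are illustrative only).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "202408"))
for day in ("20240801", "20240802"):
    open(os.path.join(root, "202408", f"oisst.{day}.nc"), "w").close()

# Same idea as fs.glob(...) against S3: match every netCDF file in the folder.
matches = sorted(glob.glob(os.path.join(root, "202408", "*.nc")))
print(len(matches))  # 2
```

With `fsspec`, the equivalent pattern is passed to `fs.glob(...)` on an S3 filesystem instead of a local path.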
@@ -87,15 +87,24 @@ We have a virtual dataset with 31 timestamps! One hint that this worked correctly

!!! note

    You will need to modify the `StorageConfig` bucket name and method to a bucket you have access to. There are multiple options for configuring S3 access: `s3_from_config`, `s3_from_env`, and `s3_anonymous`. For more configuration options, see the [configuration page](./configuration.md).
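
For reference, the alternative constructors named above might be used like this. This is a sketch only: the keyword arguments are assumptions, and the exact signatures depend on your icechunk version, so check the configuration page before relying on them:

```python
from icechunk import StorageConfig

# Credentials picked up from the environment (AWS_ACCESS_KEY_ID, etc.);
# keyword names here are assumed, not confirmed against a specific release.
storage_env = StorageConfig.s3_from_env(
    bucket="YOUR_BUCKET_HERE",
    prefix="icechunk/oisst",
)

# Anonymous access, for public buckets that allow unauthenticated reads.
storage_anon = StorageConfig.s3_anonymous(
    bucket="YOUR_BUCKET_HERE",
    prefix="icechunk/oisst",
    region="us-east-1",
)
```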

!!! note

Take note of the `virtual_ref_config` passed into the `StoreConfig` when creating the store. This allows the icechunk store to have the necessary credentials to access the referenced netCDF data on s3 at read time. For more configuration options, see the [configuration page](./configuration.md).

```python
from icechunk import IcechunkStore, S3Credentials, StorageConfig, StoreConfig, VirtualRefConfig

storage = StorageConfig.s3_from_config(
    bucket='YOUR_BUCKET_HERE',
    prefix='icechunk/oisst',
    region='us-east-1',
    credentials=S3Credentials(
        access_key_id="REPLACE_ME",
        secret_access_key="REPLACE_ME",
        session_token="REPLACE_ME"
    )
)

store = IcechunkStore.create(
@@ -109,6 +118,8 @@
With the store created, let's write our virtual dataset to Icechunk with VirtualiZarr!

```python
from virtualizarr.writers.icechunk import dataset_to_icechunk

dataset_to_icechunk(virtual_ds, store)
```

@@ -199,4 +210,4 @@ No extra configuration is necessary for local filesystem references.

### Virtual Reference File Format Support

Currently, Icechunk supports `HDF5` and `netcdf4` files for use in virtual references. See the [tracking issue](https://github.com/earth-mover/icechunk/issues/197) for more info.
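
Since netCDF-4 files are HDF5 containers under the hood, one quick way to check whether a local file is in a supported format is to inspect its leading magic bytes. This helper is a sketch for illustration only, not part of the Icechunk API:

```python
def detect_format(path: str) -> str:
    """Classify a file by its leading magic bytes."""
    with open(path, "rb") as f:
        magic = f.read(8)
    if magic.startswith(b"\x89HDF\r\n\x1a\n"):
        return "hdf5"     # netCDF-4 files are HDF5 containers: supported
    if magic[:3] == b"CDF":
        return "netcdf3"  # classic netCDF: not currently supported
    return "unknown"


if __name__ == "__main__":
    import tempfile

    # Write a file that starts with the 8-byte HDF5 signature.
    with tempfile.NamedTemporaryFile(suffix=".nc", delete=False) as f:
        f.write(b"\x89HDF\r\n\x1a\n" + b"\x00" * 8)
        path = f.name
    print(detect_format(path))  # hdf5
```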