Zero values after saving to a local store with dask localcluster. #383

Closed
sagunkayastha opened this issue Nov 8, 2024 · 4 comments · Fixed by #405

Comments

@sagunkayastha

sagunkayastha commented Nov 8, 2024

I am trying to combine multiple CMAQ datasets, following the quickstart guide. When I save the dataset with a dask LocalCluster and Client, the stored values are all zeros. Everything works as expected when not using dask.

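import xarray as xr
from dask.distributed import Client, LocalCluster
from icechunk import IcechunkStore, StorageConfig

cluster = LocalCluster(n_workers=32, threads_per_worker=2, memory_limit='16GB')
client = Client(cluster)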

storage_config = StorageConfig.filesystem("./ice_cmaq")
store = IcechunkStore.create(storage_config)


def preprocess(ds):
    ds = ds.isel(TSTEP=slice(0, 24))  # Select the first 24 timesteps
    ds = ds.drop_vars('TFLAG')  # Drop the TFLAG data variable
    ds = ds.sel(LAY=0)  # Select LAY=0
    return ds

cmaq_ds = xr.open_mfdataset(cmaq_paths, preprocess=preprocess, concat_dim='TSTEP', combine='nested')
cmaq_ds[['O3', 'NO2']].to_zarr(store, zarr_format=3, consolidated=False)
store.commit("added cmaq")
@dcherian
Contributor

dcherian commented Nov 8, 2024

Thanks for the nice bug report!

Yes, only the threaded scheduler is supported at the moment: #185

This will work once #357 is merged, which is waiting on an upstream dask release. I expect to get it in next week.
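One plausible reading of why the threaded scheduler works while a process-based cluster does not is sketched below. It uses only the standard library; `ToyStore`, `write_chunk`, and the `pending` buffer are hypothetical stand-ins, not icechunk or dask APIs. The assumption is that worker processes receive pickled copies of the store, so writes made on those copies never reach the store that the client later commits.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor


class ToyStore:
    """Hypothetical stand-in for a store that buffers writes until commit."""

    def __init__(self):
        self.pending = {}  # uncommitted chunk writes, keyed by chunk id

    def write(self, key, value):
        self.pending[key] = value


def write_chunk(store, key):
    # Hypothetical task: write one chunk into the given store object.
    store.write(key, key * 10)


# Threaded case: every task mutates the same in-process object.
shared = ToyStore()
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(lambda k: write_chunk(shared, k), range(4)))
threaded_pending = dict(shared.pending)

# Process-like case: each task gets a pickled copy (simulated here with a
# pickle round trip), so the original object never sees the writes.
original = ToyStore()
for k in range(4):
    worker_copy = pickle.loads(pickle.dumps(original))
    write_chunk(worker_copy, k)
distributed_pending = dict(original.pending)

print(threaded_pending)
print(distributed_pending)
```

In this toy model the threaded run leaves all four writes in the shared buffer, while the original store in the copy-based run stays empty, which would be consistent with a commit that contains array metadata but no chunk data.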

@sagunkayastha
Author

Thank you for the quick response.

@rabernat
Contributor

rabernat commented Nov 8, 2024

We should find some way to warn if people try to use a standard distributed write. I can see this being a real footgun.

@paraseba
Collaborator

paraseba commented Nov 8, 2024

@sagunkayastha until we have the nice dask array interface working, we are using a lower-level approach to do distributed writes, unfortunately with an uglier interface. You can find an example here. Let us know if you want to know more about it.

dcherian added a commit that referenced this issue Nov 21, 2024
dcherian added a commit that referenced this issue Nov 21, 2024
dcherian added a commit that referenced this issue Nov 22, 2024
* Set store to read only after unpickling

Closes #383
xref #185

* typing
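
The idea named in the commit message, marking a store read-only once it has been unpickled, can be illustrated with a minimal sketch. This is not the icechunk implementation; `ToyStore`, its `read_only` flag, and the choice of `PermissionError` are assumptions made for illustration. It relies only on Python's standard `__setstate__` pickle hook.

```python
import pickle


class ToyStore:
    """Hypothetical store; not the icechunk implementation."""

    def __init__(self):
        self.read_only = False
        self.pending = {}  # uncommitted writes

    def write(self, key, value):
        if self.read_only:
            # Fail loudly instead of silently dropping the write.
            raise PermissionError("store was unpickled and is read-only")
        self.pending[key] = value

    def __setstate__(self, state):
        # Called on the receiving side of a pickle round trip.
        self.__dict__.update(state)
        self.read_only = True


store = ToyStore()
store.write("a", 1)  # allowed in the original process

# Simulate shipping the store to a worker process.
copy_on_worker = pickle.loads(pickle.dumps(store))
try:
    copy_on_worker.write("b", 2)
    write_failed = False
except PermissionError:
    write_failed = True

print(write_failed)
```

With a guard of this kind, a distributed write on a copied store raises an error at write time rather than producing an apparently successful commit with missing data.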
4 participants