-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concurrent acces with multiple processes using open_mfdataset #2494
Comments
This may be fixed if you try the development version of xarray -- we did a major refactor of xarray's handling of netCDF files, e.g., try |
To clear a few things up, the parallel option in If you are only seeing thread parallelism in the Can you try this workflow with and without the parallel option and report back: client = Client(...)
ds = xr.open_mfdataset(myfiles_path, concat_dim='t', engine='h5netcdf', paralel=...)
x = ds['x'].load().data
y = ds['y'].load().data
ds.close() Provided that you are setting up distributed to use multiple processes, you should get parallelism from multiple processes in this case. |
@jhamman I was aware of the difference between the two parallel options. I was thus wondering if I could pass a parallel option to the netcdf4 library via the open_mfdataset() call. I tried to change the engine to netcdf4 and added the backend_kwarg : |
Hi all,
I tough that the parallel=True option was to enable parallelized access for my independent process. It actually enable parallelized access for possible threads of each process.
Thanks again for your help |
Hi everyone,
First: thanks to the developers for this amazing xarray library ! Great piece of work !
Here comes my troubles:
I run several (about 500) independant processes (dask distributed) that need simultaneous reading (only) access to a same (group of) netcdf files. I only pass the files-path strings to the processes to avoid pickling a netcdf python-object (issue).
In each process, I run
but it leads to typical errors for many concurrent access that fail... : Invalid id or Exception: CancelledError("('mul-484a58bf5830233021e08456b45eb60d', 0, 0)",), ...
I was using netCDF4 module with parallel option set to True, when playing with a single netcdf file and it was running fine:
Parallel option for open_mfdataset() seems to be dedicated to multithreaded access only. Is there somthing that can be done for multi-processes access ?
Thanks
Output of
xr.show_versions()
xarray: 0.10.8
pandas: 0.23.4
numpy: 1.12.1
scipy: 0.19.1
netCDF4: 1.2.4
h5netcdf: 0.6.2
h5py: 2.7.0
Nio: None
zarr: 2.2.0
bottleneck: 1.2.1
cyordereddict: None
dask: 0.19.0
distributed: 1.23.0
matplotlib: 2.2.3
cartopy: 0.16.0
seaborn: None
setuptools: 40.2.0
pip: 18.0
conda: None
pytest: None
IPython: 6.5.0
sphinx: None
The text was updated successfully, but these errors were encountered: