masked_array write/read differences between xarray and netCDF4 #2478

Closed
sbiner opened this issue Oct 10, 2018 · 3 comments

sbiner commented Oct 10, 2018

Here is code that writes and reads a masked_array with both the netCDF4 and xarray modules.
Running it shows that in three of the four cases the masked value is read back as np.nan. However, when the file is written by netCDF4 and read by xarray, the masked value comes back as the default _FillValue, 9.96920997e+36.

I wonder if this is expected or if I am doing something wrong.

import xarray as xr
import netCDF4 as nc
import numpy as np
import os
data = np.ma.array([1.,2.], mask = [True, False])

# create file with netCDF4
nc_file = 'ncfile.nc'
if os.path.exists(nc_file): os.remove(nc_file)
ds = nc.Dataset(nc_file, 'w')
ds.createDimension('dim1', 2)
var = ds.createVariable('data', 'f8', dimensions = ('dim1',))
var[:] = data
ds.close()

# create file with xarray
da = xr.DataArray(data, name = 'data', dims = ('dim1',))
nc_file = 'xrfile.nc'
if os.path.exists(nc_file): os.remove(nc_file)
da.to_netcdf(nc_file, 'w')
da.close()

print('original data: {}'.format(data))

da = xr.open_dataset('ncfile.nc').data
print('data from nc read by xr: {}'.format(da.values))
da = xr.open_dataset('xrfile.nc').data
print('data from xr read by xr: {}'.format(da.values))

data = nc.Dataset('ncfile.nc').variables['data'][:]
print('data from nc read by nc: {}'.format(da.values))
data = nc.Dataset('xrfile.nc').variables['data'][:]
print('data from xr read by nc: {}'.format(da.values))


print('done')

Here is the output I get:

original data: [-- 2.0]
data from nc read by xr: [9.96920997e+36 2.00000000e+00]
data from xr read by xr: [nan  2.]
data from nc read by nc: [nan  2.]
data from xr read by nc: [nan  2.]
done

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_CA.UTF-8
LOCALE: fr_CA.UTF-8

xarray: 0.10.8
pandas: 0.23.4
numpy: 1.15.1
scipy: 1.1.0
netCDF4: 1.4.1
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.19.2
distributed: None
matplotlib: None
cartopy: None
seaborn: None
setuptools: 40.2.0
pip: 18.0
conda: None
pytest: 3.8.0
IPython: 7.0.1
sphinx: None

stale bot commented Nov 15, 2020

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

stale bot added the stale label Nov 15, 2020
dcherian removed the stale label Apr 9, 2022

kmuehlbauer commented Apr 28, 2023

@sbiner Sorry for the massive delay here. Not much has changed since you opened this issue. Xarray doesn't take the netCDF default fill values into account (there are reasons, which @shoyer has explained in #5680 (comment) and #5680 (comment)).
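
As a possible read-side workaround, the default fill value can be masked by hand after loading. The sketch below uses only NumPy; the helper name `mask_default_fill` is hypothetical, and the constant is the netCDF default fill value for 64-bit floats (the number printed as 9.96920997e+36 in the output above, also available as `netCDF4.default_fillvals['f8']`):

```python
import numpy as np

# netCDF default fill value for 64-bit floats.
NC_FILL_DOUBLE = 9.969209968386869e36


def mask_default_fill(values, fill_value=NC_FILL_DOUBLE):
    """Return a float64 copy of `values` with `fill_value` replaced by NaN.

    Hypothetical helper for illustration; not part of xarray or netCDF4.
    """
    values = np.asarray(values, dtype="f8")
    return np.where(values == fill_value, np.nan, values)


# Values as xarray returned them for the file written by netCDF4.
raw = np.array([NC_FILL_DOUBLE, 2.0])
cleaned = mask_default_fill(raw)
print(cleaned)
```

On an xarray object, the same idea can be expressed with `da.where(da != NC_FILL_DOUBLE)`.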

On write, it simply uses NaN as _FillValue (when no specific encoding is given).
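
To illustrate the "specific encoding" case, here is a minimal sketch that passes an explicit `_FillValue` through the `encoding` argument of `to_netcdf`. The value -9999.0 and the temporary file path are assumptions chosen for illustration, and the behaviour described is what current xarray releases are expected to do, not something verified against the version in the report:

```python
import os
import tempfile

import numpy as np
import xarray as xr

data = np.ma.array([1.0, 2.0], mask=[True, False])

# Convert the mask to NaN explicitly before handing the data to xarray.
da = xr.DataArray(data.filled(np.nan), name="data", dims=("dim1",))

# Temporary location and fill value chosen for this sketch only.
path = os.path.join(tempfile.mkdtemp(), "xrfile_fill.nc")
da.to_netcdf(path, encoding={"data": {"_FillValue": -9999.0}})

# Without mask decoding, the stored fill value is visible as-is.
raw = xr.open_dataset(path, mask_and_scale=False)
raw_values = raw["data"].values
raw.close()

# With default decoding, xarray turns the stored _FillValue back into NaN.
decoded = xr.open_dataset(path)
decoded_values = decoded["data"].values
decoded.close()

print("raw:", raw_values)
print("decoded:", decoded_values)
```

Because the `_FillValue` attribute is written explicitly, a reader does not need to know about the netCDF default fill values to recover the mask.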

Xref: #2374, #7723, #5680

@kmuehlbauer

Closing. Please follow up at #2742.
