Bug when padding coordinates with NaNs #6431

Open
TomNicholas opened this issue Mar 31, 2022 · 2 comments

@TomNicholas
Member

What happened?

import numpy as np
import xarray as xr
da = xr.DataArray(np.arange(2), dims='x')
da.pad({'x': (0, 1)}, 'constant', constant_values=np.nan)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 da.pad({'x': 1}, 'constant', constant_values=np.NAN)

File ~/Documents/Work/Code/xarray/xarray/core/dataarray.py:4158, in DataArray.pad(self, pad_width, mode, stat_length, constant_values, end_values, reflect_type, **pad_width_kwargs)
   4000 def pad(
   4001     self,
   4002     pad_width: Mapping[Any, int | tuple[int, int]] | None = None,
   (...)
   4012     **pad_width_kwargs: Any,
   4013 ) -> DataArray:
   4014     """Pad this array along one or more dimensions.
   4015 
   4016     .. warning::
   (...)
   4156         z        (x) float64 nan 100.0 200.0 nan
   4157     """
-> 4158     ds = self._to_temp_dataset().pad(
   4159         pad_width=pad_width,
   4160         mode=mode,
   4161         stat_length=stat_length,
   4162         constant_values=constant_values,
   4163         end_values=end_values,
   4164         reflect_type=reflect_type,
   4165         **pad_width_kwargs,
   4166     )
   4167     return self._from_temp_dataset(ds)

File ~/Documents/Work/Code/xarray/xarray/core/dataset.py:7368, in Dataset.pad(self, pad_width, mode, stat_length, constant_values, end_values, reflect_type, **pad_width_kwargs)
   7366     variables[name] = var
   7367 elif name in self.data_vars:
-> 7368     variables[name] = var.pad(
   7369         pad_width=var_pad_width,
   7370         mode=mode,
   7371         stat_length=stat_length,
   7372         constant_values=constant_values,
   7373         end_values=end_values,
   7374         reflect_type=reflect_type,
   7375     )
   7376 else:
   7377     variables[name] = var.pad(
   7378         pad_width=var_pad_width,
   7379         mode=coord_pad_mode,
   7380         **coord_pad_options,  # type: ignore[arg-type]
   7381     )

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:1360, in Variable.pad(self, pad_width, mode, stat_length, constant_values, end_values, reflect_type, **pad_width_kwargs)
   1357 if reflect_type is not None:
   1358     pad_option_kwargs["reflect_type"] = reflect_type  # type: ignore[assignment]
-> 1360 array = np.pad(  # type: ignore[call-overload]
   1361     self.data.astype(dtype, copy=False),
   1362     pad_width_by_index,
   1363     mode=mode,
   1364     **pad_option_kwargs,
   1365 )
   1367 return type(self)(self.dims, array)

File <__array_function__ internals>:5, in pad(*args, **kwargs)

File ~/miniconda3/envs/py39/lib/python3.9/site-packages/numpy/lib/arraypad.py:803, in pad(array, pad_width, mode, **kwargs)
    801     for axis, width_pair, value_pair in zip(axes, pad_width, values):
    802         roi = _view_roi(padded, original_area_slice, axis)
--> 803         _set_pad_area(roi, axis, width_pair, value_pair)
    805 elif mode == "empty":
    806     pass  # Do nothing as _pad_simple already returned the correct result

File ~/miniconda3/envs/py39/lib/python3.9/site-packages/numpy/lib/arraypad.py:147, in _set_pad_area(padded, axis, width_pair, value_pair)
    130 """
    131 Set empty-padded area in given dimension.
    132 
   (...)
    144     broadcastable to the shape of `arr`.
    145 """
    146 left_slice = _slice_at_axis(slice(None, width_pair[0]), axis)
--> 147 padded[left_slice] = value_pair[0]
    149 right_slice = _slice_at_axis(
    150     slice(padded.shape[axis] - width_pair[1], None), axis)
    151 padded[right_slice] = value_pair[1]

ValueError: cannot convert float NaN to integer

What did you expect to happen?

It should have padded with NaN, just as it does when constant_values is not specified:

In [14]: da.pad({'x': (0, 1)}, 'constant')
Out[14]: 
<xarray.DataArray (x: 3)>
array([ 0.,  1., nan])
Dimensions without coordinates: x
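Until this is fixed, one possible workaround (a sketch, assuming it is acceptable for your data to become float) is to cast to float before padding, so that NaN is representable and the explicit constant no longer has to be squeezed into an integer array:

```python
import numpy as np
import xarray as xr

# Integer data; after padding x has length 3, as in the expected output above.
da = xr.DataArray(np.arange(2), dims="x")

# Workaround sketch: promote to float first, then pad with an explicit NaN.
padded = da.astype(float).pad({"x": (0, 1)}, mode="constant", constant_values=np.nan)
print(padded.values)
```

This only sidesteps the dtype problem for the data; it does not change how xarray decides whether to promote the dtype.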

Minimal Complete Verifiable Example

No response

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46)
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.11.0-7620-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.20.3.dev4+gdbc02d4e
pandas: 1.4.0
numpy: 1.21.4
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.10.3
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.01.1
distributed: 2022.01.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
setuptools: 59.6.0
pip: 21.3.1
conda: 4.11.0
pytest: 6.2.5
IPython: 8.2.0
sphinx: 4.4.0

TomNicholas changed the title from "Bug when padding with NaNs" to "Bug when padding coordinates with NaNs" on Mar 31, 2022
@TomNicholas
Member Author

The problem appears to be caused by a bug with our dtypes module. In this line, the current padding code assumes that this

from xarray.core import dtypes

In [20]: dtypes.NA is np.NAN
Out[20]: False

would evaluate to True.
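Because an explicit NaN is not the `dtypes.NA` sentinel, the identity check fails. Presumably the dtype is then left unpromoted, and NumPy is asked to write NaN into an integer array, which matches the traceback above. One possible direction, sketched here with plain NumPy, is to derive the padded dtype from the pad value itself rather than from an identity check. `pad_constant_promoting` is a hypothetical helper for illustration only, not xarray's implementation:

```python
import numpy as np


def pad_constant_promoting(arr, pad_width, constant_value):
    """Hypothetical helper: constant-pad `arr`, first promoting its dtype
    so that `constant_value` (e.g. NaN) can be stored in the result."""
    # np.result_type promotes an integer dtype combined with a float NaN to float64,
    # but keeps an integer dtype when the pad value is itself an integer.
    dtype = np.result_type(arr.dtype, constant_value)
    return np.pad(
        arr.astype(dtype, copy=False),
        pad_width,
        mode="constant",
        constant_values=constant_value,
    )


out = pad_constant_promoting(np.arange(2), (0, 1), np.nan)
print(out)
```

The design choice here is that promotion depends on what the value needs, so `np.nan`, `float("nan")` and the sentinel path would all end up with a float result.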

@husainridwan

@TomNicholas, I believe the pad() method does not consider any coordinates and only pads the data along the dimensions it contains. That's why the padding leads to a new data array that has the same dimension name as the original one but no coordinates.

We can set the coordinates explicitly using the coords attribute of the DataArray after padding. Check this example:

import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(2), dims='x')
padded_da = da.pad({'x': (0, 1)}, 'constant')
padded_da.coords['x'] = np.arange(padded_da.shape[0])
print(padded_da)

<xarray.DataArray (x: 3)>
array([ 0.,  1., nan])
Coordinates:
  * x        (x) int64 0 1 2
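The same idea can also be written without mutating the padded array in place, by chaining `assign_coords` (a sketch; the coordinate values `0..n-1` are just an illustrative choice):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(2), dims="x")

# Pad with the default fill, then attach a fresh integer coordinate.
padded_da = da.pad({"x": (0, 1)}, mode="constant")
# assign_coords returns a new DataArray instead of modifying padded_da in place.
padded_da = padded_da.assign_coords(x=np.arange(padded_da.sizes["x"]))
print(padded_da)
```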

Hopefully this helps!
