
Linear interp with NaNs in nd indexer #4233

Merged (10 commits) on Aug 27, 2020

Conversation

jenssss
Contributor

@jenssss jenssss commented Jul 17, 2020

  • Test added

When doing linear interpolation with an nd indexer that contains NaNs, xarray previously threw a KeyError from the missing._localize function. This PR fixes that by swapping np.min and np.max for np.nanmin and np.nanmax in that function, so any NaN values are ignored.

jenssss added 2 commits July 17, 2020 15:42
When interpolating with an nd indexer that contains NaNs, the code previously threw a KeyError from the missing._localize function. This commit fixes this by swapping `np.min` and `np.max` with `np.nanmin` and `np.nanmax`, ignoring any NaN values.
@fujiisoup
Member

Thanks, @jenssss for sending a PR.
This looks good to me.
Could you add a line for this contribution to our whatsnew?

- imin = index.get_loc(np.min(new_x.values), method="nearest")
- imax = index.get_loc(np.max(new_x.values), method="nearest")
+ imin = index.get_loc(np.nanmin(new_x.values), method="nearest")
+ imax = index.get_loc(np.nanmax(new_x.values), method="nearest")
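To illustrate why the swap fixes the KeyError: a plain np.min over an indexer containing NaN propagates the NaN, and index.get_loc is then asked to locate NaN, which fails. A minimal standalone sketch of the difference (not xarray's actual call site):

```python
import numpy as np

new_x = np.array([0.5, np.nan, 1.5])

# np.min propagates NaN, so index.get_loc(np.min(...)) would be
# asked to look up NaN in the index and raise a KeyError
print(np.min(new_x))     # nan

# np.nanmin ignores the NaN and returns a usable bound
print(np.nanmin(new_x))  # 0.5
```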
Member

It looks like np.nanmin (and nanmax) support np.datetime-dtype only with numpy>=1.18.
We could copy np.nanmin into our core/npcompat.py and call that function here.

Collaborator

I think they added some low-level loops to fix this (numpy/numpy#14841), so I guess we cannot copy nanmin over and instead have to use something like if LooseVersion(np.__version__) < LooseVersion("1.18").

Contributor Author

I added some LooseVersion checks for numpy>=1.18 in the d044142 commit to the missing._localize function and to the test. Would this do?

@keewis
Collaborator

keewis commented Aug 22, 2020

@jenssss, there's something wrong with the commit history (which makes reviewing pretty hard). Did you by chance do git rebase upstream/master followed by a regular pull?

@jenssss
Contributor Author

jenssss commented Aug 22, 2020

@keewis, sorry, I'm still kinda new to GitHub.
Yeah, I did a rebase from upstream/master and then a pull before pushing back up.

@keewis
Collaborator

keewis commented Aug 22, 2020

not your fault; the contributing guide did tell you to do that, and I only updated it a few days ago (not in stable yet). Do you want to try to fix it, or should I do it for you?

@jenssss
Contributor Author

jenssss commented Aug 23, 2020

I want to try to fix it.

@jenssss force-pushed the NaNs-in-linear-interp branch from d044142 to 2152fb8 on August 23, 2020
@jenssss
Contributor Author

jenssss commented Aug 23, 2020

Okay, I think that should do it. How does it look now?

@keewis
Collaborator

keewis commented Aug 23, 2020

perfect, thanks for the fix.

@mathause
Collaborator

Looks good. If you want to go a step further, you can probably do something along the lines of (untested):

if (
    new_x.dtype.kind in "mM"
    and LooseVersion(np.__version__) < LooseVersion("1.18")
    and new_x.isnull().any()
):
    raise ValueError(
        "numpy 1.18 or newer is required to use interp with a "
        "datetime/timedelta array containing missing values"
    )
else:
    imin = ... np.nanmin(...)
    imax = ...

This way the PR also works for numpy < 1.18, as long as the index is not a datetime type.
@mathause
Collaborator

I misremembered the behavior of the old np.min with datetimes; see this comment: https://github.com/pydata/xarray/pull/3924/files#discussion_r407101544

Sorry for sending you down the wrong path. I can have another look tomorrow.

It seems that np.min/np.max work in place of nanmin/nanmax for datetime types on numpy < 1.18, see https://github.com/pydata/xarray/pull/3924/files
@jenssss force-pushed the NaNs-in-linear-interp branch from 896f699 to 8313c3e on August 25, 2020
@jenssss
Contributor Author

jenssss commented Aug 25, 2020

Okay, I removed the ValueError from my previous commit, so the content of _localize is now basically the same as in https://github.com/pydata/xarray/pull/3924/files#discussion_r407101544

I thought it better to use np.issubdtype(new_x.dtype, np.datetime64) instead of new_x.dtype.kind in "mM" to check for the datetime type. It's a bit longer, but I think it's more readable, since you don't have to look up what dtype.kind means.
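For reference, the two dtype checks discussed here are not quite interchangeable: `dtype.kind in "mM"` also matches timedelta64 ("m"), while `np.issubdtype(..., np.datetime64)` matches only datetime64 ("M"). A small demonstration:

```python
import numpy as np

x = np.array(["2000-01-01T12:00", "NaT"], dtype="datetime64[ns]")

# the two datetime checks discussed in the thread
print(np.issubdtype(x.dtype, np.datetime64))  # True
print(x.dtype.kind)                           # M

# dtype.kind in "mM" additionally catches timedelta64,
# which np.issubdtype(..., np.datetime64) does not
td = np.array([1, 2], dtype="timedelta64[s]")
print(np.issubdtype(td.dtype, np.datetime64), td.dtype.kind)  # False m
```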

@mathause
Collaborator

Thanks! I have two more suggestions:

Can you add an additional line with

(["2000-01-01T12:00", "2000-01-02T12:00", "NaT"], [0.5, 1.5]),

next to the existing case

(["2000-01-01T12:00", "2000-01-02T12:00"], [0.5, 1.5]),

so that we have a test with a missing datetime?

Do you also want to add a note to the docstring, e.g. "Missing values are skipped."? Does it also work for Datasets? Then maybe add it to the Dataset docstrings as well, near "used for the broadcasting." and "which to index the variables in this dataset."
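The extra test case suggested above exercises nanmin/nanmax over coordinates containing a missing datetime. A standalone sketch of what that parametrize entry relies on (this is not the actual xarray test):

```python
import numpy as np

# coordinates for the proposed test case, including a missing datetime
x_new = np.array(
    ["2000-01-01T12:00", "2000-01-02T12:00", "NaT"], dtype="datetime64[ns]"
)

print(np.isnat(x_new))  # [False False  True]

# on numpy >= 1.18, nanmin/nanmax skip the NaT, so _localize can
# still compute valid bounds for the interpolation window
print(np.nanmin(x_new))
print(np.nanmax(x_new))
```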

Also added a test for `Dataset` to `test_interpolate_nd_with_nan`, and "Missing values are skipped." to the docstrings of the `interp` and `interp_like` methods of `DataArray` and `Dataset`.
@mathause
Collaborator

mathause commented Aug 25, 2020

LGTM. I'll merge in a day or two unless someone else has a comment. (#3924)

@mathause mathause merged commit ffce4ec into pydata:master Aug 27, 2020
@mathause
Collaborator

Thanks @jenssss! I see this is your first PR; welcome to xarray!
