
Linear interp with NaNs in nd indexer #4233

Merged (10 commits) on Aug 27, 2020

Conversation

jenssss
Contributor

@jenssss jenssss commented Jul 17, 2020

  • Test added

When doing linear interpolation with an nd indexer that contains NaNs, xarray previously threw a KeyError from the missing._localize function. This PR fixes that by swapping np.min and np.max for np.nanmin and np.nanmax in that function, so any NaN values are ignored.

jenssss added 2 commits July 17, 2020 15:42
When interpolating with an nd indexer that contains NaNs, the code previously threw a KeyError from the missing._localize function. This commit fixes this by swapping `np.min` and `np.max` with `np.nanmin` and `np.nanmax`, ignoring any NaN values.
@fujiisoup
Member

Thanks, @jenssss for sending a PR.
This looks good to me.
Could you add a line for this contribution to our whatsnew?

- imin = index.get_loc(np.min(new_x.values), method="nearest")
- imax = index.get_loc(np.max(new_x.values), method="nearest")
+ imin = index.get_loc(np.nanmin(new_x.values), method="nearest")
+ imax = index.get_loc(np.nanmax(new_x.values), method="nearest")
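To illustrate why the swap fixes the KeyError: a plain np.min over an indexer containing NaN propagates the NaN, and index.get_loc is then asked to locate NaN, which fails. A minimal standalone sketch of the difference (not xarray's actual call site):

```python
import numpy as np

new_x = np.array([0.5, np.nan, 1.5])

# np.min propagates NaN, so index.get_loc(np.min(...)) would be
# asked to look up NaN in the index and raise a KeyError
print(np.min(new_x))     # nan

# np.nanmin ignores the NaN and returns a usable bound
print(np.nanmin(new_x))  # 0.5
```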
Member

It looks like np.nanmin (and nanmax) support np.datetime-dtype only with numpy>=1.18.
We could copy np.nanmin into our core/npcompat.py and call that function here.

Collaborator

I think they added some low-level loops to fix this (numpy/numpy#14841), so I guess we cannot copy nanmin over and instead have to use something like if LooseVersion(np.__version__) < LooseVersion("1.18").

Contributor Author

I added some LooseVersion checks for numpy>=1.18 in the d044142 commit to the missing._localize function and to the test. Would this do?

@keewis
Collaborator

keewis commented Aug 22, 2020

@jenssss, there's something wrong with the commit history (which makes reviewing pretty hard). Did you by chance do git rebase upstream/master followed by a regular pull?

@jenssss
Contributor Author

jenssss commented Aug 22, 2020

@keewis, sorry, I'm still kinda new to GitHub.
Yeah, I did a rebase from upstream/master and then a pull before pushing back up.

@keewis
Collaborator

keewis commented Aug 22, 2020

not your fault; the contributing guide did tell you to do that, and I only updated it a few days ago (not in stable yet). Do you want to try to fix it, or should I do it for you?

@jenssss
Contributor Author

jenssss commented Aug 23, 2020

I want to try to fix it.

@jenssss force-pushed the NaNs-in-linear-interp branch from d044142 to 2152fb8 on August 23, 2020
@jenssss
Contributor Author

jenssss commented Aug 23, 2020

Okay, I think that should do it. How does it look now?

@keewis
Collaborator

keewis commented Aug 23, 2020

perfect, thanks for the fix.

@mathause
Collaborator

Looks good. If you want to go a step further, you can probably do something along the lines of (untested):

if (
    new_x.dtype.kind in "mM"
    and LooseVersion(np.__version__) < LooseVersion("1.18")
    and new_x.isnull().any()
):
    raise ValueError(
        "numpy 1.18 or newer is required to use interp with a "
        "datetime/timedelta array containing missing values"
    )
else:
    imin = ... np.nanmin(...)
    imax = ...

This way the PR also works for numpy < 1.18, as long as the index is not a datetime type.
@mathause
Collaborator

I misremembered the behavior of the old np.min with datetimes; see this comment: https://github.com/pydata/xarray/pull/3924/files#discussion_r407101544

Sorry for sending you down the wrong path. I can have another look tomorrow.

It seems that np.min/np.max work in place of nanmin/nanmax for datetime types on numpy < 1.18, see https://github.com/pydata/xarray/pull/3924/files
@jenssss force-pushed the NaNs-in-linear-interp branch from 896f699 to 8313c3e on August 25, 2020
@jenssss
Contributor Author

jenssss commented Aug 25, 2020

Okay, I removed the ValueError from my previous commit, so the content of _localize is now basically the same as in https://github.com/pydata/xarray/pull/3924/files#discussion_r407101544

I thought it better to use np.issubdtype(new_x.dtype, np.datetime64) instead of new_x.dtype.kind in "mM" to check for the datetime type. It's a bit longer, but I think it's more readable, since you don't have to look up what dtype.kind means.
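For reference, the two dtype checks discussed here are not quite interchangeable: `dtype.kind in "mM"` also matches timedelta64 ("m"), while `np.issubdtype(..., np.datetime64)` matches only datetime64 ("M"). A small demonstration:

```python
import numpy as np

x = np.array(["2000-01-01T12:00", "NaT"], dtype="datetime64[ns]")

# the two datetime checks discussed in the thread
print(np.issubdtype(x.dtype, np.datetime64))  # True
print(x.dtype.kind)                           # M

# dtype.kind in "mM" additionally catches timedelta64,
# which np.issubdtype(..., np.datetime64) does not
td = np.array([1, 2], dtype="timedelta64[s]")
print(np.issubdtype(td.dtype, np.datetime64), td.dtype.kind)  # False m
```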

@mathause
Collaborator

Thanks! I have two more suggestions:

Can you add an additional line with

(["2000-01-01T12:00", "2000-01-02T12:00", "NaT"], [0.5, 1.5]),

next to the existing case

(["2000-01-01T12:00", "2000-01-02T12:00"], [0.5, 1.5]),

so that we have a test with a missing datetime?

Do you also want to add a note to the docstring, e.g. "Missing values are skipped."? Does it also work for Datasets? Then maybe add it to the Dataset docstrings as well, near "used for the broadcasting." and "which to index the variables in this dataset."
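The extra test case suggested above exercises nanmin/nanmax over coordinates containing a missing datetime. A standalone sketch of what that parametrize entry relies on (this is not the actual xarray test):

```python
import numpy as np

# coordinates for the proposed test case, including a missing datetime
x_new = np.array(
    ["2000-01-01T12:00", "2000-01-02T12:00", "NaT"], dtype="datetime64[ns]"
)

print(np.isnat(x_new))  # [False False  True]

# on numpy >= 1.18, nanmin/nanmax skip the NaT, so _localize can
# still compute valid bounds for the interpolation window
print(np.nanmin(x_new))
print(np.nanmax(x_new))
```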

Also added a test for `Dataset` to `test_interpolate_nd_with_nan`, and "Missing values are skipped." to the docstrings of the `interp` and `interp_like` methods of `DataArray` and `Dataset`.
@mathause
Collaborator

mathause commented Aug 25, 2020

LGTM. I'll merge in a day or two unless someone else has a comment. (#3924)

@mathause mathause merged commit ffce4ec into pydata:master Aug 27, 2020
@mathause
Collaborator

Thanks @jenssss! I see this is your first PR; welcome to xarray!
