Allow `fillna(value=None, method="constant")` #28124

valtron · 2019-08-24T05:24:24Z

Sometimes data has object columns that contain both NaN and None, and we want to standardize them to None. Currently fillna doesn't allow value=None: value=None is taken to mean "no value given" and method=None to mean "fill with the value", so when both are None the call is considered invalid. (On columns with dtypes that have their own NA representation, like floats, timestamps, etc., filling with None leaves them as-is.)

To get around this, I added method="constant", which should also be taken to mean "fill with the value". method="constant" is required only when filling with None (so that value=None, method=None still raises); otherwise fillna works the same as before.
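
For illustration, the argument handling described above could be sketched in plain Python roughly as follows. `resolve_fill` is a hypothetical helper, not the code in this PR or in pandas, and the error messages are made up; it only models which combinations of value and method are accepted.

```python
def resolve_fill(value=None, method=None):
    """Hypothetical model of the proposed rules.

    Returns ("constant", value) for a constant fill or
    ("method", method) for a method-based fill such as "pad".
    """
    if method == "constant":
        # Explicit constant fill: any value is accepted, including None.
        return ("constant", value)
    if method is None:
        if value is None:
            # Unchanged behaviour: passing neither argument is still an error.
            raise ValueError("must specify a fill 'value' or 'method'")
        return ("constant", value)
    if value is not None:
        # Unchanged behaviour: a real value plus a fill method is ambiguous.
        raise ValueError("cannot specify both 'value' and 'method'")
    return ("method", method)
```

Under this sketch, only `resolve_fill(None, "constant")` behaves differently from before; every other call takes the same path it would without the new option.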

As far as I can tell, doing it this way shouldn't break any existing code. I updated/added some tests for the new behaviour. If it looks like this change might be accepted, I'll update the docs as well.

tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

pep8speaks · 2019-08-24T05:24:36Z

Hello @valtron! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file pandas/core/indexes/category.py:

Line 442:89: E501 line too long (92 > 88 characters)

Comment last updated at 2019-08-24 05:29:33 UTC

WillAyd · 2019-08-26T18:02:52Z

I think I'm -1 on this as it adds complexity for a rather niche case. What real advantage are you hoping to get out of replacing NA with None?

TomAugspurger · 2019-08-26T19:44:21Z

I'm also initially against this. We have a few things to work out with NA values (#28095). It's not clear how None will be handled there.

If we wanted to support this, the easier way would be

default_fill_value = object()
def fillna(..., value=default_fill_value, ...):

If the user passes None, it'd be up to the dtype to determine whether or not that's a valid fill value. But we'll need to settle on whether this is desirable before moving forward.
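
A slightly fuller sketch of that sentinel idea, in plain Python and outside pandas. The names `_NO_VALUE` and `fillna_args` are hypothetical and the error messages are made up; the dtype-specific check of whether None is a valid fill value is left out.

```python
# Hypothetical sentinel: a unique object that no caller can pass by accident,
# so "argument omitted" can be told apart from an explicit None.
_NO_VALUE = object()


def fillna_args(value=_NO_VALUE, method=None):
    """Hypothetical argument check using a sentinel default for value."""
    value_given = value is not _NO_VALUE
    if not value_given and method is None:
        raise ValueError("must specify a fill 'value' or 'method'")
    if value_given and method is not None:
        raise ValueError("cannot specify both 'value' and 'method'")
    if value_given:
        # An explicit None reaches this branch and is treated as a fill value.
        return ("constant", value)
    return ("method", method)
```

With this shape, `fillna_args(None)` is a constant fill with None, while `fillna_args(None, method="pad")` is rejected because None now counts as a supplied value.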

@valtron if you have use cases where None is a useful value, you may want to speak up in #28095.

valtron · 2019-08-26T20:40:05Z

@WillAyd I've had object columns that contain NaN (and possibly a mixture of None and NaN) and currently I do c.loc[c.isna()] = None to standardize things and simplify stuff further down the line. I prefer None to NaN (otherwise I could use c.fillna(np.nan) to standardize) because the values have to be compared later and I don't want to write (c1 == c2) or (pd.isna(c1) and pd.isna(c2)).
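
To make the comparison point concrete, here is a small plain-Python sketch on scalars rather than pandas columns. `eq_with_na` is a hypothetical helper that spells out the extra check NaN requires; with None, plain equality is already enough.

```python
import math


def is_na(x):
    """True for None or a float NaN (a simplified stand-in for pd.isna)."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def eq_with_na(a, b):
    """Equality that also treats two missing values as equal."""
    return a == b or (is_na(a) and is_na(b))


nan = float("nan")

# NaN never compares equal to itself, so plain == misses matching NaNs.
nan_equal = nan == nan

# None compares equal to None, so no special casing is needed.
none_equal = None == None  # noqa: E711
```

Here `nan_equal` is False and `none_equal` is True, which is why standardizing on None lets later comparisons use `==` directly.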

BTW, currently method=None means "fill with constant given by value" (cf. numpy.pad) so this isn't a "complex" change.

@TomAugspurger I tried implementing it that way (using a sentinel for the default value) at first, but it requires more changes, and wouldn't be backward-compatible (e.g. fillna(None, method="pad") would become an error).

WillAyd · 2019-08-27T21:55:37Z

> @TomAugspurger I tried implementing it that way (using a sentinel for the default value) at first, but it requires more changes, and wouldn't be backward-compatible (e.g. fillna(None, method="pad") would become an error).

Can you expand on what "more changes" entails? I understand the concern on the latter point, but generally this seems like a better approach to get what you are after.

valtron · 2019-08-28T14:39:48Z

I spent a bit more time on the alternate implementation (https://github.com/valtron/pandas/commit/66e987faca4a677928479e402302ddefb5099398); I don't have all the tests passing yet, but so far I'm guessing the remaining changes would all involve tracking down fill_values in other functions, changing them to default to MISSING, switching the checks from None to MISSING, etc.

WillAyd · 2019-09-13T01:40:18Z

Thanks for the contribution! I think this is on hold for now, though, and as mentioned before it's probably worth voicing your thoughts in #28095 to support this feature.

Allow fillna(value=None, method="constant")

26a9448

simonjayhawkins added the Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) and API Design labels Aug 24, 2019

gfyoung requested a review from jreback August 24, 2019 22:56

jorisvandenbossche mentioned this pull request Sep 8, 2019

ROADMAP: Consistent missing value handling with new NA scalar #28095

Open

WillAyd closed this Sep 13, 2019
