DataFrame.replace fails to replace value when columns are specified and only non-replacement columns contain pd.NA #32838

Closed
tsoernes opened this issue Mar 19, 2020 · 4 comments
Labels
Bug, NA - MaskedArrays (Related to pd.NA and nullable extension arrays)

Comments

tsoernes commented Mar 19, 2020

Code Sample, a copy-pastable example if possible

In [87]: df2 = pd.DataFrame([['a', 1], ['b', pd.NA]])

In [108]: df2
Out[108]: 
   0     1
0  a     1
1  b  <NA>

In [109]: df2.replace({0: 'a'}, np.nan)
/home/torstein/anaconda3/lib/python3.7/site-packages/pandas/core/missing.py:47: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  mask = arr == x
Out[111]: 
   0     1
0  a     1
1  b  <NA>

In [112]: df2.dtypes
Out[112]: 
0    object
1    object
dtype: object

# This works:
In [121]: df2[0].replace('a', np.nan)
Out[121]: 
0    NaN
1      b
Name: 0, dtype: object

# This also works:
In [126]: pd.DataFrame([['a', 1], ['b', np.nan]]).replace({0: 'a'}, 'b')
Out[126]: 
   0    1
0  b  1.0
1  b  NaN

Problem description

There are several similar issues, but in this case the column named in the replacement dictionary (column 0) contains no missing values. The replacement fails only because column 1 contains a pd.NA, even though no replacement is requested on that column. If the DataFrame is created without the pd.NA in column 1, replace works.
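As a possible workaround on affected versions, the Series-level call shown in the report can be assigned back to the column. This is only a sketch based on the observation above that `df2[0].replace('a', np.nan)` works; the name `fixed` is introduced here for illustration and is not part of the original report.

```python
import numpy as np
import pandas as pd

# Same frame as in the report: column 1 holds a pd.NA, column 0 does not.
df2 = pd.DataFrame([['a', 1], ['b', pd.NA]])

# Work on a copy so the original frame stays unchanged.
fixed = df2.copy()

# Replace on the single column (a Series) and assign the result back,
# instead of calling DataFrame.replace with a column dictionary.
fixed[0] = fixed[0].replace('a', np.nan)

print(fixed)
```

Column 1 is never passed through replace here, so its pd.NA is left as it is.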

Expected Output

'a' in column 0 to be replaced with NaN.

Output of pd.show_versions()

pandas : 1.0.2
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 41.6.0.post20191030
Cython : 0.29.13
pytest : 5.2.2
hypothesis : None
sphinx : 2.2.1
blosc : None
feather : None
xlsxwriter : 1.2.2
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 2.2.3
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : 5.2.2
pyxlsb : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.10
tables : 3.5.2
tabulate : 0.8.5
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.2
numba : 0.46.0

tsoernes changed the title from "DataFrame.replace fails to replace value when columns are specified" to "DataFrame.replace fails to replace value when columns are specified and non-replacement columns contain pd.NA" on Mar 19, 2020
Member

jorisvandenbossche commented Mar 20, 2020

Thanks for the report!
Probably related to or a duplicate of #32621 or #32075

jorisvandenbossche added the labels Bug and NA - MaskedArrays (Related to pd.NA and nullable extension arrays) on Mar 20, 2020
Author

tsoernes commented Mar 20, 2020 via email

tsoernes changed the title from "DataFrame.replace fails to replace value when columns are specified and non-replacement columns contain pd.NA" to "DataFrame.replace fails to replace value when columns are specified and only non-replacement columns contain pd.NA" on Mar 20, 2020
Contributor

chrispe commented Apr 14, 2020

I think @jorisvandenbossche is right. This seems to work fine now that #32621 has been fixed:

>>> import pandas as pd
>>> import numpy as np
>>> df2 = pd.DataFrame([['a', 1], ['b', pd.NA]])
>>> df2.replace({0: 'a'}, np.nan)
     0     1
0  NaN     1
1    b  <NA>

@tsoernes : Can you also confirm that? If that's the case, then I suggest we close this. Thanks!
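One way to confirm this on a given installation is a small script like the sketch below. It reruns the reported call and prints whether 'a' in column 0 was replaced; the names `result` and `replaced` are introduced here for illustration, and the outcome depends on the installed pandas version.

```python
import numpy as np
import pandas as pd

# Frame from the report: only the non-replacement column 1 contains pd.NA.
df2 = pd.DataFrame([['a', 1], ['b', pd.NA]])

# The call that left the frame unchanged on pandas 1.0.2.
result = df2.replace({0: 'a'}, np.nan)

# True when 'a' in column 0 was replaced by a missing value.
replaced = bool(pd.isna(result.loc[0, 0]))

print("pandas", pd.__version__, "- replacement applied:", replaced)
```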

@tsoernes
Author

Your output is what is expected.
