Series.replace fails to replace value #32075
Comments
Thanks @tsoernes Could you include a minimal reproducible example? I couldn't reproduce this
|
In the following example, I extract out two rows from the dataframe and recreate it via a dict.
In [282]: di = df.loc[[26123, 26122]].to_dict()
In [283]: di
Out[283]:
<redacted since problem isolated later in thread>
In [284]: df = pd.DataFrame.from_dict(di)
In [285]: df.eq('nil').sum()
/home/torstein/anaconda3/lib/python3.7/site-packages/pandas/core/ops/array_ops.py:253: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
res_values = method(rvalues)
Out[285]:
id 0
properties 0
url 0
started_at 0
ended_at 0
investee_id 0
lead_investor_id 0
created_at 0
updated_at 0
round 0
amount 1
currency 0
dtype: int64
In [286]: df.replace('nil', pd.NA).eq('nil').sum()
/home/torstein/anaconda3/lib/python3.7/site-packages/pandas/core/missing.py:47: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
mask = arr == x
/home/torstein/anaconda3/lib/python3.7/site-packages/pandas/core/ops/array_ops.py:253: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
res_values = method(rvalues)
Out[289]:
id 0
properties 0
url 0
started_at 0
ended_at 0
investee_id 0
lead_investor_id 0
created_at 0
updated_at 0
round 0
amount 1
currency 0
dtype: int64 |
@tsoernes OK, got it - it fails when there is another row with
|
Problem's related to this:
|
take |
@MarcoGorelli
|
Hi Anna - gonna be honest, I don't know :) I had a quick go at this one but
couldn't figure it out. If you have any specific questions I'd imagine you
could tag one of the core team members (e.g. Tom Augspurger) and ask.
…On Tue, 3 Mar 2020 13:59 Anna Daglis, ***@***.***> wrote:
@MarcoGorelli <https://github.com/MarcoGorelli>
Is there any particular approach here you could suggest to fix the bug?
There is no issue when using np.nan, not pd.NA.
>>> np.array([['nil', np.nan]]) == 'nil'
array([[ True, False]])
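To make the `np.nan` versus `pd.NA` difference concrete, the scalar comparisons can be checked on their own. This is a small illustrative sketch, not code from the thread; the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# Comparing a float NaN with a string is an ordinary Python comparison
# and yields a plain bool.
nan_result = (np.nan == "nil")

# Comparing pd.NA with a string propagates the missing value instead:
# the result is pd.NA itself, not True or False.
na_result = (pd.NA == "nil")

# pd.NA refuses to be coerced to a bool, which is what an elementwise
# comparison over an object array needs in order to fill a boolean mask.
try:
    bool(na_result)
    na_is_ambiguous = False
except TypeError:
    na_is_ambiguous = True

print(nan_result, na_result, na_is_ambiguous)
```

Because `bool(pd.NA)` raises, an object array holding `pd.NA` cannot be compared elementwise the way one holding `np.nan` can.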
|
The problem in this case lies in
The problem arises here @TomAugspurger @jorisvandenbossche |
I'm not sure offhand, sorry. But it seems like NA values should be considered False there, so perhaps fill it with something other than the value to mask? Some of the methods in |
Can we get this into 1.0.2? That would be awesome. |
Note that this is working fine with string dtype:
So it is specifically for NA in object dtype (something we actually don't really test yet, I think).
This might be related to the comparison deprecations in numpy (there are other cases where comparisons return a scalar True/False, instead of doing it element-wise), you also see the deprecation warning. If we want to handle pd.NA in object dtype better, we will need to start using masks as well, and not rely on numpy behaviour. For example, also this is wrong:
(we should probably open a separate issue about "pd.NA in object dtype", and the best way forward here generally) |
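As a rough illustration of the "use masks, don't rely on numpy" idea, the helper below compares only the non-missing entries of an object array and leaves missing ones as `False`. The function name `eq_treating_na_as_false` is hypothetical and not part of pandas; it assumes the array holds scalar values.

```python
import numpy as np
import pandas as pd


def eq_treating_na_as_false(arr, value):
    """Elementwise `arr == value` for an object array, with missing entries as False.

    Hypothetical helper for illustration only; assumes scalar elements.
    """
    arr = np.asarray(arr, dtype=object)
    result = np.zeros(arr.shape, dtype=bool)

    # Positions that are not pd.NA / np.nan / None.
    valid = ~pd.isna(arr)

    # Compare the valid entries one by one in Python, so numpy never has to
    # turn a pd.NA comparison result into a bool.
    result[valid] = np.array([item == value for item in arr[valid]], dtype=bool)
    return result


values = np.array(["nil", pd.NA, 1.0, None], dtype=object)
print(eq_treating_na_as_false(values, "nil"))
```

The Python-level loop is slow for large arrays; it is meant only to show the masking logic.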
|
As of 1.1.0 at least, it's not working fine:
|
Note however, that the one-dict-arg form of
but as described in #35268, it changes the dtype of the Series from whatever it was to object. I'd (naively, perhaps) expect two-arg replace and one-dict-arg replace to both use the same codepath internally! |
This makes sense to me and matches my debugging. I'm testing this patch now:
diff --git a/pandas/core/missing.py b/pandas/core/missing.py
index 7802c5cbd..be1632315 100644
--- a/pandas/core/missing.py
+++ b/pandas/core/missing.py
@@ -46,7 +46,7 @@ def mask_missing(arr, values_to_mask):
# GH#29553 prevent numpy deprecation warnings
mask = False
else:
- mask = arr == x
+ mask = (arr == x).fillna(False)
# if x is a string and arr is not, then we get False and we must
# expand the mask to size arr.shape
@@ -57,7 +57,7 @@ def mask_missing(arr, values_to_mask):
# GH#29553 prevent numpy deprecation warnings
mask |= False
else:
- mask |= arr == x
+ mask |= (arr == x).fillna(False)
if na_mask.any():
if mask is None:
|
The patch above doesn't work (unsurprisingly) because sometimes
The patch below seems to work with some manual testing and the result of
diff --git a/pandas/core/missing.py b/pandas/core/missing.py
index 7802c5cbd..f8280c0a2 100644
--- a/pandas/core/missing.py
+++ b/pandas/core/missing.py
@@ -46,7 +46,7 @@ def mask_missing(arr, values_to_mask):
# GH#29553 prevent numpy deprecation warnings
mask = False
else:
- mask = arr == x
+ mask = np.where(~isna(arr), arr, np.full_like(arr, np.nan)) == x
# if x is a string and arr is not, then we get False and we must
# expand the mask to size arr.shape
@@ -57,7 +57,7 @@ def mask_missing(arr, values_to_mask):
# GH#29553 prevent numpy deprecation warnings
mask |= False
else:
- mask |= arr == x
+ mask |= np.where(~isna(arr), arr, np.full_like(arr, np.nan)) == x
if na_mask.any():
if mask is None:
This replaces
I want to write an actual set of test cases for this bug for various dtypes before submitting a PR, though. I believe the patch above may still fail for |
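The masking expression from the second patch can be tried in isolation on a small object array. This sketch uses the public `pd.isna` as a stand-in for the `isna` imported inside `pandas/core/missing.py`, and the sample array is invented for the example.

```python
import numpy as np
import pandas as pd

# Object array with a pd.NA next to the value we want to match.
arr = np.array(["nil", pd.NA, "x"], dtype=object)

# Swap every missing entry for np.nan before comparing, as in the patch.
cleaned = np.where(~pd.isna(arr), arr, np.full_like(arr, np.nan))

# np.nan == "nil" is simply False, so the comparison is elementwise again.
mask = cleaned == "nil"
print(mask)
```

Note that this only covers object arrays of scalars; other dtypes would need their own checks.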
Here is another example - at least I think it is the same bug:
Maybe this would help to fix the bug. It would be great if you could fix the bug. |
This works now, may need tests |
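A regression test for this could look roughly like the sketch below. The test name and sample data are hypothetical, and the assertions describe the behaviour one would expect on a pandas version that includes the fix; on the affected 1.0.x releases they would presumably fail.

```python
import pandas as pd


def test_replace_str_with_na_in_object_dtype():
    # Object-dtype Series that already contains a pd.NA alongside the
    # string to be replaced (the situation discussed in this issue).
    ser = pd.Series(["nil", pd.NA, "x"], dtype=object)

    result = ser.replace("nil", pd.NA)

    # "nil" should now be missing, the existing NA should stay missing,
    # and unrelated values should be left alone.
    assert result.isna().tolist() == [True, True, False]
    assert result.iloc[2] == "x"


test_replace_str_with_na_in_object_dtype()
```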
Code Sample, a copy-pastable example if possible
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.18-100.fc30.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : nb_NO.UTF-8
LOCALE : nb_NO.UTF-8
pandas : 1.0.1
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3.1
setuptools : 41.6.0.post20191030
Cython : 0.29.13
pytest : 5.2.2
hypothesis : None
sphinx : 2.2.1
blosc : None
feather : None
xlsxwriter : 1.2.2
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 2.2.3
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.2.2
pyxlsb : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.10
tables : 3.5.2
tabulate : 0.8.5
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.2
numba : 0.46.0