-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace with nested dict raises for overlapping keys #27696
Replace with nested dict raises for overlapping keys #27696
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having second thoughts about this change - not sure working around hashes is universally appropriate.
Instead of replace should users just be doing an astype(int) if they want to represent True / False as 1 and 0?
indeed, there are several ways to represent True/False as 1/0, and astype is one of them. But since based on documentation of replace, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also have the same thoughts/concerns as @WillAyd. I think the issue in question is an implementation detail that we might have to live with, as I'm not sure there's a clean fix that doesn't involve a substantial rewrite or a lot workarounds that hurt maintainability. I could be wrong about this but no clean solutions immediately come to mind.
pandas/tests/generic/test_generic.py
Outdated
# GH 27660 | ||
df = DataFrame({"col": [False, True, 0, 1]}) | ||
|
||
result = df.replace({"col": {False: 0, True: 1}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you were to switch the replacement to {False: 1, True: 0}
then this test will fail as the integers will get incorrectly replaced:
In [2]: df = pd.DataFrame({'col': [False, True, 0, 1]})
In [3]: df.replace({'col': {False: 1, True: 0}})
Out[3]:
col
0 1
1 0
2 1
3 0
I think the current version of the test is passing because 0/1 are just getting replaced by themselves.
This is a consequence of how Python handles hashing, specifically 0
/False
have the same hash and evaluate equally with ==
(Python's fallback on hash collision), so they'll be considered the same in set
operations or when looking up keys in a dict
(likewise for 1
/True
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ehh, you are very right, it's not good solution @jschendel
pandas/core/generic.py
Outdated
# add another check to avoid boolean being regarded | ||
# as binary in python set | ||
if set(keys) & set(values) and set(map(str, keys)) & set( | ||
map(str, values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be a little bit too permissive now, as it will allow {0: 1.0, 1: 'a'}
, which was previously rejected (might not actually matter but is a change in behavior we should be cognizant of).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i should have thought of it more thoroughly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this just not be removed altogether? Not clear on the purpose of it
Though it is interesting that this works currently: >>> df = pd.DataFrame({"col":[False, True]})
>>> df.replace({False:0, True:1})
col
0 0
1 1 While the example from the OP doesn't >>> df.replace({"col": {False:0, True:1}})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/williamayd/clones/pandas/pandas/core/frame.py", line 4210, in replace
method=method,
File "/Users/williamayd/clones/pandas/pandas/core/generic.py", line 6645, in replace
"Replacement not allowed with "
ValueError: Replacement not allowed with overlapping keys and values @charlesdong1991 any idea how the path without column selection handles this? |
@WillAyd I also noticed this and it's because in the case of |
@charlesdong1991 you can click on a line number in GH and share a permalink to make things easier: Here's what I think you are referencing Line 6628 in 1fa1ad9
|
ahh, thanks very much!! Get it!
…On Fri, 2 Aug 2019, 16:46 William Ayd ***@***.***> wrote:
@charlesdong1991 <https://github.com/charlesdong1991> you can click on a
line number in GH and share a permalink to make things easier:
[image: image]
<https://user-images.githubusercontent.com/609873/62378347-7b4cd500-b4f9-11e9-9b8e-58dab9b9d567.png>
Here's what I think you are referencing
https://github.com/pandas-dev/pandas/blob/1fa1ad91b29c5474cbb86cbcbcdcd50537cad0ae/pandas/core/generic.py#L6628
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#27696?email_source=notifications&email_token=ACGXEOAGOFLU2AEKXROTDUDQCRCFJA5CNFSM4IIRIILKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3N6ORY#issuecomment-517728071>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ACGXEOAD7PXMQFV7URFRL3LQCRCFJANCNFSM4IIRIILA>
.
|
It looks like this behavior was intentionally added in #6429 quite a few years back. I'm not sure why the distinction is made for only nested dictionaries and maybe it's not valid any more (at least if it works with only a single dictionary) so certainly if you'd like to investigate and put forth a proposal would be appreciated @charlesdong1991 you always seem to find these tricky PRs ha! |
Yeah, indeed, I was also confused with the inconsistency last night when
reading through the code base. One proposal is adding same logic for
another and update docstring to let user know boolean doesn't work with
replace and should use other method like astype to convert boolean to
binary. Or also remove the above error check. Or keep as is and live with
it. But I believe either way will be controversial and seems buggy to some
corner case.
haha, honestly I really wanna avoid contributing to those tricky PRs and
make more and impactful PRs. Need to keep improving and learn to choose
less tricky PRs ^^
…On Fri, 2 Aug 2019, 18:29 William Ayd ***@***.***> wrote:
It looks like this behavior was intentionally added in #6429
<#6429> quite a few years back.
I'm not sure why the distinction is made for only *nested* dictionaries
and maybe it's not valid any more (at least if it works with only a single
dictionary) so certainly if you'd like to investigate and put forth a
proposal would be appreciated
@charlesdong1991 <https://github.com/charlesdong1991> you always seem to
find these tricky PRs ha!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#27696?email_source=notifications&email_token=ACGXEOHV25IYEDZZTRCDBLLQCROIJA5CNFSM4IIRIILKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3OHK3I#issuecomment-517764461>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ACGXEOHOSWFFS7WOPHDJ4M3QCROIJANCNFSM4IIRIILA>
.
|
These contributions are all really good. What happens if you remove the error check? |
sorry, I will do a check after weekend, in the airport for a short trip
now.. wish you have a nice weekend :)
…On Fri, 2 Aug 2019, 18:08 William Ayd ***@***.***> wrote:
These contributions are all really good. What happens if you remove the
error check?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#27696?email_source=notifications&email_token=ACGXEODURFGVRDQIP2EB23LQCRS2TA5CNFSM4IIRIILKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3OKNOY#issuecomment-517777083>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ACGXEOABQX2PPIOYB2OF3CDQCRS2TANCNFSM4IIRIILA>
.
|
if removing the error check, everything will work as is. And test file in origin PR was moved to |
@charlesdong1991 can you update to use the consistent approach described above? |
just to double check to ensure I undsertand correclty: preferable way is to remove the error check to avoid |
Right - I don't think we should have that error check since it only applies to nested dictionaries when it works as is for a "simple" dictionary replacement |
agree, thanks for your quick reply! @WillAyd appreciate a lot! |
Hello @charlesdong1991! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-08-27 06:54:23 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a whatsnew for v1.0.0?
pandas/core/generic.py
Outdated
# add another check to avoid boolean being regarded | ||
# as binary in python set | ||
if set(keys) & set(values) and set(map(str, keys)) & set( | ||
map(str, values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this just not be removed altogether? Not clear on the purpose of it
ping sorry again for that unintentional push, now I think this PR is ready for review @WillAyd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor things but generally this looks good. Back over to you @jschendel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. @jschendel or whomever else wants to take a look merge if happy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm; a couple small comments since there's a merge conflict that needs to be resolved
thanks @jschendel @WillAyd ping |
thanks @charlesdong1991 |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff