ENH: add inplace-kwarg to df.update #22286
Codecov Report
@@ Coverage Diff @@
## master #22286 +/- ##
==========================================
+ Coverage 92.18% 92.18% +<.01%
==========================================
Files 169 169
Lines 50810 50815 +5
==========================================
+ Hits 46839 46844 +5
Misses 3971 3971
pandas/core/frame.py
@@ -5214,6 +5214,9 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
raise_conflict : bool, default False
If True, will raise a ValueError if the DataFrame and `other`
both contain non-NA data in the same place.
inplace : bool, default True
If True, follows the convention by ``dict`` of updating inplace. If
versionadded tag
remove the commentary about dict
if inplace:
    df.update(other, inplace=inplace)
else:
    df = df.update(other, inplace=inplace)
this is not an actual test, you need to rename to something else and assert that the original is not changed
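A minimal sketch of the kind of test the comment above asks for: bind the result to a new name and assert that the original frame is untouched. The `inplace` keyword is only proposed in this PR, so the sketch uses a hypothetical helper, `update_copy`, built from the existing `copy` and `update` methods as a stand-in for `df.update(other, inplace=False)`. The frame contents are made up for illustration.

```python
import numpy as np
import pandas as pd


def update_copy(df, other):
    # Hypothetical stand-in for the proposed df.update(other, inplace=False):
    # copy first, update the copy in place, return the copy.
    result = df.copy()
    result.update(other)
    return result


df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
other = pd.DataFrame({"b": [40.0, np.nan, 60.0]})
original = df.copy()

# Use a new name for the result instead of rebinding `df`.
result = update_copy(df, other)

# NaN in `other` does not overwrite existing values.
expected = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [40.0, 5.0, 60.0]})
pd.testing.assert_frame_equal(result, expected)

# The point of the review comment: the original must be unchanged.
pd.testing.assert_frame_equal(df, original)
```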
that's a valid objection - though I haven't seen it anywhere on the tests I've worked on that had an inplace-kwarg.
I am really -1 on
This is one of the few(?) methods that's inplace by default - so to move away from that, it's necessary to introduce an inplace-kwarg in the first place, to then eventually, maybe deprecate away from
Going towards my comments on #21858 and the name "update" seeming wrong for a non-inplace method. It seems to me that there are a lot of methods which perform variations on the same functionality (join, combine, combine_first, update). It seems to me that non-inplace
@Liam3851 I think we see the world quite differently on that. To me, the essential functionality of Whether that operation is inplace or not is purely a semantic distinction, and (IMHO) down to your habits - meaning no offense, of course. To illustrate my point, the exact same argument could be used to argue that
Regarding your suggestion to wrap this into other methods, I don't think it would work. Finally, I find "combine" to be a much too general term, when the thought process a user might try to materialise is simply "update this DF with that DF, but don't overwrite" (or maybe do...). For that reason, I never used
@h-vetinari To me "update" has a distinct connotation in terms of both dicts and SQL as changing the underlying data, but I agree the point is largely semantic. My personal experience as a longtime user is that I've hardly ever used
@jreback I incorporated the requested changes a week ago.
c83522e to 110ff56
@jreback Rebased and all green. The content has not changed from a week ago.
@h-vetinari there are a number of open questions on this API generally. maybe @TomAugspurger @jorisvandenbossche have some thoughts
@jreback OK cool, let's have a discussion. Could you specify which open questions you see?
@jreback @TomAugspurger @jorisvandenbossche Do you see any downside to adding an The discussion about the joins can be continued in #21855.
@jreback @TomAugspurger @jorisvandenbossche @toobaz This PR has been green for 4 weeks (the issue has been open another month on top of that with little discussion) - could I please have some guidance of how to proceed (then I'd also offer to tackle PRs for #22358 and #21855)? As this PR suggests, I'd prefer to add an
I don't think my thoughts have changed since
#21858 (comment)
I don't think we should add an inplace keyword argument.
I'm fine with returning `self`, the updated dataframe, to aid with method
chaining.
…On Mon, Sep 10, 2018 at 10:51 AM h-vetinari ***@***.***> wrote:
@jreback @TomAugspurger @jorisvandenbossche @toobaz
This PR has been green for 4 weeks and hasn't moved forward due to lack of
input (the issue has been open another month on top of that with little
discussion) - can I please have some guidance of how to proceed (then I'd
also offer to tackle PRs for #22358 and #21855)?
As this PR suggests, I'd prefer to add an inplace-kwarg to update, but
could imagine another option as well (quoting myself from the issue):
Alternatively, if update is such a reserved name, one could think of
having the required functionality (fusing two dataframes with given
precedence and requirement for output dimensions) in a separate method
called e.g. df.coalesce?
@TomAugspurger Aside from that, does any
Why do you prefer not-inplace ops in method chains, or more precisely, why
does it matter?
Other than the first operation in a method chain, you aren't going to have
a reference to the output
of the previous step, so it shouldn't matter.
If you're starting with `.update`, maybe start with `.copy` as a workaround?
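The `.copy` workaround can be sketched as follows. Since `DataFrame.update` currently returns `None`, the chain goes through a hypothetical helper, `update_returning_self`, that mimics the "return `self`" behaviour discussed here; it is not part of pandas, and the data is invented for illustration.

```python
import numpy as np
import pandas as pd


def update_returning_self(df, other):
    # Hypothetical helper: update in place, then hand back the same object,
    # mimicking an `update` that returns `self`.
    df.update(other)
    return df


df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})
other = pd.DataFrame({"a": [10.0, 20.0]})

# Starting the chain with .copy() keeps `df` available for later use,
# even though the update step itself works in place on the copy.
result = (
    df.copy()
    .pipe(update_returning_self, other)
    .assign(total=lambda d: d["a"] + d["b"])
)
```

Only the first step of the chain needs the copy; every later step operates on an intermediate object that nothing else references.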
…On Mon, Sep 10, 2018 at 11:07 AM h-vetinari ***@***.***> wrote:
@TomAugspurger
Thanks for the response! I responded to your remark in the issue with the
suggestion quoted above (of then having a different method for that
functionality) - any comment about that? Especially in chained calls, I
find it *not* desirable to have inplace operations (whether returning self
or not).
That's easy to answer: because the original might still be needed later in computation (example below). That's why - if this is not about the functionality, but rather the semantic distinction of what "update" means - I suggested a different method name. As for the example, here's the simple setup:
My suggestion: with
Even better would be something that's not inplace by default (
With
Current status (
Thanks. I think I would recommend your 3rd solution. I prefer that to adding additional keyword arguments or new methods. |
I get your point, just two comments:
Not sure about elsewhere in the library. I think it's been discussed before.
…On Mon, Sep 10, 2018 at 1:12 PM h-vetinari ***@***.***> wrote:
@TomAugspurger
Thanks. I think I would recommend your 3rd solution. I prefer that to
adding additional keyword arguments or new methods.
I get your point, just two comments:
- the inplace-ing would/will get even more complicated with allowing
  different joins for update (#21855)
- but perhaps even more importantly:
  [...] does any inplace operation currently return something other than
  None?
This was the reason I didn't understand your comment in the issue - I
didn't get that you meant that the method should return something
*despite* being inplace, as this goes against all (my) experience in
python/pandas.
Hello @h-vetinari! Thanks for updating the PR.
@h-vetinari I don't mind the code changes you did, they look fine. The issue is that So we have In an ideal world I think adding |
Thanks for the comment, I strongly agree in principle.
It is effectively in
I really like this idea. This would also basically solve the whole discussion about more joins for And yes, dplyr uses "coalesce", which itself is inspired by SQL: https://cran.r-project.org/web/packages/dplyr/dplyr.pdf#page.15 |
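For readers following the naming discussion, here is a small sketch of how the two existing methods differ in precedence and output shape. It treats `combine_first` as one possible reading of "coalesce" semantics, which is an assumption rather than something settled in this thread. The frames are invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan]}, index=[0, 1])
other = pd.DataFrame({"a": [10.0, 20.0, 30.0]}, index=[0, 1, 2])

# update: non-NA values of `other` take precedence, and the result keeps
# the index and columns of `df` (done on a copy to leave `df` intact).
updated = df.copy()
updated.update(other)

# combine_first: values of `df` take precedence, `other` only fills the
# gaps, and the result covers the union of both indexes.
coalesced = df.combine_first(other)
```

The two calls differ both in which frame wins and in the dimensions of the output, which is the "given precedence and requirement for output dimensions" trade-off raised earlier in the conversation.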
yes and that is exactly the problem. We can't simply add an
I disagree that this would be more confusing (just more explicit), but this is moot anyway if we follow the
@h-vetinari i think that would be ok, to get some more commentary on this, esp from @jorisvandenbossche and @TomAugspurger (and some off-line discussions that I had with @cpcloud )
closing this. there are many issues about this, and this PR is stale.
It wasn't stale just stalled - it's a PR I haven't pushed because there are more pressing issues to me. You mention "many" issues, but so far, no-one has given an actual example (aside from personal preference that In any case, I'm letting this be, but tried to restart the conversation in #22812. Would appreciate if you could comment there. :)
git diff upstream/master -u -- "*.py" | flake8 --diff
There wasn't a lot of discussion yet in #21858, so I thought I'd speed things up with a PR. So far:
While it may not be desirable to break the python dict-default of `update` being inplace (this PR leaves the default unchanged), I think it's very relevant for chaining, and (IMO) pandas' philosophy in general.
In #21858, I quoted @TomAugspurger (commenting in #21841 about not adding an option to inplace), which I find a good statement:
I find this applies just as much to `df.update`, hence the PR.