-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GroupBy enhancement unifies the return of iterating over GroupBy #42795 #47719
Conversation
ahmedibrhm
commented
Jul 14, 2022
•
edited
Loading
edited
- closes ENH: consistent types in output of df.groupby #42795
- Tests added and passed if fixing a bug or adding a new feature
- All code checks passed.
- Applied the deprecation in DEPR: returning tuple when grouping by a list containing single element #47761
Hello @ahmedibrhm! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2022-07-26 22:07:40 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be useful to put up a proof of concept to demonstrate what a behavior change would look like once a deprecation is enforced. However, when doing so it would be helpful to mark the PR as a draft since we do not want to merge it yet.
There are also many things changing here that I would not expect. I think if you aren't iterating over the group, then there should be no change in behavior. I highlighted a few examples below.
pandas/tests/groupby/test_groupby.py
Outdated
@@ -806,7 +806,7 @@ def test_groupby_as_index_cython(df): | |||
msg = "The default value of numeric_only" | |||
with tm.assert_produces_warning(FutureWarning, match=msg): | |||
result = grouped.mean() | |||
expected = data.groupby(["A"]).mean() | |||
expected = data.groupby("A").mean() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this changing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned in #47761
I thought it's better to generalize the rule of using not using a list when grouping by a single key as groupby is being iterated over in other functions. So I thought it will be a good idea to generalize the rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it's better to generalize the rule of...
I do not understand what this means. Can you expand on it? Also, it's not clear to me - is the result of data.groupby(["A"]).mean()
different from what the main branch currently produces?
bymodi = fix_groupby_singlelist_input(self.by) | ||
grouped = self.data.groupby(bymodi)[self.columns] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because .hist
and .box
use groupby internally in a single way. For example if I did hist by ['a','b','c','d'] the results will be like (a,), (b,), (c,), (d,).
some plotting functions and the pivot table are actually iterating over groupby.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But here, grouped
is only being used in L66 immediately below, right?
self.bins = [self._calculate_bins(group) for key, group in grouped]
In this usage, only the group is being used and the key is ignored. So why is this needed if it's only the key changing?
@rhshadrach What I mean is that when iterating over groupby the |
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
@ahmedibrhm are you interested in continuing this PR and applying the deprecation? We are looking to enforce this for the next release |
Thanks for the pull request, but I believe this was handled by #50064 so closing. If I misunderstood happy to reopen |