ENH: Allow the groupby by param to handle columns and index levels (GH5677) #14432
@@ -29,8 +29,7 @@ New features

Other enhancements
^^^^^^^^^^^^^^^^^^

- Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now reference either column names or index level names (:issue:`5677`)
This will need an example here. Put the same one in groupby.rst (you may also need to add it to the groupby docstring).
@jreback Example added here and in groupby.rst. I didn't add anything to the groupby-docstring yet as I wasn't quite sure where it would fit in (there are only two examples there right now). Let me know what you think.
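For illustration, here is a minimal sketch of the behaviour described in the whatsnew entry (it is not the example that was added to the PR). It assumes a pandas version that includes this change (0.20 or later); the frame and the names `outer`, `inner`, `A` and `B` are hypothetical. A list passed as `by` can mix an index level name with a column name:

```python
import pandas as pd

# Hypothetical frame with named index levels 'outer'/'inner' and columns 'A'/'B'.
idx = pd.MultiIndex.from_arrays(
    [[1, 1, 2, 2], ["x", "y", "x", "y"]], names=["outer", "inner"]
)
df = pd.DataFrame({"A": [1, 1, 1, 2], "B": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# 'outer' matches no column, so it is resolved as an index level name;
# 'A' is resolved as a column name.
result = df.groupby(["outer", "A"])["B"].sum()
print(result)
```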
expected = df_multi_both.groupby(pd.Grouper(key='inner')).mean()
assert_frame_equal(result, expected)
not_expected = df_multi_both.groupby(pd.Grouper(level='inner')).mean()
assert not result.index.equals(not_expected.index)
self.assertFalse
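The test above relies on `pd.Grouper(key=...)` and `pd.Grouper(level=...)` being unambiguous even when a name is both a column and an index level. A minimal standalone sketch of that distinction, using a hypothetical frame in which `inner` is both:

```python
import pandas as pd

# Hypothetical frame: 'inner' is BOTH an index level name and a column name.
idx = pd.MultiIndex.from_arrays(
    [[1, 1, 2, 2], ["a", "b", "a", "b"]], names=["outer", "inner"]
)
df = pd.DataFrame({"inner": [10, 10, 20, 20], "B": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# pd.Grouper(key=...) selects the column named 'inner' (groups 10 and 20) ...
by_column = df.groupby(pd.Grouper(key="inner"))["B"].mean()
# ... while pd.Grouper(level=...) selects the index level (groups 'a' and 'b').
by_level = df.groupby(pd.Grouper(level="inner"))["B"].mean()

print(by_column)
print(by_level)
```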
I think that we should raise in the ambiguous case. Right now this just works on the column (which this PR uses to establish precedence), e.g.
This is almost always a user error; if it's actually wanted, the user should be more specific (by using …). Currently, AFAIK, this just takes the column.
@jreback I have no problem with raising an exception in the ambiguous case. As you noted, the only reason to have columns take precedence was to reproduce the behavior of previous versions when ambiguity was present. @shoyer expressed a preference for the precedence approach in #14355 for the …
@TomAugspurger @jorisvandenbossche @shoyer Do any of you object to raising an exception in the ambiguous case for each of these 3 enhancements?
Yes, we could error in the ambiguous case, but only eventually, after a deprecation cycle.
@shoyer
And then I assume we'd also add a deprecation note to the whatsnew file. @jreback @jorisvandenbossche Thoughts?
@jmmease Yes, that's right. We should probably be a little reluctant to add deprecated behavior right now, though, because there may be a long wait between the next pandas feature release (1.0?) and pandas 2.0.
There is no problem with deprecating things.
@jreback @shoyer @jorisvandenbossche Example for whatsnew, groupby.rst, and
@@ -94,6 +94,9 @@ The mapping can be specified many different ways:
- For DataFrame objects, a string indicating a column to be used to group. Of
  course ``df.groupby('A')`` is just syntactic sugar for
  ``df.groupby(df['A'])``, but it makes life simpler
- For DataFrame objects, a string indicating an index level to be used to group.
versionadded tag
I would also make this into a note section.
@jreback I wasn't quite sure how to handle versionadded and keep the description in the list. I left the description in the list (without ambiguity explanation) and added a note section below with versionadded tag that describes the change and the ambiguity behavior.
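A minimal sketch of the new groupby.rst bullet, assuming a pandas version that includes this change; the frame and the index name `key` are hypothetical. When a string matches no column but does match an index level name, grouping on it should give the same result as `groupby(level=...)`:

```python
import pandas as pd

# Hypothetical frame whose (single) index is named 'key'; 'key' is not a column.
df = pd.DataFrame(
    {"val": [1, 2, 3, 4]},
    index=pd.Index(["x", "y", "x", "y"], name="key"),
)

# 'key' matches no column, so it is resolved as the index level name ...
by_name = df.groupby("key").sum()
# ... which matches grouping with the explicit level= keyword.
by_level = df.groupby(level="key").sum()

print(by_name)
```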
Current coverage is 85.30% (diff: 100%)

@@            master   #14432   diff @@
==========================================
  Files            140      144     +4
  Lines          50719    51004   +285
  Methods            0        0
  Messages           0        0
  Branches           0        0
==========================================
+ Hits           43259    43510   +251
- Misses          7460     7494    +34
  Partials           0        0
@jmmease Thanks a lot!
@jmmease can you do a quick follow-up to catch this warning that's appearing (you may need to run all groupby tests)
Sure. See #14902
Follow on to #14432 to catch the newly introduced `FutureWarning` in the `test_groupby_multi_categorical_as_index` test case. Author: Jon M. Mease <jon.mease@jhuapl.edu> Closes #14902 from jmmease/GH14432_follow_on and squashes the following commits: c30fa2b [Jon M. Mease] Trap warning introduced by GH14432 in test_groupby_multi_categorical_as_index
So what's the next step of the deprecation cycle? What should I do if I want to force the column to be used for grouping?
@goldenbull this will be changed to an error in 1.0 (after 0.21) |
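Regarding the question above, one way to avoid relying on how a bare string is resolved is to pass the grouping values explicitly: the column as a Series, or the index level through the `level=` keyword. A minimal sketch with a hypothetical frame in which `inner` is both a column and an index level name:

```python
import pandas as pd

# Hypothetical frame where 'inner' is both a column and an index level name.
idx = pd.MultiIndex.from_arrays(
    [[1, 1, 2, 2], ["a", "b", "a", "b"]], names=["outer", "inner"]
)
df = pd.DataFrame({"inner": [10, 10, 20, 20], "B": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# Force the column: pass the Series itself instead of its name.
col_means = df.groupby(df["inner"])["B"].mean()

# Force the index level: use the level= keyword instead of a bare string.
level_means = df.groupby(level="inner")["B"].mean()

print(col_means)
print(level_means)
```

Renaming either the column or the index level (for example with `DataFrame.rename` or `DataFrame.rename_axis`) also removes the ambiguity.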
git diff upstream/master | flake8 --diff
Change to allow strings passed as the `by` parameter to `df.groupby` to reference columns (existing behavior) or index level names if no column match is found. Columns take precedence in the case of ambiguity to maintain backward compatibility.
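The resolution rule described above can be sketched as a standalone helper. `resolve_by_key` is hypothetical and is not how pandas implements this internally; the `FutureWarning` on ambiguous keys reflects the deprecation discussed in this thread, and its message text is made up:

```python
import warnings

import pandas as pd


def resolve_by_key(df: pd.DataFrame, key: str) -> pd.Series:
    """Hypothetical helper (not pandas' internal code) mirroring the rule above.

    Returns the values that a string ``by`` key would group on.
    """
    in_columns = key in df.columns
    in_levels = key in df.index.names
    if in_columns:
        if in_levels:
            # Ambiguous key: keep the old behaviour (the column wins) but warn.
            warnings.warn(
                f"{key!r} is both a column label and an index level name; "
                "using the column",
                FutureWarning,
                stacklevel=2,
            )
        return df[key]
    if in_levels:
        # No column matched, so fall back to the index level with that name.
        return pd.Series(df.index.get_level_values(key), index=df.index, name=key)
    raise KeyError(key)


# Hypothetical frame: 'inner' is ambiguous, 'outer' exists only as a level.
idx = pd.MultiIndex.from_arrays(
    [[1, 1, 2, 2], ["a", "b", "a", "b"]], names=["outer", "inner"]
)
df = pd.DataFrame({"inner": [10, 10, 20, 20], "B": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# 'outer' is not a column, so the helper returns the index level values.
print(df.groupby(resolve_by_key(df, "outer"))["B"].sum())
```

Checking columns first is what keeps existing code that grouped on a column working unchanged; index levels are only consulted when no column matches.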