Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Groupby selection context not being properly reset #28541

Closed
wants to merge 9 commits into from
Closed

BUG: Groupby selection context not being properly reset #28541

wants to merge 9 commits into from

Conversation

christopherzimmerman
Copy link
Contributor

@christopherzimmerman christopherzimmerman commented Sep 19, 2019

@christopherzimmerman christopherzimmerman changed the title BUG: Groupby selection not context not being properly reset BUG: Groupby selection context not being properly reset Sep 19, 2019
@@ -240,6 +240,7 @@ Groupby/resample/rolling
- Bug in :meth:`DataFrame.rolling` not allowing for rolling over datetimes when ``axis=1`` (:issue: `28192`)
- Bug in :meth:`DataFrame.groupby` not offering selection by column name when ``axis=1`` (:issue:`27614`)
- Bug in :meth:`DataFrameGroupby.agg` not able to use lambda function with named aggregation (:issue:`27519`)
- Bug in :meth:`DataFrameGroupby` causing unexpected mutations of the groupby object (:issue:`28523`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the ref needs to be :class:`pandas.core.groupby.DataFrameGroupBy`. And maybe briefly explain the user-visible behavior: "Multiple operations on a DataFrameGroupBy object not giving the same results in certain cases".

@pytest.mark.parametrize(
"func, args",
[
("sum", []),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DO we need all these parametrezations? Or does just a few work? sum and nth?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not. It looks like one of [sum, prod, min, max, first, last] and then nth would suffice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a reduction_func fixture I think you should use here; would actually be more functions but also more comprehensive and consistent with some other tests

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good just some minor comments. Did you see any other usages that weren't part of the context manager by chance?

@pytest.mark.parametrize(
"func, args",
[
("sum", []),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a reduction_func fixture I think you should use here; would actually be more functions but also more comprehensive and consistent with some other tests

@WillAyd WillAyd added this to the 1.0 milestone Sep 19, 2019
@jbrockmendel
Copy link
Member

Will this fix #26864?

@christopherzimmerman
Copy link
Contributor Author

So there are two tests that will need to be changed regarding this fix:

test_grouper_creation_bug and test_agg_timezone_round_trip

The second change is straightforward, the first might have identified another bug.

Is this behavior expected?

>>> df = DataFrame({"A": [0, 0, 1, 1, 2, 2], "B": [1, 2, 3, 4, 5, 6]})
>>> df.groupby('A').sum()
    B
A
0   3
1   7
2  11
>>> df.groupby('A').apply(lambda x: x.sum())
   A   B
A
0  0   3
1  2   7
2  4  11

The mutating state caused those two to be equal in the test, but now it fails, and I'm not sure if the second is intended behavior.

@WillAyd
Copy link
Member

WillAyd commented Sep 20, 2019 via email

@WillAyd
Copy link
Member

WillAyd commented Sep 20, 2019

#22546 #22542 #20420 might all be solved by this

@christopherzimmerman
Copy link
Contributor Author

@WillAyd OK, I'll track down that bug and take care of that before I do anything else with this PR. Also per your previous comment, making the test work with reduction_func removed the last instance of the improper set selection, so I think the context manager is now used everywhere.

@WillAyd
Copy link
Member

WillAyd commented Sep 20, 2019

Sounds good - nice work so far!

@WillAyd
Copy link
Member

WillAyd commented Oct 7, 2019

#17091 might be closed by this as well

@jreback
Copy link
Contributor

jreback commented Oct 8, 2019

@christopherzimmerman can you merge master, been some changed in groupby recently.

@jbrockmendel
Copy link
Member

@christopherzimmerman can you rebase, this would be really nice to see fixed

@pep8speaks
Copy link

pep8speaks commented Oct 21, 2019

Hello @christopherzimmerman! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-10-21 13:45:44 UTC

@christopherzimmerman
Copy link
Contributor Author

@jbrockmendel rebased and cleaned up what I was working on. #28549 still needs to be addressed before this is green. I had a PR going for that issue and I'll revisit it.

@jbrockmendel
Copy link
Member

a lot of work has been going on in these files; can you rebase again? ill try to review in a timely manner so we dont have to go through another round

@jreback jreback removed this from the 1.0 milestone Jan 1, 2020
@jreback
Copy link
Contributor

jreback commented Jan 1, 2020

@christopherzimmerman would be a nice change, can you merge master

@christopherzimmerman
Copy link
Contributor Author

I will merge, but #29131 will need to be merged first or else the CI will fail, since this requires some test updates from the other PR.

@WillAyd
Copy link
Member

WillAyd commented Mar 14, 2020

Let's come back to this after the other PR is merged then; (looks like more changes are needed there).

Closing to reduce PR queue for the time being - thanks as always!

@WillAyd WillAyd closed this Mar 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataFrameGroupBy.first() affects group without assignment
6 participants