BUG fixes tuple agg issue 18079 #18354

Merged
merged 7 commits into from
Nov 26, 2017

bobhaffner
Contributor

@bobhaffner bobhaffner commented Nov 18, 2017

codecov bot commented Nov 18, 2017

Codecov Report

Merging #18354 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #18354   +/-   ##
=======================================
  Coverage   91.33%   91.33%           
=======================================
  Files         163      163           
  Lines       49801    49801           
=======================================
  Hits        45487    45487           
  Misses       4314     4314
| Flag | Coverage Δ |
|---|---|
| #multiple | 89.13% <100%> (ø) ⬆️ |
| #single | 40.79% <0%> (+0.02%) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/core/groupby.py | 92.03% <100%> (ø) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d101064...7a4342f. Read the comment docs.

@@ -62,6 +62,7 @@ Bug Fixes
- Bug in ``pd.Series.rolling.skew()`` and ``rolling.kurt()`` with all equal values has floating issue (:issue:`18044`)
- Bug in ``pd.DataFrameGroupBy.count()`` when counting over a datetimelike column (:issue:`13393`)
- Bug in ``pd.concat`` when empty and non-empty DataFrames or Series are concatenated (:issue:`18178` :issue:`18187`)
- Bug in :class:`NDFrameGroupBy` fixes ValueError: no results error when grouping by a single column and aggregating with a tuple (:issue:`18079`)
Contributor

Maybe just say something like "Bug when grouping by a single column and aggregating with a class like list or tuple". I don't recall if NDFrameGroupBy is in the public API.

assert_frame_equal(result, expected)

result = df.groupby('A')['C'].aggregate(tuple)
expected = pd.Series([(1, 1, 1), (3, 4, 4)], index=[1, 3])
Contributor

Should be able to pass name='C' here.
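
The suggestion above can be sketched as follows. Selecting column `C` from the groupby yields a Series named `C`, so the expected Series can be built with `name='C'`. This is a minimal sketch assuming a pandas version that includes this fix; naming the expected index `'A'` and importing `assert_series_equal` from `pandas.testing` are assumptions that are not shown in the diff.

```python
import pandas as pd
from pandas.testing import assert_series_equal

df = pd.DataFrame({'A': [1, 1, 1, 3, 3, 3],
                   'B': [1, 1, 1, 4, 4, 4],
                   'C': [1, 1, 1, 3, 4, 4]})

# Group by a single column and collect each group's values into a tuple.
result = df.groupby('A')['C'].aggregate(tuple)

# The result inherits the selected column's name, so pass name='C' directly.
expected = pd.Series([(1, 1, 1), (3, 4, 4)], index=[1, 3], name='C')
expected.index.name = 'A'  # assumption: the index carries the grouping key's name

assert_series_equal(result, expected)
```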

@bobhaffner
Contributor Author

@TomAugspurger's comment on the whatsnew entry prompted me to add a test for list aggregation, only to discover a new issue.

This works
df.groupby('A')['C'].aggregate(list)

This doesn't.
df.groupby(['A', 'B']).aggregate(list)

Should I submit a separate issue for this or investigate and fix this new error as part of this PR?

Full traceback:

Traceback (most recent call last):
  File "/Users/bob/dev/pandas-dev/pandas/core/groupby.py", line 2267, in agg_series
    return self._aggregate_series_fast(obj, func)
  File "/Users/bob/dev/pandas-dev/pandas/core/groupby.py", line 2287, in _aggregate_series_fast
    result, counts = grouper.get_result()
  File "pandas/_libs/src/reduce.pyx", line 406, in pandas._libs.lib.SeriesGrouper.get_result
    raise
  File "pandas/_libs/src/reduce.pyx", line 394, in pandas._libs.lib.SeriesGrouper.get_result
    result = _get_result_array(res,
  File "pandas/_libs/src/reduce.pyx", line 15, in pandas._libs.lib._get_result_array
    raise ValueError('function does not reduce')
ValueError: function does not reduce

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in
  File "/Users/bob/dev/pandas-dev/pandas/core/groupby.py", line 4192, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "/Users/bob/dev/pandas-dev/pandas/core/groupby.py", line 3634, in aggregate
    return self._python_agg_general(arg, *args, **kwargs)
  File "/Users/bob/dev/pandas-dev/pandas/core/groupby.py", line 853, in _python_agg_general
    result, counts = self.grouper.agg_series(obj, f)
  File "/Users/bob/dev/pandas-dev/pandas/core/groupby.py", line 2269, in agg_series
    return self._aggregate_series_pure_python(obj, func)
  File "/Users/bob/dev/pandas-dev/pandas/core/groupby.py", line 2304, in _aggregate_series_pure_python
    raise ValueError('Function does not reduce')
ValueError: Function does not reduce
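
For reference, the two calls from the comment above can be put side by side in one script. This is a minimal sketch that assumes a pandas version where the list case has been fixed; on the development version used in the traceback, the second call is the one that raised `ValueError: Function does not reduce`. The sample frame reuses the data from the test in this PR.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 3, 3, 3],
                   'B': [1, 1, 1, 4, 4, 4],
                   'C': [1, 1, 1, 3, 4, 4]})

# One grouping column, one selected column: a Series whose values are lists.
by_one = df.groupby('A')['C'].aggregate(list)

# Two grouping columns: the call that failed before the fix. With the fix,
# the remaining column 'C' holds one list per (A, B) group.
by_two = df.groupby(['A', 'B']).aggregate(list)

print(by_one)
print(by_two)
```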

# Issue #18079
df = pd.DataFrame({'A': [1, 1, 1, 3, 3, 3],
                   'B': [1, 1, 1, 4, 4, 4], 'C': [1, 1, 1, 3, 4, 4]})

Contributor

move to test_aggregate

can you parametrize with list-likes here (e.g. tuple/list,np.array,Series)

some of these might have different output (Series) but let's see.

you may have to have one test with the parametrized ones and another one with 'other' things, e.g.

def f(x):
    return tuple(x)

which work now.
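
A parametrized test along these lines might look like the sketch below. It is illustrative only: the test name, the parameter names, and the expected values are assumptions, and it assumes a pandas version that includes this fix. `np.array` and `Series` are left out of the parameter list because, as noted in the review, they may produce different output or errors.

```python
import pandas as pd
import pytest
from pandas.testing import assert_series_equal


@pytest.mark.parametrize('structure, expected_values', [
    (tuple, [(1, 1, 1), (3, 4, 4)]),
    (list, [[1, 1, 1], [3, 4, 4]]),
    # "other" things: plain functions that wrap the same constructors
    (lambda x: tuple(x), [(1, 1, 1), (3, 4, 4)]),
    (lambda x: list(x), [[1, 1, 1], [3, 4, 4]]),
])
def test_agg_structs_series(structure, expected_values):
    # Hypothetical test name; data taken from the test in this PR.
    df = pd.DataFrame({'A': [1, 1, 1, 3, 3, 3],
                       'B': [1, 1, 1, 4, 4, 4],
                       'C': [1, 1, 1, 3, 4, 4]})

    result = df.groupby('A')['C'].aggregate(structure)

    expected = pd.Series(expected_values,
                         index=pd.Index([1, 3], name='A'), name='C')
    assert_series_equal(result, expected)
```

Run with `pytest`, each tuple in the parameter list becomes its own test case, so a failure for one structure does not hide the results for the others.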

Contributor Author

Thanks for the feedback, @jreback! I'm hoping to tackle the list bug and these additional tests later this week.

@jreback jreback added the Compat, Groupby, Reshaping, and Bug labels and removed the Compat label Nov 19, 2017
@bobhaffner
Contributor Author

  • Moved tests to test_aggregate.py

  • Parametrized the tests (one for DataFrames and one for Series)

  • I fixed the list issue I mentioned above by removing the list condition in the following if

isinstance(res, list)):

I was a little leery doing this, but all tests passed.

Note: Both np.array and pd.Series still error out with both scenarios (grouping by one column vs multiple columns), but with different error descriptions. That said, is it possible to proceed with this PR since both tuple and list now behave the same way with single and multiple column groupbys?

df = pd.DataFrame({'A': [1, 1, 3], 'B': [1, 1, 4], 'C': [1, 3, 4]})

df.groupby('A')['C'].aggregate(pd.Series)  # same with np.array
>> Exception: Must produce aggregated value

df.groupby(['A', 'B']).aggregate(pd.Series)  # same with np.array
>> ValueError: Function does not reduce

This might close #4293 as well
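
To check how each structure behaves on a given pandas installation, a small probe like the one below can help. It is a sketch only: `describe_agg` is a hypothetical helper, and no output is shown because the exception types and messages for `np.array` and `pd.Series` depend on the pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3], 'B': [1, 1, 4], 'C': [1, 3, 4]})


def describe_agg(grouped, func):
    """Return 'ok', or the exception type and message raised by aggregate()."""
    try:
        grouped.aggregate(func)
    except Exception as exc:  # broad on purpose: the type varies by version
        return '{}: {}'.format(type(exc).__name__, exc)
    return 'ok'


# Compare single-column and multi-column grouping for each structure.
for func in (tuple, list, np.array, pd.Series):
    print(func.__name__,
          describe_agg(df.groupby('A')['C'], func),
          describe_agg(df.groupby(['A', 'B']), func),
          sep=' | ')
```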

Contributor

@jreback jreback left a comment

small changes, otherwise lgtm. ping on green.

@@ -103,7 +103,7 @@ Groupby/Resample/Rolling
- Bug in ``DataFrame.resample(...).apply(...)`` when there is a callable that returns different columns (:issue:`15169`)
- Bug in ``DataFrame.resample(...)`` when there is a time change (DST) and resampling frequecy is 12h or higher (:issue:`15549`)
- Bug in ``pd.DataFrameGroupBy.count()`` when counting over a datetimelike column (:issue:`13393`)
-
- Bug when grouping by a single column and aggregating with a class like`list` or `tuple` (:issue:`18079`)
Contributor

use double-backticks around list and tuple

Contributor

move this to 0.22 as well. (it's not that I don't regard this as a bug fix, but this is something touching an important piece of code and I want this to live in master for a bit).

@bobhaffner
Contributor Author

Thanks @jreback!

@jreback
Contributor

jreback commented Nov 25, 2017

one more rebase :>

@pep8speaks

pep8speaks commented Nov 25, 2017

Hello @bobhaffner! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on November 26, 2017 at 16:18 Hours UTC

@bobhaffner
Contributor Author

Hi @jreback, I had conflicts (whatsnew) with the rebase, but I believe it's corrected now. That said, I'm puzzled by the pep8speaks comment regarding test_agg.py, as I didn't modify that file. My apologies for the confusion.

Contributor

@jreback jreback left a comment

looks like some extra files got added during the rebase. pls remove them. and push again.


.. _whatsnew_0211.bug_fixes:

Bug Fixes
Contributor

remove this

grp.patch Outdated
if isinstance(func_or_funcs, compat.string_types):
return getattr(self, func_or_funcs)(*args, **kwargs)

- if hasattr(func_or_funcs, '__iter__'):
Contributor

remove this

test_agg.py Outdated
return list(x)

#df = pd.DataFrame({'A' : [1, 1, 3], 'B' : [1, 2, 4]})
#result = df.groupby('A').aggregate(f)
Contributor

remove this

@bobhaffner
Contributor Author

Yikes, my mistake. thanks Jeff.

Contributor

@jreback jreback left a comment

couple of more items to remove

grp_test.patch Outdated
@@ -2725,3 +2725,12 @@ def _check_groupby(df, result, keys, field, f=lambda x: x.sum()):
expected = f(df.groupby(tups)[field])
for k, v in compat.iteritems(expected):
assert (result[k] == v)
Contributor

patch needs removal

@@ -147,4 +147,4 @@ Other
^^^^^

-
Contributor

can you revert this file

Contributor Author

Hi Jeff, I'm afraid my lack of git experience is showing, as this file has been a bit of a thorn in my side. I'm concerned that my attempts to fix my mess will just result in wasting more of your time. Would you mind sharing the syntax necessary to revert (or reset?) this file to the proper commit?

Contributor

sure, I'll fix this up

@jreback jreback added this to the 0.22.0 milestone Nov 26, 2017
@jreback
Contributor

jreback commented Nov 26, 2017

should be good to go. ping on green.

@bobhaffner
Contributor Author

Thanks Jeff! You're a lifesaver!

@jreback jreback merged commit 674fb96 into pandas-dev:master Nov 26, 2017
@jreback
Contributor

jreback commented Nov 26, 2017

thanks @bobhaffner

@bobhaffner
Contributor Author

Many thanks for the help and the patience, @jreback. Same to you, @TomAugspurger.

Labels
Bug, Groupby, Reshaping
Development

Successfully merging this pull request may close these issues.

DataFrameGroupBy.aggregate can not work with tuple as an argument
4 participants