BUG: GROUPING UNPOPULATED DATAFRAME raises exception - index name clashes with duplicate column name #44350

sjvdm · 2021-11-08T07:54:21Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd

## GROUPING POPULATED DATAFRAME WORKS - index name does not clash with column name
df = pd.DataFrame(data={'f1': [1, 2], 'f2': [3, 4], 'f3': [5, 6]})
df.groupby('f1').apply(lambda x: x.loc[:, 'f2'].nunique()).reset_index()

#=======================================================================
## GROUPING UNPOPULATED DATAFRAME raises exception - index name clashes with duplicate column name
df = pd.DataFrame(columns=['f1','f2','f3'])
df = df.groupby('f1').apply(lambda x: x.loc[:,'f2'].nunique()).reset_index()

'''
ValueError: cannot insert f1, already exists
'''

Issue Description

Grouping a populated dataframe and grouping an empty dataframe behave inconsistently with respect to index naming. Calling reset_index() on the grouped result of an empty dataframe raises a ValueError because the index name f1 duplicates an existing column.

Expected Behavior

Even when the dataframe is empty, it should not have the groupby key as a column (as with a populated dataframe)

Installed Versions

INSTALLED VERSIONS

commit : 945c9ed
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 4.18.0-240.22.1.el8_3.x86_64
Version : #1 SMP Thu Apr 8 19:01:30 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.4
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.1
pip : 21.3.1
setuptools : 46.4.0
Cython : 0.29.23
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None


rhshadrach · 2022-01-29T16:24:14Z

Thanks for the report! In both cases the index is named f1; the difference is in the columns. With an empty frame, apply has no information about which columns should be in the result, so it just returns all of them.

Similar to #41997

> Even when the dataframe is empty, it should not have the groupby key as a column (as with a populated dataframe)

This is not correct.

df = pd.DataFrame(data={'f1':[1,2],'f2':[3,4],'f3':[5,6]})
print(df.groupby('f1').apply(lambda x: x))

df = pd.DataFrame(data={'f1':[1,1],'f2':[3,4],'f3':[5,6]})
print(df.groupby('f1').apply(lambda x: x.iloc[0]))

gives

   f1  f2  f3
0   1   3   5
1   2   4   6

and

0   f1  f2  f3
f1            
1    1   3   5
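
A minimal sketch of the comparison described above, applying the function from the report to a populated and an empty frame and inspecting what comes back. The names count_f2, populated and empty are illustrative only. Which columns the empty result carries depends on the pandas version, so the snippet prints them instead of assuming a value.

```python
import pandas as pd


def count_f2(group):
    # Same computation as the lambda in the original report.
    return group.loc[:, "f2"].nunique()


populated = (
    pd.DataFrame({"f1": [1, 2], "f2": [3, 4], "f3": [5, 6]})
    .groupby("f1")
    .apply(count_f2)
)
empty = pd.DataFrame(columns=["f1", "f2", "f3"]).groupby("f1").apply(count_f2)

# Both results are indexed by f1; only the empty one comes back as a
# DataFrame, because there are no function results to infer a shape from.
for label, result in [("populated", populated), ("empty", empty)]:
    print(
        label,
        type(result).__name__,
        "index name:", result.index.name,
        "columns:", list(getattr(result, "columns", [])),
    )
```

If the empty result lists f1 among its columns, reset_index() will fail when it tries to insert the index under the same name.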

rhshadrach · 2024-03-02T03:34:11Z

Thanks to #54950, this error can now be avoided by passing include_groups=False to apply; this will be the behavior in the future.

sjvdm · 2024-03-03T19:18:53Z

Thanks for the awesome work!

sjvdm added the Bug and Needs Triage labels Nov 8, 2021

mroeschke added the Groupby label and removed the Needs Triage label Nov 14, 2021

mroeschke changed the title ~~BUG: groupby~~ BUG: GROUPING UNPOPULATED DATAFRAME raises exception - index name clashes with duplicate column name Nov 14, 2021

rhshadrach added the Apply (Apply, Aggregate, Transform, Map) label Jan 29, 2022

rhshadrach mentioned this issue Jan 29, 2022

BUG: reset_index after a group_by raise a ValueError for empty dataframe #43767

Closed

3 tasks

rhshadrach closed this as completed Mar 2, 2024
