Group keys are not added to DataFrame's groupby result #22848

Closed
eisthfroyalblue opened this issue Sep 26, 2018 · 4 comments

eisthfroyalblue commented Sep 26, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6], 'y': [6, 5, 7, 4, 3, 5], 'z': list('qqqrrs')})

def f(df):
    # Returns a new DataFrame, sorted by x
    return df.sort_values(by='x')

def g(df):
    # Returns the group itself, unchanged
    return df

print(df.groupby('z', group_keys=True).apply(f))  # Group keys are added
print(df.groupby('z', group_keys=True).apply(g))  # Group keys are not added

Problem description

Because df is already sorted by x, f and g return the same rows in the same order for every group, so the two groupby results should be identical.
However, they differ: only the df.sort_values() case (f) adds the group keys to the groupby result.
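
The claim that f and g give the same per-group result can be checked outside of apply. The following is a minimal sketch that reuses the df, f and g from the code sample and iterates over the groups directly; the name same_per_group is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6],
                   'y': [6, 5, 7, 4, 3, 5],
                   'z': list('qqqrrs')})

def f(group):
    return group.sort_values(by='x')

def g(group):
    return group

# For each group, compare what f and g return (values and index).
same_per_group = {key: f(group).equals(g(group))
                  for key, group in df.groupby('z')}
print(same_per_group)
```

If every entry is True, any difference between the two apply results comes from how apply combines the pieces, not from the pieces themselves.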

PS: One observation: if g(df) returns df[::] instead of df, the group keys are added.
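
The two variants of g differ only in whether they hand back the group object itself or a new object with the same contents. A minimal sketch of that difference follows; that the behavior reported here depends on object identity is an assumption drawn from the observation above, not something confirmed in this thread.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6],
                   'y': [6, 5, 7, 4, 3, 5],
                   'z': list('qqqrrs')})

sliced = df[::]

# Identity: is the full slice the very same Python object as df?
print(sliced is df)

# Content: does the full slice hold the same values and index as df?
print(sliced.equals(df))
```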

Expected Output

Since group_keys=True, the output should look like the following:

     x  y  z
z           
q 0  1  6  q
  1  2  5  q
  2  3  7  q
r 3  4  4  r
  4  5  3  r
s 5  6  5  s
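
One way to check whether an installed pandas version shows the inconsistency is to compare the two results directly instead of reading the printed frames. This is a sketch under one assumption: the columns x and y are selected explicitly so the snippet does not depend on how a given pandas version treats the grouping column inside apply, which means the z column of the expected output above is not reproduced.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6],
                   'y': [6, 5, 7, 4, 3, 5],
                   'z': list('qqqrrs')})

def f(group):
    return group.sort_values(by='x')

def g(group):
    return group

# Assumption: explicit column selection, see the note above.
gb = df.groupby('z', group_keys=True)[['x', 'y']]
res_f = gb.apply(f)
res_g = gb.apply(g)

# Number of index levels in each result; they differ when only one
# of the two results has the group keys added.
print(res_f.index.nlevels, res_g.index.nlevels)

# True when both results have the same values and the same index.
print(res_f.equals(res_g))
```

On the setup reported in this issue the two results would be expected to disagree; on a version where the issue is fixed they should match.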

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.3
pytest: 3.2.1
pip: 10.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.14.0
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
pandas_gbq: None
pandas_datareader: 0.7.0

Member

WillAyd commented Sep 27, 2018

Most likely related to, if not a duplicate of, #22546

Member

smithto1 commented Aug 7, 2020

take

Member

smithto1 commented Aug 8, 2020

I think this is another issue that falls under the #34998 fix.

smithto1 removed their assignment Aug 8, 2020
mroeschke added the Bug label Jun 22, 2021
@rhshadrach
Member

Agreed @smithto1 - I get the expected output on main now. Closing.
