BUG: Columns used in sum on 1 row dataframe depend on values instead of dtype #13912

Closed
jcrist opened this issue Aug 4, 2016 · 3 comments · Fixed by #41482
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions
@jcrist
Contributor

jcrist commented Aug 4, 2016

It seems that somehow the columns used in sum when applied to a 1 row dataframe depend on the values in the row instead of just the dtypes. Observe:

import pandas as pd
import numpy as np

# Frame with some non-numeric dtypes
df = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.Timestamp('2000-01-01')]})
# Only change here is that `d` is `NaT`
df2 = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.NaT]})
# This is just the first one twice
df3 = pd.concat([df, df])

# I'd expect all 3 to use the same columns in the reduction
df_sum = df.sum()
df2_sum = df2.sum()
df3_sum = df3.sum()

Loading that in an ipython session:

In [1]: df_sum
Out[1]:
a                      1
b                    1.1
c                    foo
d    2000-01-01 00:00:00
dtype: object

In [2]: df2_sum
Out[2]:
a    1.0
b    1.1
dtype: float64

In [3]: df3_sum
Out[3]:
a    2.0
b    2.2
dtype: float64

In [4]: pd.__version__
Out[4]: u'0.18.1'

In [5]: np.__version__
Out[5]: '1.11.1'

I'd expect all 3 to only use the columns ['a', 'b'], as these are the only numeric columns. Strangely, _get_numeric_data does return just ['a', 'b'] in all cases, so it's not that.
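The dtype-based selection can also be checked through the public API. The sketch below uses `select_dtypes(include='number')` as a stand-in for the private `_get_numeric_data`; treating the two as equivalent for these frames is an assumption, and the helper names `make_frames` and `numeric_columns` are made up for illustration.

```python
import pandas as pd


def make_frames():
    """Rebuild the three frames from the report above."""
    df = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'],
                       'd': [pd.Timestamp('2000-01-01')]})
    # Only change is that `d` is `NaT`
    df2 = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.NaT]})
    # The first frame twice
    df3 = pd.concat([df, df])
    return df, df2, df3


def numeric_columns(frame):
    # Public-API stand-in (assumption) for the private _get_numeric_data
    return list(frame.select_dtypes(include='number').columns)


for frame in make_frames():
    print(numeric_columns(frame))
```

If the selection depends only on dtype, all three frames should report the same columns regardless of the value stored in `d`.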

@jreback
Contributor

jreback commented Aug 5, 2016

xref #13416

@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations Difficulty Intermediate labels Aug 5, 2016
@jreback
Contributor

jreback commented Aug 5, 2016

This is a bug here: https://github.com/pydata/pandas/blob/master/pandas/core/nanops.py#L637

We need to handle object dtype correctly rather than trying to coerce to float (which fails because there are strings embedded). This path isn't taken for the 1st frame (which succeeds), while the 2nd raises a TypeError.

So we need to raise in the first case on a mixed type on axis=0.
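As a rough illustration of why a single mixed-type row can slip through, here is a plain NumPy sketch. It is not the actual `nanops` code path; it assumes the one-row frame ends up being reduced as an object array along axis 0. With one element along the axis there is nothing to add, so the values come back unchanged, while two rows force a `Timestamp + Timestamp` addition.

```python
import numpy as np
import pandas as pd

ts = pd.Timestamp('2000-01-01')
row = [1, 1.1, 'foo', ts]

# Assumption: a mixed-type row is held as an object array during the reduction
one_row = np.array([row], dtype=object)
two_rows = np.array([row, row], dtype=object)

# One row: no addition is needed, so each value is returned as-is
single = one_row.sum(axis=0)
print(single)

# Two rows: the reduction has to add two Timestamps, which is not defined
try:
    two_rows.sum(axis=0)
    raised = False
except TypeError:
    raised = True
print(raised)
```

This only shows that a length-1 reduction never exercises the addition that would fail for mixed types, which is consistent with the value-dependent behaviour reported above.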

@jreback jreback added the Bug label Aug 5, 2016
@jreback jreback added this to the 1.0 milestone Aug 5, 2016
@jreback jreback changed the title Columns used in sum on 1 row dataframe depend on values instead of dtype BUG: Columns used in sum on 1 row dataframe depend on values instead of dtype Aug 5, 2016
@TomAugspurger TomAugspurger modified the milestones: 1.0, Contributions Welcome Dec 30, 2019
@jbrockmendel jbrockmendel added Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply Reduction Operations sum, mean, min, max, etc. labels Sep 22, 2020
@mroeschke
Member

It appears that these examples look reasonable now. Could use a test.

In [3]: df_sum = df.sum()
   ...: df2_sum = df2.sum()
   ...: df3_sum = df3.sum()

In [4]: df_sum
Out[4]:
a      1
b    1.1
c    foo
dtype: object

In [5]: df2_sum
Out[5]:
a      1
b    1.1
c    foo
dtype: object

In [6]: df3_sum
Out[6]:
a         2
b       2.2
c    foofoo
dtype: object

In [7]: pd.__version__
Out[7]: '1.3.0.dev0+1485.g6abb567cb1'
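A regression test along these lines might look like the sketch below. The test name is hypothetical, and passing `numeric_only=True` explicitly is an assumption made here so the check does not depend on the default handling of non-numeric columns, which differs between the two pandas versions shown in this thread.

```python
import pandas as pd
import pandas.testing as tm


def test_sum_columns_depend_on_dtype_not_values():
    # Hypothetical test name; frames are the ones from the original report
    df = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'],
                       'd': [pd.Timestamp('2000-01-01')]})
    df2 = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.NaT]})
    df3 = pd.concat([df, df])

    # Assumption: pin numeric_only=True instead of relying on the default
    result = df.sum(numeric_only=True)
    result2 = df2.sum(numeric_only=True)
    result3 = df3.sum(numeric_only=True)

    # Same columns regardless of whether `d` holds a Timestamp or NaT
    assert list(result.index) == ['a', 'b']
    tm.assert_series_equal(result, result2)
    # Doubling the rows doubles the sums but keeps the same columns
    tm.assert_series_equal(result3, result * 2)


test_sum_columns_depend_on_dtype_not_values()
```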

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug Dtype Conversions Unexpected or buggy dtype conversions Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply Numeric Operations Arithmetic, Comparison, and Logical operations Reduction Operations sum, mean, min, max, etc. labels May 1, 2021
@mroeschke mroeschke modified the milestones: Contributions Welcome, 1.3 May 15, 2021