BUG: Columns used in sum on 1 row dataframe depend on values instead of dtype #13912

jcrist · 2016-08-04T23:55:20Z

It seems that the columns used by sum on a 1-row DataFrame depend on the values in the row rather than only on the dtypes. Observe:

import pandas as pd
import numpy as np

# Frame with some non-numeric dtypes
df = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.Timestamp('2000-01-01')]})
# Only change here is that `d` is `NaT`
df2 = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.NaT]})
# This is just the first one twice
df3 = pd.concat([df, df])

# I'd expect all 3 to use the same columns in the reduction
df_sum = df.sum()
df2_sum = df2.sum()
df3_sum = df3.sum()

Loading that in an IPython session:

In [1]: df_sum
Out[1]:
a                      1
b                    1.1
c                    foo
d    2000-01-01 00:00:00
dtype: object

In [2]: df2_sum
Out[2]:
a    1.0
b    1.1
dtype: float64

In [3]: df3_sum
Out[3]:
a    2.0
b    2.2
dtype: float64

In [4]: pd.__version__
Out[4]: u'0.18.1'

In [5]: np.__version__
Out[5]: '1.11.1'

I'd expect all 3 to use only the columns ['a', 'b'], as these are the only numeric columns. Strangely, _get_numeric_data does return just ['a', 'b'] in all cases, so the numeric-column selection itself is not the problem.
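
The dtype-only selection expected here can be sketched with the public `select_dtypes` API, assuming it behaves like `_get_numeric_data` for these frames (both keep the int and float columns and drop the string and datetime ones). Because it never looks at the row values, it yields the same columns for all three frames:

```python
import pandas as pd

# The three frames from the report above.
df = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.Timestamp('2000-01-01')]})
df2 = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.NaT]})
df3 = pd.concat([df, df])

for name, frame in [('df', df), ('df2', df2), ('df3', df3)]:
    # Selection is driven purely by dtype: the row values are never inspected.
    numeric = frame.select_dtypes(include='number')
    print(name, list(numeric.columns))
    print(numeric.sum())
```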


jreback · 2016-08-05T10:03:11Z

xref #13416

jreback · 2016-08-05T10:12:54Z

This is a bug here: https://github.com/pydata/pandas/blob/master/pandas/core/nanops.py#L637

We need to handle object dtype correctly rather than trying to coerce to float (which fails because there are strings embedded). This path isn't taken for the first one (so it succeeds), while the second raises a TypeError.

So we need to raise in the first case on a mixed type on axis=0.
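
The value dependence can be illustrated with a deliberately simplified reduction that tries the sum on all columns first and falls back to numeric columns only when that raises `TypeError`. This is a hypothetical sketch (`reduce_with_fallback` is not pandas code), assuming only that the real reduction follows a similar try-then-fall-back pattern: with one row, reducing along axis 0 never performs an addition, so nothing can raise and every column is kept; with two rows, `Timestamp + Timestamp` raises and the fallback drops `c` and `d`.

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.Timestamp('2000-01-01')]})
df3 = pd.concat([df, df])


def reduce_with_fallback(frame):
    """Hypothetical 'try all columns, then fall back to numeric' sum.

    Not pandas' implementation; it only shows how a fallback triggered by an
    exception makes the kept columns depend on the values, not the dtypes.
    """
    # Interleave every column into one 2-D object array (Timestamps stay objects).
    values = frame.to_numpy(dtype=object)
    try:
        # With a single row, reducing along axis 0 never calls `+`,
        # so this cannot fail whatever the values are.
        return pd.Series(values.sum(axis=0), index=frame.columns)
    except TypeError:
        # With two or more rows, Timestamp + Timestamp raises TypeError,
        # so only the numeric columns are summed.
        return frame.select_dtypes(include='number').sum()


print(reduce_with_fallback(df))   # keeps a, b, c and d
print(reduce_with_fallback(df3))  # keeps only a and b
```

The sketch reproduces only the `df` versus `df3` difference; the `df2` case (the `NaT` row) went through the `nanops` code path linked above, which this sketch does not model.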

mroeschke · 2021-05-01T23:00:59Z

It appears that these examples look reasonable now. Could use a test.

In [3]: df_sum = df.sum()
   ...: df2_sum = df2.sum()
   ...: df3_sum = df3.sum()

In [4]: df_sum
Out[4]:
a      1
b    1.1
c    foo
dtype: object

In [5]: df2_sum
Out[5]:
a      1
b    1.1
c    foo
dtype: object

In [6]: df3_sum
Out[6]:
a         2
b       2.2
c    foofoo
dtype: object

In [7]: pd.__version__
Out[7]: '1.3.0.dev0+1485.g6abb567cb1'
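
A regression test along the lines suggested above could pin down the dtype-only behavior. This is a sketch, not the test that was actually added: the test name is hypothetical, and it passes `numeric_only=True` explicitly so the expectation does not depend on how a given pandas version treats non-numeric (nuisance) columns by default.

```python
import pandas as pd
import pandas.testing as tm


def test_sum_columns_depend_only_on_dtype():
    # Hypothetical test name; frames copied from the original report.
    df = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.Timestamp('2000-01-01')]})
    df2 = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.NaT]})
    df3 = pd.concat([df, df])

    for frame in (df, df2, df3):
        result = frame.sum(numeric_only=True)
        # The reduced columns must be exactly the numeric ones, whatever the values.
        assert list(result.index) == list(frame.select_dtypes(include='number').columns)

    # Two-row frame: each numeric column summed over both rows.
    expected = pd.Series({'a': 2.0, 'b': 2.2})
    tm.assert_series_equal(df3.sum(numeric_only=True), expected, check_dtype=False)
```

Run it with pytest, or call the function directly; it returns None when every assertion holds.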

shoyer mentioned this issue Aug 5, 2016

sum in pandas can concatenate strings #13916 (Open)

jreback added the Dtype Conversions (unexpected or buggy dtype conversions), Numeric Operations (arithmetic, comparison, and logical operations), and Difficulty Intermediate labels Aug 5, 2016

jreback added the Bug label Aug 5, 2016

jreback added this to the 1.0 milestone Aug 5, 2016

jreback changed the title from "Columns used in sum on 1 row dataframe depend on values instead of dtype" to "BUG: Columns used in sum on 1 row dataframe depend on values instead of dtype" Aug 5, 2016

jbrockmendel removed the Difficulty Intermediate label Oct 21, 2019

TomAugspurger modified the milestones: 1.0, Contributions Welcome Dec 30, 2019

jbrockmendel added the Nuisance Columns (identifying/dropping nuisance columns in reductions, groupby.add, DataFrame.apply) and Reduction Operations (sum, mean, min, max, etc.) labels Sep 22, 2020

mroeschke mentioned this issue May 15, 2021

TST: Add tests for old issues #41482 (Merged)

mroeschke modified the milestones: Contributions Welcome, 1.3 May 15, 2021

jreback closed this as completed in #41482 May 17, 2021
