Inconsistent types in output of series.to_dict() and DataFrame([series]).loc[0].to_dict() #13830

mikepqr · 2016-07-28T15:36:10Z

to_dict() returns the elements of a Series as different types depending on whether the Series was created directly or obtained from a DataFrame, e.g. via loc[0]:

>>> import pandas as pd
>>> s = pd.Series({'a': None, 'b': 99, 'c': 'hello'})
>>> df = pd.DataFrame([s])
>>> [type(v) for k, v in s.to_dict().items()]
[NoneType, str, int]
>>> [type(v) for k, v in df.loc[0].to_dict().items()]
[NoneType, str, numpy.int64]

Note that the number is a base int when extracted with s.to_dict(), but it's a numpy.int64 when extracted from df.loc[0]. The same inconsistency applies to tolist().

Is this inconsistency a feature or a bug? And if it's a feature, does anyone know how I can reliably extract the values of a row from a DataFrame as base Python types, using either to_dict() or tolist()?

output of `pd.show_versions()`

commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.0-56-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: None
pip: 1.5.6
setuptools: 12.2
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.1
statsmodels: None
xarray: None
IPython: 4.2.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None


TomAugspurger · 2016-07-28T16:21:41Z

Two things going on here.

The int vs. int64 difference is already present after constructing the DataFrame; it has nothing to do with to_dict.

In [21]: type(s.iloc[1])
Out[21]: int

In [22]: type(df.iloc[0, 1])
Out[22]: numpy.int64

xref: #9108

mikepqr · 2016-07-28T16:34:47Z

My use case was the same as #9108: I wanted to assemble a somewhat complicated JSON object that contains things that aren't only in the DataFrame. pd.io.json.dumps on the dictionary works fine. So my problem is solved. Thanks!
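For readers who need base Python types rather than a JSON string, one possible approach is to unwrap NumPy scalars with their `.item()` method before building the larger object. This is only a sketch: `to_builtin` is a hypothetical helper, not a pandas API, and the `'source'` key is made up for illustration.

```python
import json

import numpy as np
import pandas as pd


def to_builtin(value):
    """Hypothetical helper: unwrap NumPy scalars, leave other values alone."""
    # np.generic is the base class of all NumPy scalar types;
    # .item() returns the closest built-in Python equivalent.
    return value.item() if isinstance(value, np.generic) else value


s = pd.Series({'a': None, 'b': 99, 'c': 'hello'})
df = pd.DataFrame([s])

# Convert each value of the row individually, so the result does not
# depend on whether this pandas version already returns Python types.
row = {key: to_builtin(value) for key, value in df.loc[0].items()}

# The converted row can be embedded in a larger structure and
# serialized with the standard library.
payload = json.dumps({'source': 'example', 'row': row})
print(type(row['b']), payload)
```

The same helper can be applied to the output of tolist(). Whether it is needed at all may depend on the pandas version in use.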

Just out of interest then: is the inconsistency in the types being stored in s.iloc[1] and df.iloc[0, 1] correct behaviour?

TomAugspurger · 2016-07-28T16:40:16Z

Great to hear.

> Just out of interest then: is the inconsistency in the types being stored in s.iloc[1] and df.iloc[0, 1] correct behaviour?

Yeah, I think so. A Series has to have a single dtype, which must be object in this case since you have mixed types (not a good idea in general). That means we can't optimize to a NumPy dtype. When you go to a DataFrame, each column can have its own dtype, which will use NumPy if possible. A good comparison is pd.Series([1, 2]), which does use NumPy ints even though you pass in Python ints.
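The dtype difference described here can be inspected directly. The snippet below is a small sketch reusing the s and df names from the original report; the variable ints is introduced only for illustration, and the exact dtype shown for the string column may vary between pandas versions.

```python
import numpy as np
import pandas as pd

s = pd.Series({'a': None, 'b': 99, 'c': 'hello'})
df = pd.DataFrame([s])

# The mixed Series has a single dtype for all elements: object.
print(s.dtype)
# With object dtype the original Python int is stored unchanged.
print(type(s.loc['b']))

# In the DataFrame each column has its own dtype, so the
# all-integer column 'b' is stored as a NumPy integer array.
print(df.dtypes)
print(type(df.loc[0, 'b']))

# A homogeneous Series of Python ints is also stored as NumPy ints.
ints = pd.Series([1, 2])
print(ints.dtype, type(ints.iloc[0]))
```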

mikepqr changed the title ~~Inconsistent types in output of series.to_dict() and df.loc[0].to_dict()~~ Inconsistent types in output of series.to_dict() and DataFrame([s]).loc[0].to_dict() Jul 28, 2016

mikepqr changed the title ~~Inconsistent types in output of series.to_dict() and DataFrame([s]).loc[0].to_dict()~~ Inconsistent types in output of series.to_dict() and DataFrame([series]).loc[0].to_dict() Jul 28, 2016

TomAugspurger closed this as completed Jul 28, 2016

TomAugspurger added the Dtype Conversions (unexpected or buggy dtype conversions) and Usage Question labels Jul 28, 2016

maximz mentioned this issue Jul 26, 2019

to_dict() on a boolean series sometimes returns numpy types instead of Python types #27616

Open
