BUG: DataFrame.to_dict() converts Nullable Int types to numpy.int #34665

dmlogv · 2020-06-09T13:58:50Z

Problem description

The DataFrame.to_dict() method does not cast nullable integer types (Int*Dtype) to the Python int type. Instead, it unwraps them into numpy.int* types.

Possibly related to: #27616, #25969, #21256

Expected Output

Native Python int type.

Reproduction

Make some data:

import pandas as pd

df = pd.DataFrame({'id': range(5),
                   'coeff': [i * 0.1 for i in range(5)],
                   'is_hot': [True] * 2 + [False] * 3,
                   'value': [1, None, 2, 3, None]})
df

	id	coeff	is_hot	value
0	0	0.0	True	1.0
1	1	0.1	True	NaN
2	2	0.2	False	2.0
3	3	0.3	False	3.0
4	4	0.4	False	NaN

df.dtypes

id          int64
coeff     float64
is_hot       bool
value     float64
dtype: object

The value column needs to be a nullable int:

df['value'] = df['value'].astype(pd.Int64Dtype())
df.dtypes

id          int64
coeff     float64
is_hot       bool
value       Int64
dtype: object

Looks great. But now convert the DataFrame to a list of dicts and check the value types:

dicts = df.to_dict(orient='records')
dicts

[{'id': 0, 'coeff': 0.0, 'is_hot': True, 'value': 1},
 {'id': 1, 'coeff': 0.1, 'is_hot': True, 'value': nan},
 {'id': 2, 'coeff': 0.2, 'is_hot': False, 'value': 2},
 {'id': 3, 'coeff': 0.30000000000000004, 'is_hot': False, 'value': 3},
 {'id': 4, 'coeff': 0.4, 'is_hot': False, 'value': nan}]

pd.DataFrame(
    [[type(v) for k, v in row.items()] for row in dicts], 
    columns=dicts[0].keys())

	id	coeff	is_hot	value
0	<class 'int'>	<class 'float'>	<class 'bool'>	<class 'numpy.int64'>
1	<class 'int'>	<class 'float'>	<class 'bool'>	<class 'float'>
2	<class 'int'>	<class 'float'>	<class 'bool'>	<class 'numpy.int64'>
3	<class 'int'>	<class 'float'>	<class 'bool'>	<class 'numpy.int64'>
4	<class 'int'>	<class 'float'>	<class 'bool'>	<class 'float'>

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.7.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.4.0
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : en_US.UTF-8

pandas           : 1.0.4
numpy            : 1.18.4
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 19.2.3
setuptools       : 41.2.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.15.0
pandas_datareader: None
bs4              : 4.8.1
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.2.1
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : None
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None

jorisvandenbossche · 2020-06-09T14:47:16Z

@dm-logv thanks for the report. This is probably related to #29738

arw2019 · 2020-10-24T15:49:53Z

Minimal reproducer:

In [3]: import pandas as pd 
   ...:  
   ...: df = pd.DataFrame({'A': [1, None, 2, 3, None]}) 
   ...: df['A'] = df['A'].astype('Int64') 
   ...: dicts = df.to_dict(orient="records") 
   ...: pd.DataFrame( 
   ...:     [[type(v) for k, v in row.items()] for row in dicts],  
   ...:     columns=dicts[0].keys())                                                                                                                                                                               
Out[3]: 
                                       A
0                  <class 'numpy.int64'>
1  <class 'pandas._libs.missing.NAType'>
2                  <class 'numpy.int64'>
3                  <class 'numpy.int64'>
4  <class 'pandas._libs.missing.NAType'>

arw2019 · 2020-10-24T16:43:05Z

Even smaller:

In [2]: import pandas as pd 
   ...:  
   ...: df = pd.DataFrame({'A': [1, None]}) 
   ...: df['A'] = df['A'].astype('Int64') 
   ...: records_as_dicts = df.to_dict(orient="records") 
   ...: pd.DataFrame([[type(v) for v in row.values()] for row in records_as_dicts], columns=records_as_dicts[0].keys())                                                                                            
Out[2]: 
                                       A
0                  <class 'numpy.int64'>
1  <class 'pandas._libs.missing.NAType'>
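
Until a fix lands, one possible workaround is to post-process the records yourself. The sketch below is not pandas API: `to_native` is a hypothetical helper name, and mapping pd.NA to None is an assumption about what the caller wants. It relies only on `numpy.generic.item()`, which returns the equivalent Python scalar.

```python
import numpy as np
import pandas as pd


def to_native(value):
    """Hypothetical helper: unwrap NumPy scalars, map pd.NA to None."""
    if value is pd.NA:
        # Assumption: missing values should become None.
        return None
    if isinstance(value, np.generic):
        # np.generic covers numpy.int64, numpy.bool_, numpy.float64, ...
        return value.item()
    return value


df = pd.DataFrame({'A': [1, None]})
df['A'] = df['A'].astype('Int64')

# Apply the helper to every value of every record.
records = [
    {key: to_native(value) for key, value in row.items()}
    for row in df.to_dict(orient='records')
]

print([type(row['A']) for row in records])
```

Values that are already native Python objects pass through unchanged, so the helper is harmless on pandas versions where to_dict() already returns Python ints.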

VikingPathak · 2021-08-13T10:36:09Z

I am using pd.read_sql(), which returns a DataFrame. Calling .to_dict() on it gave me a dictionary in which the value True had type numpy.bool_.

I could not find a better approach, so I used to_json() instead of to_dict() and wrapped the result in json.loads(). All the NumPy types were converted to Python objects.

json.loads(dataframe.iloc[0, :].to_json())
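
The same round trip can be applied to the whole frame rather than a single row. This is a minimal sketch that assumes a small made-up frame in place of the read_sql() result; the column names are illustrative only.

```python
import json

import pandas as pd

# Stand-in for the frame returned by pd.read_sql() (made-up data).
df = pd.DataFrame({'id': [1, 2], 'is_hot': [True, False]})
df['value'] = pd.array([1, None], dtype='Int64')

# Serialize to JSON text, then parse it back into plain Python objects.
records = json.loads(df.to_json(orient='records'))

for row in records:
    print({key: type(value).__name__ for key, value in row.items()})
```

Because the data passes through JSON text, this is lossy for some types: to_json() writes floats with 10 decimal places by default (double_precision=10) and encodes datetimes as epoch timestamps unless date_format='iso' is passed.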

dmlogv added the Bug and Needs Triage labels Jun 9, 2020

jorisvandenbossche added the NA - MaskedArrays label and removed the Needs Triage label Jun 9, 2020

arw2019 mentioned this issue Oct 24, 2020

ENH/BUG: implement __iter__ for IntegerArray so conversions (to_dict, tolist, etc.) return python native types #37377

Closed

6 tasks

jreback added this to the 1.2 milestone Oct 24, 2020

jreback modified the milestones: 1.2, Contributions Welcome Nov 19, 2020

RogerThomas mentioned this issue Apr 12, 2022

BUG: to_dict("index") and to_dict("list") don't coerce to native types #46751

Closed

3 tasks

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

phofl mentioned this issue Dec 30, 2022

BUG: to_dict not converting masked dtype to native python types #50510

Closed

5 tasks

phofl mentioned this issue Jan 19, 2023

TST: Add test for to_dict converting masked to python types #50874

Merged

5 tasks

mroeschke closed this as completed in #50874 Jan 20, 2023
