Type conversions are skipped in 'to_dict' on single column dataframes #21256

Open
hodossy opened this issue May 30, 2018 · 4 comments · Fixed by #37571
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Needs Discussion Requires discussion from core team before further action

Comments

hodossy commented May 30, 2018

Code to reproduce the error:

import pandas as pd
from datetime import datetime

dfs = {
    'full_df': pd.DataFrame([
        {'int': 1, 'date': datetime.now(), 'str': 'foo', 'float': 1.0, 'bool': True},
    ]),
    'int_df': pd.DataFrame([
        {'int': 1},
    ]),
    'date_df': pd.DataFrame([
        {'date': datetime.now()},
    ]),
    'str_df': pd.DataFrame([
        {'str': 'foo'},
    ]),
    'float_df': pd.DataFrame([
        {'float': 1.0},
    ]),
    'bool_df': pd.DataFrame([
        {'bool': True},
    ])
}

for name, frame in dfs.items():
    print('Types in ' + name)
    for k, v in frame.to_dict('records')[0].items():
        print(type(v))

Output:

Types in full_df
<class 'bool'>
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
<class 'float'>
<class 'int'>
<class 'str'>
Types in int_df
<class 'numpy.int64'>
Types in date_df
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
Types in str_df
<class 'str'>
Types in float_df
<class 'numpy.float64'>
Types in bool_df
<class 'numpy.bool_'>

Problem description

One would expect to_dict() to return native Python types, or at least to treat columns of the same dtype consistently. Instead, as shown above, the result depends on the frame: type conversion appears to be skipped when the DataFrame has a single column.

Expected Output

Native Python types wherever possible for int, float, bool and str columns and, if possible, a Python datetime object instead of pandas.Timestamp.
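For illustration, the expected output can be approximated by post-processing the records. This is a minimal sketch, not pandas behaviour: `to_native` and `native_records` are hypothetical helper names, and the sketch assumes that NumPy scalars can be unwrapped with `.item()` and that `Timestamp.to_pydatetime()` is acceptable (it drops nanosecond precision).

```python
from datetime import datetime

import numpy as np
import pandas as pd


def to_native(value):
    """Best-effort conversion of a pandas/NumPy scalar to a built-in Python type."""
    if isinstance(value, pd.Timestamp):
        # Timestamp -> datetime.datetime (nanoseconds, if any, are lost)
        return value.to_pydatetime()
    if isinstance(value, np.generic):
        # numpy.int64 -> int, numpy.float64 -> float, numpy.bool_ -> bool
        return value.item()
    # Already a native type (or something this sketch does not handle)
    return value


def native_records(frame):
    """Like frame.to_dict('records'), but with every value passed through to_native."""
    return [
        {key: to_native(val) for key, val in row.items()}
        for row in frame.to_dict("records")
    ]


# Single-column frames, as in the report above
for column, value in [("int", 1), ("float", 1.0), ("bool", True),
                      ("date", datetime(2018, 5, 30, 12, 0))]:
    record = native_records(pd.DataFrame([{column: value}]))[0]
    print(column, type(record[column]))
```

Because the conversion runs after to_dict(), the result should not depend on whether a given pandas version already boxes the values itself.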

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.0
pytest: None
pip: 9.0.1
setuptools: 38.2.4
Cython: None
numpy: 1.14.3
scipy: None
pyarrow: None
xarray: None
IPython: 6.3.1
sphinx: 1.7.4
patsy: None
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.2
pymysql: None
psycopg2: 2.7.3.2 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Author

hodossy commented Dec 6, 2018

I have a temporary solution until this is fixed:

import pandas as pd


class NativeDict(dict):
    """
        Helper class to ensure that only native types are in the dicts produced by
        :func:`to_dict() <pandas.DataFrame.to_dict>`

        .. note::

            Needed until `#21256 <https://github.com/pandas-dev/pandas/issues/21256>`_ is resolved.
    """
    def __init__(self, *args, **kwargs):
        super().__init__(((k, self.convert_if_needed(v)) for row in args for k, v in row), **kwargs)

    @staticmethod
    def convert_if_needed(value):
        """
            Converts `value` to native python type.

            .. warning::

                Only :class:`Timestamp <pandas.Timestamp>` and numpy :class:`dtypes <numpy.dtype>` are converted.
        """
        if pd.isnull(value):
            return None
        if isinstance(value, pd.Timestamp):
            return value.to_pydatetime()
        if hasattr(value, 'dtype'):
            # dtype.kind codes: 'i' signed int, 'u' unsigned int, 'f' float, 'b' bool
            mapper = {'i': int, 'u': int, 'f': float, 'b': bool}
            _type = mapper.get(value.dtype.kind, lambda x: x)
            return _type(value)
        return value

This also replaces NaN and NaT objects with native Python None. Please note that its only intended use is as the into argument of to_dict(); I have not tested it elsewhere. It can be used like so:

df.to_dict(orient='records', into=NativeDict)
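The None replacement described above follows from the `pd.isnull` check at the top of `convert_if_needed`. A small sketch of which scalars that check treats as missing (the sample values are chosen here for illustration only):

```python
import numpy as np
import pandas as pd

# Values that pd.isnull reports as missing; the helper above maps these to None
missing = [float("nan"), np.nan, pd.NaT, None]

# Falsy but valid values; these are not treated as missing
present = [0, 0.0, "", False]

print([bool(pd.isnull(v)) for v in missing])
print([bool(pd.isnull(v)) for v in present])
```

Note that `pd.isnull` returns an array for list-like input, so the workaround as written assumes every cell holds a scalar.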

Member

arw2019 commented Oct 24, 2020

This is fixed on 1.2 master. Running the OP:


In [3]: import pandas as pd 
   ...: from datetime import datetime 
   ...:  
   ...: dfs = { 
   ...:     'full_df': pd.DataFrame([ 
   ...:         {'int': 1, 'date': datetime.now(), 'str': 'foo', 'float': 1.0, 'bool': True}, 
   ...:     ]), 
   ...:     'int_df': pd.DataFrame([ 
   ...:         {'int': 1}, 
   ...:     ]), 
   ...:     'date_df': pd.DataFrame([ 
   ...:         {'date': datetime.now()}, 
   ...:     ]), 
   ...:     'str_df': pd.DataFrame([ 
   ...:         {'str': 'foo'}, 
   ...:     ]), 
   ...:     'float_df': pd.DataFrame([ 
   ...:         {'float': 1.0}, 
   ...:     ]), 
   ...:     'bool_df': pd.DataFrame([ 
   ...:         {'bool': True}, 
   ...:     ]) 
   ...: } 
   ...:  
   ...: for name, frame in dfs.items(): 
   ...:     print('Types in ' + name) 
   ...:     for k, v in frame.to_dict('records')[0].items(): 
   ...:         print(type(v)) 
   ...:                                                                                                 
Types in full_df
<class 'int'>
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
<class 'str'>
<class 'float'>
<class 'bool'>
Types in int_df
<class 'int'>
Types in date_df
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
Types in str_df
<class 'str'>
Types in float_df
<class 'float'>
Types in bool_df
<class 'bool'>

Author

hodossy commented Nov 6, 2020

Hello! Thanks for fixing the integers, but it seems that date types are still using the internal type. Would it be possible to convert them to native type as well?
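One detail that may be relevant to this request: `pandas.Timestamp` is a subclass of `datetime.datetime`, so `isinstance` checks against `datetime` already pass, while exact type checks do not. A short sketch with an arbitrary example timestamp:

```python
from datetime import datetime

import pandas as pd

ts = pd.Timestamp("2018-05-30 12:00:00")  # arbitrary example value
native = ts.to_pydatetime()               # explicit conversion to the stdlib type

print(isinstance(ts, datetime))   # subclass check
print(type(ts) is datetime)       # exact type check on the Timestamp
print(type(native) is datetime)   # exact type check after conversion
print(native == ts)               # both represent the same instant
```

Code that only needs datetime behaviour can usually work with the Timestamp directly; an explicit `to_pydatetime()` call is needed where the exact type matters.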

Member

arw2019 commented Feb 12, 2021

Do we want to reopen this?

xref #37648 (comment). I think we're not going to act here, but it does keep coming up.

@arw2019 arw2019 reopened this Feb 12, 2021
@arw2019 arw2019 added the Needs Discussion Requires discussion from core team before further action label Feb 12, 2021
@arw2019 arw2019 added this to the No action milestone Feb 12, 2021
@mroeschke mroeschke added the Bug label Jun 20, 2021
@mroeschke mroeschke removed this from the No action milestone Jan 4, 2023
4 participants