DataFrame.to_dict returning numpy scalars in certain cases #23753

Closed
jorisvandenbossche opened this issue Nov 17, 2018 · 4 comments · Fixed by #23921
Labels: Bug; DataFrame (DataFrame data structure); Dtype Conversions (Unexpected or buggy dtype conversions); Numeric Operations (Arithmetic, Comparison, and Logical operations)
@jorisvandenbossche
Member

jorisvandenbossche commented Nov 17, 2018

I think in general we try to return Python scalars instead of numpy scalars in to_dict (as we do in tolist or iteration).

Eg:

In [27]: df = pd.DataFrame({'a': [1, 2], 'b': [.1, .2]})

In [28]: df.to_dict()
Out[28]: {'a': {0: 1, 1: 2}, 'b': {0: 0.1, 1: 0.2}}

In [29]: type(df.to_dict()['a'][0])
Out[29]: int

However, this is not consistent. For example, when using orient='records':

In [31]: df.to_dict(orient='records')
Out[31]: [{'a': 1.0, 'b': 0.10000000000000001}, {'a': 2.0, 'b': 0.20000000000000001}]

In [32]: type(df.to_dict(orient='records')[0]['a'])
Out[32]: numpy.float64

In this case, that is because the 'records' implementation iterates over self.values, which is a float64 array here. (It also means that if you have a string column, self.values will be object dtype, and you actually get Python scalars.)
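A minimal sketch of the dtype behaviour described above, reusing the example frame from this issue. The extra string column `c` is an assumption added only for illustration, and the exact printed output may vary between pandas versions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [.1, .2]})

# The int64 and float64 columns are combined into one 2D array with a
# common dtype, so the integers in column 'a' are upcast to float64.
values = df.values
print(values.dtype)
print(type(values[0, 0]))

# Hypothetical extra string column: the only common dtype is now object.
df_mixed = df.assign(c=['x', 'y'])
mixed_values = df_mixed.values
print(mixed_values.dtype)
```

Anything that iterates over `df.values` row by row therefore sees numpy.float64 elements in the all-numeric case.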

There are a bunch of other issues related to iteration (e.g. #20791, #13468), but I didn't see one specifically related to to_dict.

@jreback
Contributor

jreback commented Nov 17, 2018

pretty sure this is a duplicate issue

@jorisvandenbossche
Member Author

As I said, I searched for it but didn't see one directly. But if you find one, happy to close this as a duplicate.

For iteration there are other issues, but for to_dict the cause is not only iteration over pandas objects but also, e.g., iteration over numpy arrays, depending on the orient type. So I think it deserves its own issue.

@jorisvandenbossche
Member Author

Not directly related to this issue, but an option to convert missing values to None would also be nice for my use case, although that might add quite some complexity to the implementation (and you can do it yourself relatively easily).
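One way to do that conversion yourself might look like the sketch below. The helper name `records_with_none` is hypothetical (it is not a pandas API), and it assumes every cell holds a scalar value:

```python
import numpy as np
import pandas as pd


def records_with_none(df):
    """Return to_dict(orient='records') output with missing values as None.

    Hypothetical helper for illustration only; assumes scalar cell values.
    """
    records = df.to_dict(orient='records')
    return [
        # pd.isna is True for NaN, None and NaT when given a scalar
        {key: (None if pd.isna(value) else value) for key, value in row.items()}
        for row in records
    ]


df_missing = pd.DataFrame({'a': [1.0, np.nan], 'b': ['x', None]})
cleaned = records_with_none(df_missing)
print(cleaned)
```

Doing the replacement after `to_dict` keeps the frame's dtypes untouched, at the cost of one extra pass over the records.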

@bourbaki
Contributor

bourbaki commented Nov 25, 2018

@jreback I am working on this issue. Its source is the use of the DataFrame.values property in most of the to_dict orientations. DataFrame.values gathers the data from all columns and converts it to a single typed numpy ndarray.
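As a rough illustration of how the 'records' case could avoid DataFrame.values, the sketch below walks the rows with itertuples and unboxes any numpy scalar it meets. This is not the change made in #23921; `to_native` and `records_with_native_scalars` are hypothetical names used only here:

```python
import numpy as np
import pandas as pd


def to_native(value):
    """Convert a numpy scalar to its Python equivalent (hypothetical helper)."""
    # np.generic is the base class of all numpy scalar types
    if isinstance(value, np.generic):
        return value.item()
    return value


def records_with_native_scalars(df):
    """Build record dicts row by row without going through df.values."""
    columns = list(df.columns)
    return [
        {col: to_native(val) for col, val in zip(columns, row)}
        # name=None yields plain tuples; each column keeps its own dtype
        for row in df.itertuples(index=False, name=None)
    ]


df = pd.DataFrame({'a': [1, 2], 'b': [.1, .2]})
result = records_with_native_scalars(df)
print(result)
```

Because each column is read with its own dtype, the integers in column 'a' are not upcast to float along the way.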

4 participants