df.convert_objects removed from pandas #52

Open
philipstarkey opened this issue Aug 14, 2019 · 2 comments
Labels
bug Something isn't working critical

Comments

@philipstarkey
Contributor

Original report (archived issue) by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


pandas 0.25 has dropped DataFrame.convert_objects(), resulting in an exception from the server when getting the dataframe using lyse.data().

AttributeError: 'DataFrame' object has no attribute 'convert_objects'

Discussion about the deprecation and removal here:

pandas-dev/pandas#11221

As a reminder, we're using this function to convert columns of the dataframe from Python objects into numpy/pandas dtypes where possible, which makes the dataframe faster to pickle and send over the wire.

We'll need to decide what to do. The performance case for convert_objects() may matter less now, since other relevant components may have become faster. Dropping it is still a semantic change, though: numpy arrays pulled out of the returned dataframes would have dtype object and contain Python floats, instead of having dtype float as expected.
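To make the semantic change concrete, here is a minimal sketch. The column name is made up for illustration, and `dtype=object` is used only to force the unconverted case; it is not taken from the lyse code.

```python
import numpy as np
import pandas as pd

# Hypothetical column; dtype=object makes pandas keep the Python floats as-is.
df = pd.DataFrame({"temperature": pd.Series([1.0, 2.0, 3.0], dtype=object)})

# Array pulled from the unconverted column: dtype object, Python float elements.
arr_obj = df["temperature"].to_numpy()

# Array pulled after an explicit conversion: a native float64 array.
arr_float = df["temperature"].astype("float64").to_numpy()

print(arr_obj.dtype, type(arr_obj[0]))
print(arr_float.dtype, type(arr_float[0]))
```

Code that relies on vectorised numpy operations generally expects the second form.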

It seems like the alternatives to convert_objects may require explicitly saying the type of each column, which would be super annoying. But I'll look into it and see if we can replicate the current behaviour using the alternatives.
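For reference, a sketch of what the explicit route could look like with `DataFrame.astype`. The column names and target dtypes are hypothetical; the point is that every column has to be listed by hand.

```python
import pandas as pd

# Hypothetical columns, stored as Python objects.
df = pd.DataFrame(
    {
        "shot": pd.Series([1, 2, 3], dtype=object),
        "temperature": pd.Series([0.5, 0.7, 0.9], dtype=object),
    }
)

# Explicit conversion: one entry per column, maintained manually.
typed = df.astype({"shot": "int64", "temperature": "float64"})

print(typed.dtypes)
```

This works, but the mapping has to be kept in sync with whatever columns the dataframe happens to contain.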

@philipstarkey
Contributor Author

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


After simply removing the convert_objects call and looking at some dataframes, it looks like the column dtypes are pretty much what you would expect: they are specific datatypes, not all object. So convert_objects is not doing much work. Columns are already floats, ints and datetimes when appropriate, and are only object when there is genuinely mixed data.

Behaviour is slightly imperfect. You can accidentally give a column a mixed dtype by saving an analysis result whose datatype differs from that of all other shots. If you then change and re-run the analysis (or remove the shot) so that the column contains a single datatype again, the column still remains dtype object, as if the data were mixed.
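A small sketch of that sticky-object behaviour. This only imitates the situation with plain pandas calls; the column name and values are invented, and the explicit `astype(object)` stands in for whatever upcast happens when the odd value is saved.

```python
import pandas as pd

# Hypothetical analysis result column, initially all floats.
df = pd.DataFrame({"result": [1.0, 2.0, 3.0]})
dtype_before = df["result"].dtype

# One shot saves a result of a different type, so the column becomes object.
df["result"] = df["result"].astype(object)
df.loc[1, "result"] = "failed"

# Analysis is fixed and re-run: every value is a float again...
df.loc[1, "result"] = 2.5

# ...but the column dtype is still object.
dtype_after = df["result"].dtype
print(dtype_before, dtype_after)
```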

The lyse update_row() method already converts a column to dtype object when it gets a value whose datatype is incompatible with the column's current dtype. So some code could be added to detect when the dtype of an element changes and check whether the column can be converted back to a specific datatype. But this would involve looping over the whole dataframe, or at least whole columns, and I'm hesitant to do it.
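A rough sketch of what such a check might look like for a single column. `try_restore_dtype` is a hypothetical helper, not part of lyse, and it only handles plain Python floats and ints. Note that it walks every element of the column, which is exactly the cost in question.

```python
import pandas as pd


def try_restore_dtype(df, column):
    """Return a dataframe with `column` converted back from object if it is uniform.

    Hypothetical helper: inspects every element of the column.
    """
    if df[column].dtype != object:
        return df
    # Full pass over the column to collect the Python types present.
    kinds = {type(value) for value in df[column]}
    if kinds == {float}:
        return df.astype({column: "float64"})
    if kinds == {int}:
        return df.astype({column: "int64"})
    # Genuinely mixed (or unsupported) data: leave as object.
    return df


uniform = pd.DataFrame({"result": pd.Series([1.0, 2.5, 3.0], dtype=object)})
mixed = pd.DataFrame({"result": pd.Series([1.0, "failed", 3.0], dtype=object)})

restored = try_restore_dtype(uniform, "result")
unchanged = try_restore_dtype(mixed, "result")
print(restored["result"].dtype, unchanged["result"].dtype)
```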

I think we should just remove the call to convert_objects and see how it goes.

See PR #70

@philipstarkey
Contributor Author

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


Ah, actually looks like there is a better way, infer_objects:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.infer_objects.html

This is not deprecated and does what we want. Pull request updated.
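A quick illustration of `DataFrame.infer_objects()` on made-up data (the column names are hypothetical, and this is not the code from the pull request). Object columns holding a single numeric type are converted, while a genuinely mixed column is left as object, without naming any column types.

```python
import pandas as pd

# Hypothetical columns, all stored as Python objects.
df = pd.DataFrame(
    {
        "shot": pd.Series([1, 2, 3], dtype=object),
        "temperature": pd.Series([1.5, 2.5, 3.5], dtype=object),
        "mixed": pd.Series([1, "x", 2.0], dtype=object),
    }
)

# Soft conversion: pandas picks a better dtype for each object column where it can.
inferred = df.infer_objects()

print(inferred.dtypes)
```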

@philipstarkey philipstarkey added critical bug Something isn't working labels Feb 2, 2020