Original report (archived issue) by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).

pandas 0.25 has dropped `DataFrame.convert_objects()`, resulting in an exception from the server when getting the dataframe using `lyse.data()`. Discussion about the deprecation and removal is here: pandas-dev/pandas#11221

As a reminder, we're using this function to convert columns of the dataframe from Python objects into numpy/pandas dtypes where possible, which makes the dataframe faster to pickle and send over the wire.

We'll need to decide what to do. The performance motivation for calling `convert_objects()` may no longer be as important, since performance in other relevant components may have improved. It is still a semantic change, though, to return dataframes where the numpy arrays pulled out of them have dtype `object` containing Python floats, rather than dtype `float` as expected.

It seems like the alternatives to `convert_objects` may require explicitly specifying the type of each column, which would be super annoying. But I'll look into it and see if we can replicate the current behaviour using the alternatives.
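For context, here is a minimal sketch of the soft-conversion behaviour in question, using `DataFrame.infer_objects()` (the method pandas points to for the non-forced conversions that `convert_objects` performed; whether it fully replicates our use is exactly what needs checking):

```python
import pandas as pd

# A dataframe built from Python objects: every column starts as dtype object.
df = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0], "n": [1, 2, 3], "s": ["a", "b", "c"]},
    dtype=object,
)
print(df.dtypes)  # x: object, n: object, s: object

# infer_objects() performs soft conversion: columns whose contents are
# uniformly numeric become float64/int64; genuinely mixed or string
# columns stay object. This is analogous to what convert_objects() did
# for the non-forced case.
converted = df.infer_objects()
print(converted.dtypes)  # x: float64, n: int64, s: object
```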
After simply removing the convert_objects call and looking at some dataframes, the dtypes of the columns are pretty much what you would expect: they are specific datatypes, not all of type object. So convert_objects was not doing much work. Columns are already floats, ints and datetimes when appropriate, and are only object when there is genuinely mixed data.
Behaviour is slightly imperfect, though. You can accidentally make a column mixed-dtype by saving an analysis result whose datatype differs from that of all other shots. If you then change and re-run the analysis (or remove the shot) so that the column contains a single datatype again, the column nonetheless remains dtype object, as if the data were still mixed. A small pandas demo of this stickiness is sketched below.
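A minimal pure-pandas illustration of that stickiness (no lyse involved; this is the behaviour of pandas releases current at the time of this issue, and later versions have been tightening the rules around such assignments):

```python
import pandas as pd

df = pd.DataFrame({"result": [1.0, 2.0, 3.0]})
print(df["result"].dtype)  # float64

# One shot saves an incompatible datatype: pandas upcasts the column to object.
df.loc[1, "result"] = "oops, a string"
print(df["result"].dtype)  # object

# Fix the offending value so the column is homogeneous floats again...
df.loc[1, "result"] = 2.0
# ...but the column stays object; pandas does not re-infer automatically.
print(df["result"].dtype)  # object
```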
The lyse update_row() method already converts a column to dtype object when it receives a datatype incompatible with the column's current datatype. So some code could be added to detect when the dtype of an element changes and check whether the column can be converted back to a specific datatype, along the lines of the sketch below. But this would involve looping over the whole dataframe, or at least whole columns, and I'm hesitant to do it.
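A rough sketch of what such a check might look like (a hypothetical helper, not existing lyse code; `reinfer_column` is an invented name, and the per-column `infer_objects()` scan is exactly the cost being worried about):

```python
import pandas as pd

def reinfer_column(df: pd.DataFrame, column: str) -> None:
    # Hypothetical helper: after an element of `column` changes, try to
    # convert an object-dtype column back to a specific dtype in place.
    if df[column].dtype != object:
        return  # already a specific dtype; nothing to do
    # Soft conversion scans every element of the column, so calling this
    # on each update is the expensive part.
    inferred = df[column].infer_objects()
    if inferred.dtype != object:
        df[column] = inferred
```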
I think we should just remove the call to convert_objects and see how it goes.