Replace float64 with Float64Dtype in pandas #264

gsheni · 2020-10-14T22:29:11Z

pandas has merged the new nullable float dtypes, Float32Dtype and Float64Dtype:
- ENH: nullable Float32/64 ExtensionArray pandas-dev/pandas#34307
They will be in the 1.2.0 release:
- https://github.com/pandas-dev/pandas/blob/master/doc/source/whatsnew/v1.2.0.rst#experimental-nullable-data-types-for-float-data
Once it is released, Woodwork can take advantage of it. This will move us closer to having a single representation of NaN in DataTable.
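
For context, here is a minimal sketch of the difference between the two dtypes, assuming pandas 1.2.0 or later is installed. The variable names are only illustrative. It shows that the nullable dtype stores a missing value as `pd.NA`, while the NumPy-backed `float64` stores it as NaN.

```python
import pandas as pd

# NumPy-backed float64: the missing value is stored as NaN.
classic = pd.Series([1.5, None, 3.0], dtype="float64")

# Nullable extension dtype (pandas >= 1.2.0): the missing value is pd.NA.
nullable = pd.Series([1.5, None, 3.0], dtype="Float64")

print(classic.dtype)         # float64
print(nullable.dtype)        # Float64
print(nullable[1] is pd.NA)  # True
```

Both series report the missing entry through `isna()`, so code that only checks for missing values behaves the same either way; the difference matters for code that inspects the dtype or the missing-value object itself.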

gsheni · 2021-02-02T16:44:35Z

EvalML is currently adding support for pandas 1.2.0

gsheni · 2021-03-17T19:49:27Z

EvalML now supports pandas 1.2.0:
alteryx/evalml@9576d5d#diff-e7031ce8aee6d7dc175631195661f5f893bfa3614e5f63ec93c15d2d59235667L2

gsheni · 2021-04-15T20:45:12Z

Blocked until Koalas fixes the 1.2.0 restriction: databricks/koalas#2137

gsheni · 2021-04-29T19:33:48Z

One thought is that instead of changing the underlying dtype for Double, we could add a new Logical Type, DoubleNullable, with a dtype of Float64Dtype. We would keep Double and float64 as they are, and Double would remain the default inferred type. A user would therefore have to explicitly set DoubleNullable for a column.

This way we avoid causing downstream problems with the new Float64Dtype.
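
A rough sketch of the opt-in idea, in plain pandas. This is not Woodwork's API: `LOGICAL_TYPE_DTYPES`, `infer_logical_type`, and `set_logical_type` are hypothetical names made up for illustration, and the inference is a toy that only handles float columns.

```python
import pandas as pd

# Hypothetical mapping for illustration only; not Woodwork's actual API.
LOGICAL_TYPE_DTYPES = {
    "Double": "float64",          # default inferred type, unchanged
    "DoubleNullable": "Float64",  # opt-in, backed by pandas Float64Dtype
}


def infer_logical_type(series: pd.Series) -> str:
    """Toy inference: float data always infers as Double."""
    if pd.api.types.is_float_dtype(series.dtype):
        return "Double"
    raise TypeError(f"not a float column: {series.dtype}")


def set_logical_type(series: pd.Series, logical_type: str) -> pd.Series:
    """Cast the column to the dtype registered for the logical type."""
    return series.astype(LOGICAL_TYPE_DTYPES[logical_type])


column = pd.Series([1.0, None, 2.5])

# Default path: inference never picks the new dtype.
inferred = infer_logical_type(column)
default_col = set_logical_type(column, inferred)

# Opt-in path: the user has to ask for DoubleNullable explicitly.
opt_in_col = set_logical_type(column, "DoubleNullable")

print(inferred, default_col.dtype, opt_in_col.dtype)
```

With this shape, downstream code only ever sees Float64Dtype when a user has asked for it.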

Thoughts @freddyaboulton @thehomebrewnerd @tamargrey ?

thehomebrewnerd · 2021-04-29T19:48:37Z

I thought about this a bit as well. My main hesitation is that I'm not sure the Double and DoubleNullable names work quite as cleanly as Integer and IntegerNullable, since both double logical types would be able to accept null values. Not sure what would be better at the moment though.

I also wonder if we should do this now or just wait a while longer until the Float64Dtype is no longer problematic? Maybe once the downstream problems are resolved (assuming we get to that point) we could just make one update to change Integer, Boolean and Double to all use the new dtypes and drop the old non-nullable versions? It would be a bit strange to leave the double version out in the short term, though, since we have support for the others.

I'm rambling a bit...which means I'm undecided and don't have a strong opinion either way...

tamargrey · 2021-04-29T21:41:45Z

I think I'd vote for not having Float64Dtype at all over having a Double and DoubleNullable. Maybe there's another name that better describes the relationship between the two potential Logical Types?

Double and DoubleNewDtype is definitely a different naming convention, though, and I don't think it's the end of the world to not have any logical type that uses the new dtype.

gsheni · 2021-04-29T21:55:39Z

Alright, let's icebox this for now and close the Float64Dtype MR. It may cause unnecessary problems downstream, and we can revisit once downstream libraries update to support this new dtype.

We can also revisit if we find a compelling use-case for it.

gsheni added the "enhancement" (Improvement to an existing feature) label Oct 14, 2020

tamargrey self-assigned this Mar 31, 2021

This was referenced Mar 31, 2021:
- Use nullable Float64 pandas dtype for Double #755 (Closed)
- Drop Python 3.6 Support #530 (Closed)

gsheni unassigned tamargrey Jun 10, 2021

dvreed77 mentioned this issue May 9, 2022:
- CumMean and CumSum can fail on all null columns alteryx/featuretools#1682 (Open)
