Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace float64 with Float64Dtype in pandas #264

Open
gsheni opened this issue Oct 14, 2020 · 7 comments
Open

Replace float64 with Float64Dtype in pandas #264

gsheni opened this issue Oct 14, 2020 · 7 comments
Labels
enhancement Improvement to an existing feature

Comments

@gsheni
Copy link
Contributor

gsheni commented Oct 14, 2020

@gsheni gsheni added the enhancement Improvement to an existing feature label Oct 14, 2020
@gsheni
Copy link
Contributor Author

gsheni commented Feb 2, 2021

EvalML is currently adding support for pandas 1.2.0

@gsheni
Copy link
Contributor Author

gsheni commented Mar 17, 2021

@gsheni
Copy link
Contributor Author

gsheni commented Apr 15, 2021

Blocked until Koalas fixes the 1.2.0 restriction: databricks/koalas#2137

@gsheni
Copy link
Contributor Author

gsheni commented Apr 29, 2021

One thought is that instead of changing the underlying dtype for Double, we could add a Logical Type DoubleNullable with a dtype of Float64Dtype. We would keep Double and float64 as is, and make it the default inferred type. So a user would have to explicitly set DoubleNullable for a column.

This way we avoid causing downstream problems with the new Float64Dtype.

Thoughts @freddyaboulton @thehomebrewnerd @tamargrey ?

@thehomebrewnerd
Copy link
Contributor

I thought about this a bit as well. My main hesitation is that I'm not sure the Double and DoubleNullable names work quite as cleanly as Integer and IntegerNullable as both double logical types would be able to accept null values. Not sure what would be better at the moment though.

I also wonder if we should do this now or just wait a while longer until the Float64Dtype is no longer problematic? Maybe once the downstream problems are resolved (assuming we get to that point) we could just make one update to change Integer, Boolean and Double all use the new dtypes and drop the old non-nullable versions? It would be a bit strange to leave the double version out in the short term though since we have support for the others.

I'm rambling a bit...which means I'm undecided and don't have a strong opinion either way...

@tamargrey
Copy link
Contributor

I think I'd vote for not having Float64Dtype at all over having a Double and DoubleNullable. Maybe there's another name that better describes the relationship between the two potential Logical Types?

Double and DoubleNewDtype is definitely a different naming convention, though, and I don't think it's the end of the world to not have any logical type that uses the new dtype.

@gsheni
Copy link
Contributor Author

gsheni commented Apr 29, 2021

Alright, let's icebox this for now and close the Float64Dtype MR. It may cause un-necessary problems downstream, and we can re-visit once downstream libraries update to support this new Dtype.

We can also revisit if we find a compelling use-case for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improvement to an existing feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants