BUG: NumericIndex should not support float16 dtype #49536
It makes sense from an indexing standpoint why float16 should be disallowed, but could it still be allowed purely as a container?
cc @jbrockmendel if you have any opinions on consistency with the above example
I would much rather this raise than give you something unexpected. More generally, it'd be nice if Index supported everything Series did.
My reasoning for upcasting was that pandas has a history of being very permissive with inputs and doing a lot to take in everything. This is not a very strong opinion of mine, so I'm also OK with disallowing.
@topper-123 would it be possible to upcast to float32 when dispatching to khash but keep float16 when storing values? (This may be a naive question, given my gaps in the indexing code.) Agreed with Brock that generally we should be more explicit: either disallow float16 in Index (and I guess Series), or support float16 in Index and Series (maybe upcasting internally to float32 in Index when needed?).
Are we sure there won't be lossiness if we use float32 in the backend for float16 indexes? I'm thinking that e.g. unions of float16 and int8 indexes would need to convert int8->float32 and float16->float32, and then convert the result back down to float16. Could that give wrong results in some circumstances? I'm also not a big fan of the special casing needed in the code and tests to make that happen, because float16 is rarely used. Can we instead use @jbrockmendel's suggestion to disallow float16 indexes? If someone later wants a float16 index, they can open a PR at that time.
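The lossiness question above can be checked directly in NumPy, since float16 has only 65536 bit patterns. The sketch below is not part of the PR; it assumes only NumPy, and the variable names are illustrative. It round-trips every float16 value through float32 and checks that every int8 value is exactly representable in float16.

```python
import numpy as np

# Every possible float16 bit pattern, reinterpreted as float16 values.
bits = np.arange(2**16, dtype=np.uint16)
f16 = bits.view(np.float16)

# Widen to float32 and narrow back to float16.
roundtrip = f16.astype(np.float32).astype(np.float16)

# NaN != NaN, so compare the non-NaN values and the NaN positions separately.
not_nan = ~np.isnan(f16)
values_preserved = bool(np.array_equal(f16[not_nan], roundtrip[not_nan]))
nans_preserved = bool(np.isnan(roundtrip[~not_nan]).all())

# Every int8 value (-128..127) should survive a trip through float16.
int8_vals = np.arange(-128, 128).astype(np.int8)
int8_exact = bool(
    np.array_equal(int8_vals.astype(np.float16).astype(np.int64),
                   int8_vals.astype(np.int64))
)

print(values_preserved, nans_preserved, int8_exact)
```

This only covers the conversions themselves. Arithmetic carried out in float32 and then narrowed back to float16 can still round, so it does not settle the question for every index operation.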
I'm assuming that I'd be okay raising a
Disallowing float16 would be OK. My preference would be to implement Float16Engine in index.pyx to cast to float32 before passing things to the hashtables.
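As a rough illustration of the "cast to float32 before hashing" idea, here is a pure-Python sketch. It is not the pandas Float16Engine: the class name `Float16LookupSketch` is hypothetical, and a plain `dict` stands in for the khash-based hashtable.

```python
import numpy as np


class Float16LookupSketch:
    """Hypothetical stand-in for an index engine; not pandas code."""

    def __init__(self, values):
        # Values are stored as float16, as the container would keep them.
        self.values = np.asarray(values, dtype=np.float16)
        # The lookup table is built from the widened float32 values.
        widened = self.values.astype(np.float32)
        self._table = {float(v): pos for pos, v in enumerate(widened)}

    def get_loc(self, key):
        # Round the key to float16 first, then widen it the same way
        # the stored values were widened before hashing.
        widened_key = float(np.float32(np.float16(key)))
        return self._table[widened_key]


engine = Float16LookupSketch([0.5, 1.5, 2.5])
print(engine.get_loc(1.5))
```

The stored dtype stays float16 and only the lookup path sees float32, which is the separation asked about earlier in the thread.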
I've updated to raise
So based on the changed test, it looks like float16 was being tested and supported in some operations involving the index? IMO that signals that we should probably try to support it if possible.
Yeah, could be. I found another way, using 16-bit floats in 32-bit IndexEngines, in #49560 (file pandas/core/indexes/numeric.py, lines 103 and 116), so that's not a blocker ATM. We could discuss which is better, though I do like the simplicity of mine :-).
Yeah I think
in #49560 looks reasonable (given the test passing)
I've looked into making For further discussion see this and this Stack Overflow discussion. From the discussions it looks like numpy internally converts float16 to float32 and then converts back to float16 after the operations. I guess we could do that also, but that may be a project unto itself and outside the scope of my current work (i.e. collecting numeric indexes in the base I'm thinking we should choose between: It seems from the discussion above that the majority opinion is 1. Do you still hold that opinion given that numpy does not have a
I guess go forward with this and I'll take a stab at implementing Float16Engine.
Umm, we don't support float16 for nearly anything. Would be -1 on supporting.
Something to consider is that calling a numpy ufunc such as np.exp on an int8 array returns a float16 array, e.g.
>>> import numpy as np
>>> arr = np.arange(3, dtype=np.int8)
>>> np.exp(arr)
array([1. , 2.719, 7.39 ], dtype=float16)
If we raise on float16 indexes, calling those ufuncs on int8 indexes would also raise unless we guard against that in
>>> import pandas as pd
>>> idx = pd.core.api.NumericIndex(arr)
>>> idx
NumericIndex([0, 1, 2], dtype='int8')
>>> np.exp(idx)  # what should this return?
The choices are (if we do not have a float16 index):
Raising on a
Any comment, especially on the issue of ufuncs on int8 indexes?
We could cast float16 ufunc results to float32 before wrapping in an Index. I think we do something similar with FloatingArray. I still think best-case is to support the same dtypes in Index that we do with Series. I've got a branch going that implements Float16Engine, have some test failures to work out.
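The suggestion to cast float16 ufunc results to float32 before wrapping could look roughly like the following. This is a NumPy-only sketch under stated assumptions: the helper name `apply_ufunc_without_float16` is made up for illustration, and the real change would live wherever pandas wraps ufunc results for an Index.

```python
import numpy as np


def apply_ufunc_without_float16(ufunc, values):
    """Apply a ufunc and widen a float16 result to float32 (illustrative helper)."""
    result = ufunc(values)
    if result.dtype == np.float16:
        # float16 has no index engine, so widen before wrapping in an Index.
        result = result.astype(np.float32)
    return result


int8_arr = np.arange(3, dtype=np.int8)
print(np.exp(int8_arr).dtype)                                # float16
print(apply_ufunc_without_float16(np.exp, int8_arr).dtype)   # float32
```

Results that are not float16 pass through unchanged, so only the int8 and uint8 style inputs that NumPy resolves to float16 are affected.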
That is what I've done in the newest version (option 2 above). If we could pull this in (and #50195), then we'd be ready to pull in #49560 as well, and I could proceed with removing Int64Index etc.?
I just rebased, just in case, and the failure looks unrelated. Could we merge this and then the work that @jbrockmendel does in #50218 could be rebased to include this PR? This one not being merged is blocking #49560, which is an important step in the work of making the base
pandas/core/indexes/numeric.py (outdated)
if dtype == np.float16:
    # float16 not supported (no indexing engine)
    dtype = np.dtype(np.float32)
if dtype == "float16":
Why disallow the string but not the type?
Yes, I'll make a new commit to fix this.
I’ve updated the PR.
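One way to treat the string and the type alike is to normalize the input with `np.dtype` before comparing. The sketch below is not the code from the PR: the function name `reject_float16` is hypothetical, and the choice of `NotImplementedError` and its message are assumptions made for illustration.

```python
import numpy as np


def reject_float16(dtype):
    """Normalize a dtype spec and refuse float16 (illustrative helper)."""
    # np.dtype accepts strings ("float16", "f2"), scalar types (np.float16)
    # and dtype instances, so all spellings end up as the same object.
    normalized = np.dtype(dtype)
    if normalized == np.float16:
        # Exception type and message are assumptions, not taken from the PR.
        raise NotImplementedError("float16 indexes are not supported")
    return normalized


for spec in ("float16", "f2", np.float16, np.dtype("float16")):
    try:
        reject_float16(spec)
    except NotImplementedError as err:
        print(repr(spec), "->", err)

print(reject_float16("float32"))
```

A bare `dtype == "float16"` check misses the scalar type because `np.float16 == "float16"` evaluates to `False`, which is the gap the review comment points at.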
The failures look unrelated to this PR.
I would be okay with disallowing float16 for now, but I think this needs a whatsnew, because Index(..., dtype="float16") now raises an exception when previously this worked?
My intention was to write the whatsnew after getting the code PRs committed, because it's all related. Would that be OK? (I'll include the float16 issue.)
Sure
👍 Can we pull this PR in (after the CI gets working again)?
LGTM cc @jbrockmendel if you have any other comments
I’ve rebased so it would pass the CI. No other changes were made.
I'd like to merge this, is that OK?
Thanks @topper-123
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
The whatsnew does not need updating, because NumericIndex is purely internal.
Extracted from #49494