BUG: NumericIndex should not support float16 dtype #49536

topper-123 · 2022-11-04T21:06:28Z

closes BUG: NumericIndex should not support float16 dtype #49535
Tests added and passed if fixing a bug or adding a new feature
[x] All code checks passed.
[ ] Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

The whatsnew does not need updating, because NumericIndex is purely internal.

Extracted from #49494

mroeschke · 2022-11-08T18:46:37Z

Will Index(..., dtype=np.float16) upcast to np.float32 in the future then too?

Makes sense from an indexing utility why float16 should be disallowed, but just as a container it could be?

In [7]: pd.Index([1], dtype=np.float16)
Out[7]: Float64Index([1.0], dtype='float64')  # Would this be Index(..., dtype=float32) in the future?

In [8]: pd.Series([1], dtype=np.float16)
Out[8]:
0    1.0
dtype: float16

cc @jbrockmendel if you have any opinions on consistency with the above example

jbrockmendel · 2022-11-08T19:02:28Z

Will Index(..., dtype=np.float16) upcast to np.float32 in the future then too?

I would much rather this raise than give you something unexpected.

More generally, it'd be nice if Index supported everything Series did

topper-123 · 2022-11-09T04:31:19Z

My reasoning for upcasting was that pandas has a history of being very permissive with inputs and doing a lot to take in everything. This is not a very strong opinion of mine, so I'm also ok with disallowing float16.

float16 is not available in khash, which was needed to make an index in the same style as the other numeric indexes, but yeah, I would also have preferred not to special-case float16.

mroeschke · 2022-11-09T18:46:09Z

@topper-123 would it be possible to upcast to float32 when dispatching to khash but keep float16 when storing values? (This may be a naive question given my gaps in the indexing code.)

Agreed with Brock that generally we should be more explicit with disallowing float16 in Index (and I guess Series) or supporting float16 in Index & Series (maybe upcasting internally to float32 in Index when needed?)

topper-123 · 2022-11-12T08:27:56Z

Are we sure there won't be lossiness if we use float32 in the backend for float16 indexes?

I'm thinking that e.g. unions of float16 and int8 indexes then need to convert the int8->float32 float16->float32 and then convert the result back down to float16. Could that give wrong results in some circumstances? I'm also not a super fan of the special casing needed in the code and tests to make that happen, because float16 is rarely used.

Can we instead use @jbrockmendel's suggestion to disallow float16 indexes? If someone later wants to have a float16 index, they can open a PR at that time.
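
The casting part of the lossiness question can be probed directly with NumPy. The following is a minimal sketch, not pandas code: it only checks the dtype conversions themselves (float16 to float32 and back, and int8 to float16), not the union code path, and all variable names are illustrative.

```python
import numpy as np

# Build every float16 value from all 2**16 bit patterns, then drop NaNs
# (NaN never compares equal, so it cannot be checked by value).
bits = np.arange(2**16, dtype=np.uint32).astype(np.uint16)
all_f16 = bits.view(np.float16)
all_f16 = all_f16[~np.isnan(all_f16)]

# float16 -> float32 -> float16: float32 can represent every float16 value,
# so the round trip should return the same values.
round_trip = all_f16.astype(np.float32).astype(np.float16)
f16_round_trip_exact = np.array_equal(round_trip, all_f16)

# int8 -> float16 -> int8: every int8 value is exactly representable in float16.
all_i8 = np.arange(-128, 128, dtype=np.int16).astype(np.int8)
i8_exact_in_f16 = np.array_equal(all_i8.astype(np.float16).astype(np.int8), all_i8)

# The dtype NumPy itself picks when combining int8 and float16.
common = np.result_type(np.int8, np.float16)

print(f16_round_trip_exact, i8_exact_in_f16, common)
```

This says nothing about the special casing needed in the index code and tests, which is the other concern raised above.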

mroeschke · 2022-11-14T20:14:45Z

unions of float16 and int8 indexes then need to convert

I'm assuming that find_common_dtype (or the associated dtype resolution path) would just upcast to the appropriate type, but agreed that I may be underestimating edge cases where just-in-time promotions may cause issues.

I'd be okay raising a NotImplementedError for now to disallow float16

jbrockmendel · 2022-11-14T20:48:20Z

disallowing float16 would be OK. my preference would be to implement Float16Engine in index.pyx to cast to float32 before passing things to the hashtables.

topper-123 · 2022-11-15T13:19:31Z

I've updated to raise NotImplementedError in float16 cases.

mroeschke

So based on the changed test, looks like float16 was being tested and supported in some operations involving the index? IMO that signals that we should probably try to support if possible

jbrockmendel · 2022-11-17T03:36:37Z

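# Sketch only. The base class is assumed to be Float32Engine, since
# everything is cast to float32 before reaching the hashtables.
cdef class Float16Engine(Float32Engine):
    def __init__(self, values):
        # khash has no float16 table, so store a float32 copy in the engine
        values = values.astype(np.float32)
        super().__init__(values)

    def get_loc(self, key):
        # upcast float16 scalar keys so they hash like the stored values
        if isinstance(key, np.float16):
            key = key.astype(np.float32)
        return super().get_loc(key)

    def get_indexer(self, other):
        other = other.astype(np.float32)
        return super().get_indexer(other)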

topper-123 · 2022-11-17T06:50:24Z

Yeah, could be. I found another way of using 16-bit floats with 32-bit IndexEngines in #49560 (file pandas/core/indexes/numeric.py, lines 103 and 116), so that's not a blocker ATM.

We could discuss which is better, though I do like the simplicity of mine :-).

mroeschke · 2022-11-18T20:09:15Z

Yeah I think

    def _get_engine_target(self) -> ArrayLike:
        vals = self._values
        # pandas has no Float16Engine, so we use Float32Engine instead
        if vals.dtype == "float16":
            vals = vals.astype("float32")
        return vals

in #49560 looks reasonable (given the test passing)
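
The idea behind `_get_engine_target` (keep float16 for storage, hand float32 to the lookup machinery) can be illustrated outside pandas. This is a toy sketch under stated assumptions: `ToyFloat16Lookup` is a hypothetical class, a plain dict stands in for the khash-backed engine, and keys are simply cast to float32 before lookup.

```python
import numpy as np


class ToyFloat16Lookup:
    """Hypothetical stand-in for an index engine; not pandas code."""

    def __init__(self, values):
        # Values stay float16 for storage.
        self.values = np.asarray(values, dtype=np.float16)
        # The lookup table is built from a float32 copy, mirroring the
        # "use Float32Engine instead" approach. Duplicates keep the last position.
        target = self.values.astype(np.float32)
        self._table = {float(v): i for i, v in enumerate(target)}

    def get_loc(self, key):
        # Upcast the key the same way the stored values were upcast.
        key32 = float(np.float32(key))
        try:
            return self._table[key32]
        except KeyError:
            raise KeyError(key) from None


lookup = ToyFloat16Lookup([0.5, 1.5, 2.5])
print(lookup.values.dtype, lookup.get_loc(np.float16(1.5)))
```

Because every float16 value is exactly representable as float32, a float16 key and the stored value it equals land on the same table entry after upcasting.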

topper-123 · 2022-12-07T16:50:07Z

I've looked into making Float16Engine as suggested by @jbrockmendel in order to support float16 indexes. It may not be practically possible (without a lot of work). The underlying issue is that cnumpy doesn't have a float16_t. Without a float16_t type, we can't give float16 arrays to the relevant C-level functions and will have to make workarounds, which will probably be ugly and of marginal value (no one uses float16 arrays anyway).

For further discussion, see this and this Stack Overflow discussion. From the discussions it looks like numpy internally converts float16 to float32 and then converts back to float16 after the operations. I guess we could do that also, but that may be a project unto itself and outside the scope of my current work (i.e. collecting numeric indexes in the base Index).

I'm thinking we should choose between:
1: make instantiating Index with a float16 dtype raise a NotImplementedError.
2: make instantiating Index with a float16 dtype convert to Index with a float32 dtype (maybe raise a warning also?).

It seems from the discussion above that the majority opinion is 1. Do you still hold that opinion given that numpy does not have a float16_t type?

jbrockmendel · 2022-12-07T23:13:50Z

i guess go forward with this and ill take a stab at implementing Float16Engine

jreback · 2022-12-07T23:41:12Z

umm we don't support float16 for nearly anything.

would be -1 on supporting
just raise

topper-123 · 2022-12-08T14:14:50Z

Something to consider is that calling some numpy ufuncs (np.exp, for example) on int8 arrays returns a float16 array, e.g.

>>> import numpy as np
>>> arr = np.arange(3, dtype=np.int8)
>>> np.exp(arr)
array([1.   , 2.719, 7.39 ], dtype=float16)

If we raise on float16 indexes, calling those ufuncs on int8 indexes would also raise unless we guard against that in Index.__array_ufunc__, e.g.

>>> import pandas as pd
>>> idx = pd.core.api.NumericIndex(arr)
>>> idx
NumericIndex([0, 1, 2], dtype='int8')
>>> np.exp(idx)  # what should this return?

The choices are (if we do not have a float16 index):

1: raise NotImplementedError
2: return NumericIndex(np.exp(arr), dtype='float32') by changing float16 arrays to float32 arrays in Index.__array_ufunc__
3: return NumericIndex(np.exp(arr), dtype='float32') by letting NumericIndex(arr, dtype='float16') return a float32 index (not raising on float16 instantiation)

Raising on a NumericIndex(..., dtype=float16) input means either choice 1 or 2. Option 1 would be the most stringent but would fail in some cases where users might expect it not to fail, while option 2 is a bit like option 3, except that directly instantiating with dtype=float16 would still raise instead of giving a float32 index.
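
Option 2 can be sketched on plain arrays. The helper name `exp_without_float16` is hypothetical and this is not the code in the PR; it only shows the "upcast float16 ufunc results to float32 before wrapping" step, applied to np.exp.

```python
import numpy as np


def exp_without_float16(arr):
    """Hypothetical helper: apply np.exp, but never hand back float16."""
    result = np.exp(arr)
    if result.dtype == np.float16:
        # int8/uint8 input yields float16; upcast before it would be wrapped
        result = result.astype(np.float32)
    return result


small = exp_without_float16(np.arange(3, dtype=np.int8))
large = exp_without_float16(np.arange(3, dtype=np.int64))
print(small.dtype, large.dtype)
```

Only results that would otherwise be float16 are touched, so wider integer inputs keep whatever float dtype numpy chooses for them.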

topper-123 · 2022-12-10T23:40:59Z

Any comment, especially on the issue of ufuncs on int8 indexes?

jbrockmendel · 2022-12-12T18:29:44Z

Any comment, especially on the issue of ufuncs on int8 indexes?

We could cast float16 ufunc results to float32 before wrapping in an Index. I think we do something similar with FloatingArray.

I still think best-case is to support the same dtypes in Index that we do with Series. I've got a branch going that implements Float16Engine, have some test failures to work out.

topper-123 · 2022-12-12T22:17:47Z

We could cast float16 ufunc results to float32 before wrapping in an Index. I think we do something similar with FloatingArray.

That is what I've done in the newest version (option 2 above). If we could pull this in (and #50195) then we'd be ready to pull in #49560 also and I could proceed with removing Int64Index etc?

topper-123 · 2022-12-15T22:21:47Z

I just rebased, just in case, and the failure looks unrelated.

Could we merge this and then the work that @jbrockmendel does in #50218 could be rebased to include this PR? This one not being merged is blocking #49560, which is an important step in the work of making the base Index take numeric dtypes.

jbrockmendel · 2022-12-16T18:58:06Z

pandas/core/indexes/numeric.py

+        if dtype == np.float16:
+            # float16 not supported (no indexing engine)
+            dtype = np.dtype(np.float32)
+        if dtype == "float16":


why disallowing the string but not the type?

topper-123 · 2022-12-20

Yes, I'll make a new commit to fix this.

topper-123 · 2022-12-21

I’ve updated the PR.
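
One way to treat the string and the type the same is to normalize with np.dtype before comparing. The sketch below is an illustration only: `ensure_not_float16` is a hypothetical name, not the function changed in this PR.

```python
import numpy as np


def ensure_not_float16(dtype):
    """Hypothetical check: reject float16 however it is spelled."""
    # np.dtype accepts "float16", np.float16 and np.dtype("float16") alike
    dtype = np.dtype(dtype)
    if dtype == np.float16:
        raise NotImplementedError("float16 indexes are not supported")
    return dtype


rejected = []
for spec in ("float16", np.float16, np.dtype("float16")):
    try:
        ensure_not_float16(spec)
    except NotImplementedError:
        rejected.append(spec)

print(len(rejected), ensure_not_float16("float32"))
```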

topper-123 · 2022-12-21T07:17:55Z

The failures look unrelated to this PR.

mroeschke

I would be okay with disallowing float16 for now, but I think this needs a whatsnew because Index(..., dtype="float16") now raises an exception when previously this worked?

topper-123 · 2022-12-22T12:12:05Z

My intention was to write the whatsnew after getting the code PRs committed, because it's all related. Would that be ok? (I'll include the float16 issue.)

mroeschke · 2022-12-22T18:48:01Z

Sure

topper-123 · 2022-12-22T19:00:32Z

👍 Can we pull this PR in (after the CI gets working again)?

mroeschke

LGTM cc @jbrockmendel if you have any other comments

topper-123 · 2022-12-23T01:55:11Z

I’ve rebased so it would pass the CI. No other changes were made.

topper-123 · 2022-12-24T00:22:00Z

I'd like to merge this, is that ok?

mroeschke · 2022-12-27T18:34:16Z

Thanks @topper-123

topper-123 force-pushed the convert_float16_index_to_float32 branch from 06333da to 455b2a7 Compare November 5, 2022 07:45

topper-123 mentioned this pull request Nov 6, 2022

DEPR: Remove int64 uint64 float64 index part 1 #49560

Merged

mroeschke added the Index Related to the Index class or subclasses label Nov 8, 2022

mroeschke added the Dtype Conversions Unexpected or buggy dtype conversions label Nov 8, 2022

topper-123 force-pushed the convert_float16_index_to_float32 branch from 455b2a7 to f087051 Compare November 12, 2022 06:57

topper-123 force-pushed the convert_float16_index_to_float32 branch from 87f392a to fd0e959 Compare November 15, 2022 00:08

mroeschke reviewed Nov 17, 2022

topper-123 force-pushed the convert_float16_index_to_float32 branch 2 times, most recently from 30dcd1d to aa8b577 Compare December 10, 2022 18:20

jbrockmendel mentioned this pull request Dec 12, 2022

ENH: implement Float16Engine #50218

Closed

topper-123 added this to the 2.0 milestone Dec 13, 2022

topper-123 force-pushed the convert_float16_index_to_float32 branch from aa8b577 to f6edfac Compare December 15, 2022 17:26

jbrockmendel reviewed Dec 16, 2022

topper-123 force-pushed the convert_float16_index_to_float32 branch from f6edfac to 03c5968 Compare December 20, 2022 22:30

mroeschke reviewed Dec 21, 2022

mroeschke approved these changes Dec 22, 2022

Terji Petersen and others added 9 commits December 23, 2022 00:14

BUG: NumericIndex should not support float16 dtype

2cc08ab

make NumericIndex fail with float16 dtype

f73078d

make NumericIndex fail with float16 dtype, II

d41cf35

fix failures

d33c7d6

NotImplementedError

91a42da

NotImplementedError II

05b0168

fail on float16, but allow np.exp(int8_arrays)

7c108f2

fail on float16, but allow np.exp(int8_arrays) II

bba522a

fix NumericIndex_ensure_dtype

785df81

topper-123 force-pushed the convert_float16_index_to_float32 branch from 03c5968 to 785df81 Compare December 23, 2022 00:16

mroeschke merged commit 38b4e96 into pandas-dev:main Dec 27, 2022
