Move size information out of Numba type #1343

agoose77 · 2022-03-04T17:05:16Z

Description of new feature

The existing Numba support for Awkward includes known size information in the type. This differs from the way that NumPy arrays are handled, where the shape information is separate from the type. As a consequence, whenever the size of an array changes, so does its type, and Numba recompiles the function for the new input.

>>> import awkward as ak
>>> import numba
>>> import numpy as np
>>> numba.typeof(ak.from_numpy(np.identity(3)))
ak.ArrayView(ak.RegularArrayType(ak.NumpyArrayType(array(float64, 1d, A), none, {}), 3, none, {}), None, ())

>>> numba.typeof(np.identity(3))
array(float64, 2d, C)

To be clear, what is included in the type is the RegularArray size, i.e. the length of the sublists.
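The recompilation effect described above can be illustrated with a toy model. This is only a sketch: `ToyDispatcher`, `numpy_like_signature`, and `regular_like_signature` are hypothetical names invented for illustration, and the cache below is a stand-in for a compilation cache keyed on type signature, not Numba's actual dispatcher.

```python
# Toy model only: this is NOT Numba's dispatcher, just an illustration of
# a compilation cache keyed on the argument's type signature.
from typing import Callable, Dict, Tuple


class ToyDispatcher:
    """Caches one 'compiled' specialization per distinct type signature."""

    def __init__(self, func: Callable):
        self.func = func
        self.cache: Dict[Tuple, Callable] = {}

    def __call__(self, signature: Tuple, *args):
        if signature not in self.cache:
            # Stand-in for an expensive compilation step.
            self.cache[signature] = self.func
        return self.cache[signature](*args)


def numpy_like_signature(rows):
    # Shape is runtime data: only dtype and ndim appear in the signature.
    return ("array", "float64", 2)


def regular_like_signature(rows):
    # The inner length is part of the signature, as with RegularArrayType.
    return ("regular", "float64", len(rows[0]))


def total(rows):
    return sum(sum(row) for row in rows)


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


numpy_like = ToyDispatcher(total)
regular_like = ToyDispatcher(total)

for n in (3, 4, 5):
    data = identity(n)
    numpy_like(numpy_like_signature(data), data)
    regular_like(regular_like_signature(data), data)

print(len(numpy_like.cache))    # one specialization shared by all sizes
print(len(regular_like.cache))  # one specialization per inner length
```

In this model, the NumPy-like signature is reused across all three sizes, while the RegularArray-like signature produces a new cache entry each time the inner length changes.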


jpivarski · 2022-04-15T19:02:05Z

I know we talked about this somewhere (Gitter?). In light of all the things that have to get done, I'm going to mark this as "wontfix" because baking the size information into the type is what a RegularArray is. That makes it not quite the same as a NumPy array in a precompiled context.

It does have consequences for performance: different regular sizes force Numba to recompile functions, as you point out, but it also injects a constant into loops for LLVM to optimize, whereas Numba's NumPy handling can't perform that optimization. So you gain on runtime and lose in compilation time. Interestingly, the Numba team was discussing the possibility of going the other way with their NumPy handling (as an option, not a requirement).

What we're talking about here should be considered a new type.

- ListType has list lengths that vary from item to item,
- this would have list lengths that are all the same in an array, but not part of the Form or Type and not used to compile code in Numba, and
- RegularArray, which has list lengths that are all the same in an array and it is part of the Form/Type and is used to compile.

That middle case would have to store its size somewhere in an array buffer (of length 1?) because it can't be in the Form/Type object. Conflating the last two bullet points, i.e. making a RegularType in which the size is in the Form/Type but is compiled in Numba with runtime size, rather than compile-time size, would be possible but it feels to me like mixing concepts.

Adding a new node type is a big project, and it would differ from an existing node in a subtle way. And on top of that, we just have so much to do.

agoose77 · 2022-04-16T11:31:53Z

Definitely not proposing a new node type, particularly not when it would only be required to support Numba!

> making a RegularType in which the size is in the Form/Type but is compiled in Numba with runtime size, rather than compile-time size, would be possible but it feels to me like mixing concepts

This was what I was thinking of. However, it might be that we just accept that users need to be aware of these cases. Where users have layouts whose regular size changes, they would need to convert the layout to jagged lists ahead of time.

> the Numba team was discussing the possibility of going the other way with their NumPy handling

Huh, I hadn't seen that. Interesting. The motivation for filing this issue was that we differ from Numba: when a user has an ak.from_numpy array, it behaves differently from the pure NumPy array (at least, I assume that we don't have a special case for the Regular(Numpy) case in our Numba interface). Maybe that's acceptable, with the right docs.

The other issues that we have to work on certainly take precedence over this one. Maybe we'll revisit it in the future if we find it is a pressing concern.

agoose77 added the feature (New feature or request) and performance (Works, but not fast enough or uses too much memory) labels Mar 4, 2022

jpivarski closed this as completed Apr 15, 2022

jpivarski added the wontfix (This will not be worked on) label Apr 15, 2022
