Move size information out of Numba type #1343

Closed
agoose77 opened this issue Mar 4, 2022 · 2 comments
Labels
feature New feature or request performance Works, but not fast enough or uses too much memory wontfix This will not be worked on

Comments

@agoose77
Collaborator

agoose77 commented Mar 4, 2022

Description of new feature

The existing Numba support for Awkward includes known size information in the type. This differs from the way NumPy arrays are handled, where the shape information is separate from the type. The consequence is that whenever the size of an array changes, so does its type, and Numba recompiles the function for that input.

>>> numba.typeof(ak.from_numpy(np.identity(3)))
ak.ArrayView(ak.RegularArrayType(ak.NumpyArrayType(array(float64, 1d, A), none, {}), 3, none, {}), None, ())
>>> numba.typeof(np.identity(3))
array(float64, 2d, C)

To be clear, the size that is included in the type is the RegularArray size, i.e. the length of the sublists (the 3 in RegularArrayType(..., 3, ...) above).
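A toy model can make the recompilation cost concrete. This is a hypothetical sketch, not Numba's real dispatcher: `ToyDispatcher`, `numpy_like_signature`, and `regular_array_like_signature` are invented for illustration. It assumes only that compiled code is cached per type signature, and its two signatures loosely mirror the two `numba.typeof` outputs above: one keeps dtype, dimensionality, and layout, while the other also keeps the RegularArray size.

```python
# Toy model (hypothetical, not Numba's real dispatcher): "compiled" code is
# cached per type signature, so anything in the signature that changes
# forces a new compilation.
from typing import Callable, Dict, Hashable, Tuple


class ToyDispatcher:
    def __init__(self, signature_of: Callable[[Tuple[int, int]], Hashable]):
        self.signature_of = signature_of
        self.cache: Dict[Hashable, str] = {}
        self.compilations = 0

    def __call__(self, shape: Tuple[int, int]) -> str:
        sig = self.signature_of(shape)
        if sig not in self.cache:
            # Stand-in for an expensive LLVM compilation.
            self.compilations += 1
            self.cache[sig] = f"machine code for {sig}"
        return self.cache[sig]


def numpy_like_signature(shape: Tuple[int, int]) -> Hashable:
    # Like array(float64, 2d, C): dtype, ndim and layout, but no sizes.
    return ("array", "float64", len(shape), "C")


def regular_array_like_signature(shape: Tuple[int, int]) -> Hashable:
    # Like RegularArrayType(..., size, ...): the sublist length is in the type.
    _, inner = shape
    return ("RegularArray", "float64", inner)


numpy_style = ToyDispatcher(numpy_like_signature)
awkward_style = ToyDispatcher(regular_array_like_signature)

for n in (3, 4, 5, 3):
    numpy_style((n, n))
    awkward_style((n, n))

print(numpy_style.compilations)    # 1
print(awkward_style.compilations)  # 3
```

Calling with square inputs of sizes 3, 4, 5, and 3 again compiles once under the NumPy-like signature and three times under the RegularArray-like one, because only the latter changes with the sublist length.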

@agoose77 agoose77 added feature New feature or request performance Works, but not fast enough or uses too much memory labels Mar 4, 2022
@jpivarski
Member

I know we talked about this somewhere (Gitter?). In light of all the things that have to get done, I'm going to mark this as "wontfix" because baking the size information into the type is what a RegularArray is. That makes it not quite the same as a NumPy array in a precompiled context.

It does have consequences for performance: different regular sizes force Numba to recompile functions, as you point out, but it also injects a constant into loops for LLVM to optimize, whereas Numba's NumPy handling can't perform that optimization. So you gain on runtime and lose in compilation time. Interestingly, the Numba team was discussing the possibility of going the other way with their NumPy handling (as an option, not a requirement).

What we're talking about here should be considered a new type.

  • ListType has list lengths that vary from item to item,
  • this would have list lengths that are all the same within an array, but the length would not be part of the Form or Type and would not be used to compile code in Numba, and
  • RegularArray has list lengths that are all the same within an array, and the length is part of the Form/Type and is used to compile.

That middle case would have to store its size somewhere in an array buffer (of length 1?) because it can't be in the Form/Type object. Conflating the last two bullet points, i.e. making a RegularType in which the size is in the Form/Type but is compiled in Numba with runtime size, rather than compile-time size, would be possible but it feels to me like mixing concepts.
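The three cases in the list can be sketched as toy Python classes. These are hypothetical stand-ins, not Awkward's real Content or Form classes. Here `form()` plays the role of whatever a Numba-like dispatcher would key compilation on. The middle case keeps its size in a length-1 buffer, as suggested above, so its form does not change when the size does.

```python
# Hypothetical toy classes, not Awkward's real Content/Form hierarchy.
# form() returns what a Numba-like dispatcher would key compilation on.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyListArray:
    """Variable-length lists: boundaries come from an offsets buffer."""
    offsets: List[int]
    content: List[float]

    def form(self) -> Tuple:
        return ("ListOffsetArray", "float64")

    def __getitem__(self, i: int) -> List[float]:
        return self.content[self.offsets[i]:self.offsets[i + 1]]


@dataclass
class ToyRuntimeRegularArray:
    """The middle case: equal-length lists, size stored in a length-1 buffer."""
    size_buffer: List[int]  # e.g. [3]; this is data, not type
    content: List[float]

    def form(self) -> Tuple:
        return ("RuntimeRegularArray", "float64")  # no size here

    def __getitem__(self, i: int) -> List[float]:
        size = self.size_buffer[0]
        return self.content[i * size:(i + 1) * size]


@dataclass
class ToyRegularArray:
    """Like RegularArray: the size is part of the form/type."""
    size: int
    content: List[float]

    def form(self) -> Tuple:
        return ("RegularArray", "float64", self.size)

    def __getitem__(self, i: int) -> List[float]:
        return self.content[i * self.size:(i + 1) * self.size]


regular3 = ToyRegularArray(3, [float(i) for i in range(9)])
regular4 = ToyRegularArray(4, [float(i) for i in range(8)])
runtime3 = ToyRuntimeRegularArray([3], [float(i) for i in range(9)])
runtime4 = ToyRuntimeRegularArray([4], [float(i) for i in range(8)])
print(regular3.form() == regular4.form())  # False: size is in the form
print(runtime3.form() == runtime4.form())  # True: size is in a buffer
```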

Adding a new node type is a big project, and it would differ from an existing node in a subtle way. And on top of that, we just have so much to do.

@jpivarski jpivarski added the wontfix This will not be worked on label Apr 15, 2022
@agoose77
Collaborator Author

Definitely not proposing a new node type, particularly not when it would only be required to support Numba!

making a RegularType in which the size is in the Form/Type but is compiled in Numba with runtime size, rather than compile-time size, would be possible but it feels to me like mixing concepts

This was what I was thinking of. However, it might be that we just accept that users need to be aware of these cases. Where users have layouts whose regular size changes from call to call, they would need to convert the layout to jagged lists ahead of time.
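Converting ahead of time amounts to replacing the fixed size with an offsets buffer, so the sublist length becomes data rather than part of the type. In Awkward itself, `ak.from_regular` converts a regular axis into an irregular one. The pure-Python sketch below uses a hypothetical helper, `regular_to_jagged`, and only models the offsets that such a conversion has to build. It assumes a positive size and content whose length is a multiple of it (Awkward's RegularArray also allows a size of 0, which this sketch does not handle).

```python
# Hypothetical helper: turns a regular layout (flat content + fixed sublist
# size) into a jagged offsets/content pair, so the sublist length lives in
# the offsets buffer instead of in the type.
from typing import List, Tuple


def regular_to_jagged(content: List[float], size: int) -> Tuple[List[int], List[float]]:
    if size <= 0:
        raise ValueError("this sketch only handles a positive size")
    if len(content) % size != 0:
        raise ValueError("content length must be a multiple of size")
    length = len(content) // size
    # Sublist i spans content[offsets[i]:offsets[i + 1]].
    offsets = [i * size for i in range(length + 1)]
    return offsets, content


flat = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]  # np.identity(3), flattened
print(regular_to_jagged(flat, 3)[0])  # [0, 3, 6, 9]
```

Because every converted array has the same kind of offsets buffer whatever its size, a size-keyed dispatcher would see one type instead of one type per regular size.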

the Numba team was discussing the possibility of going the other way with their NumPy handling

Huh, I hadn't seen that. Interesting. The motivation for filing this issue was that we differ from Numba: when a user has an ak.from_numpy array, it behaves differently from the pure NumPy array (at least, I assume that we don't have a special case for Regular(Numpy) in our Numba interface). Maybe that's acceptable, with the right docs.

The other issues that we have to work on certainly take precedence over this one. Maybe we'll revisit it in the future if we find it is a pressing concern.
