Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: use cppyy for JIT #2306

Merged
merged 18 commits into from
Mar 21, 2023
Merged

feat: use cppyy for JIT #2306

merged 18 commits into from
Mar 21, 2023

Conversation

ianna
Copy link
Collaborator

@ianna ianna commented Mar 13, 2023

No description provided.

@ianna ianna marked this pull request as draft March 13, 2023 11:04
@codecov
Copy link

codecov bot commented Mar 13, 2023

Codecov Report

Merging #2306 (688c337) into main (6adf9a4) will decrease coverage by 0.07%.
The diff coverage is 78.02%.

Additional details and impacted files
Impacted Files Coverage Δ
src/awkward/cppyy.py 27.27% <27.27%> (ø)
src/awkward/highlevel.py 76.03% <28.57%> (-1.09%) ⬇️
src/awkward/index.py 90.06% <83.33%> (-0.39%) ⬇️
src/awkward/operations/ak_cartesian.py 92.00% <98.24%> (+1.02%) ⬆️
src/awkward/__init__.py 97.05% <100.00%> (+0.08%) ⬆️
src/awkward/_broadcasting.py 89.33% <100.00%> (+0.04%) ⬆️

@ianna ianna temporarily deployed to docs-preview March 13, 2023 11:17 — with GitHub Actions Inactive
@ianna ianna temporarily deployed to docs-preview March 17, 2023 16:08 — with GitHub Actions Inactive
Copy link
Collaborator Author

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpivarski - I guess, this convert function can be our dunder/magic method that @wlav can call when cppyy sees an awkward.Array.

tests/test_2306_cppyy_git.py Outdated Show resolved Hide resolved
@ianna ianna temporarily deployed to docs-preview March 20, 2023 11:12 — with GitHub Actions Inactive
@ianna ianna temporarily deployed to docs-preview March 20, 2023 13:39 — with GitHub Actions Inactive
@ianna ianna temporarily deployed to docs-preview March 20, 2023 14:22 — with GitHub Actions Inactive
@ianna ianna temporarily deployed to docs-preview March 20, 2023 14:37 — with GitHub Actions Inactive
Copy link
Collaborator Author

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpivarski - please, have a look.

tests/test_2306_cppyy_git.py Outdated Show resolved Hide resolved
@ianna ianna marked this pull request as ready for review March 20, 2023 15:23
@ianna ianna requested a review from jpivarski March 20, 2023 15:24
@ianna ianna temporarily deployed to docs-preview March 20, 2023 15:29 — with GitHub Actions Inactive
src/awkward/highlevel.py Outdated Show resolved Hide resolved
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good so far!

Don't worry about any copy-by-value of data structures smaller than, say, 64 bytes. The 2-value tuple (struct { ssize_t length; ssize_t* ptr }, 16 bytes) is fine to copy, and the small ArrayViews are even bigger (32 bytes, I think). Also, the hand-off between Python and C++ has a lot more overhead than copying an object.

Some questions:

  • Does this still work if a function takes two array arguments? (That's something that can't be done in RDataFrame, a motivation for this more direct kind of interface.) The user-written template should have two template parameters.
  • If a function is called on the same array twice, does it get recompiled? Hopefully not.
  • If a function is called on the same Form of array twice, does it get recompiled? Hopefully not.
  • Will array return values be another PR? As in Numba, you'd only be able to return a direct view of part of the original array that was passed in, not an entirely new array.1 (There's LayoutBuilder for that.) The returned view needs to pass ownership, just as it does in Numba. This is the reason that all ArrayViews in Numba carry a PyObject*, and that has to be passed in, a third argument of cpppars, along with length and ptr.

Footnotes

  1. One use-case for being able to return a part of the original array is to be able to write a function that iterates through an array and stops and returns a record of interest.

src/awkward/highlevel.py Outdated Show resolved Hide resolved
tests/test_2306_cppyy_git.py Outdated Show resolved Hide resolved
@wlav
Copy link

wlav commented Mar 20, 2023

For consistency, I ended up calling the implicit method __cast_cpp__, but that's open to change. An example of use, that we can use to flesh out the details, is here: https://github.com/wlav/cppyy/blob/master/test/test_advancedcpp.py#L901

This is in repo master. I cut 3.0.0 yesterday, b/c Windows and now also Mac no longer support Clang9, but it's been a bit of a rush job and I've already found some issues, so I hope to cut 3.0.1 soon.

agoose77 and others added 3 commits March 21, 2023 14:16
* fix: expose array interface for CUDA

* test: cover other cases
* docs: fix missing Python kernel!

* docs: fix pyodide-build
@ianna ianna temporarily deployed to docs-preview March 21, 2023 13:30 — with GitHub Actions Inactive
@ianna ianna temporarily deployed to docs-preview March 21, 2023 16:15 — with GitHub Actions Inactive
@ianna
Copy link
Collaborator Author

ianna commented Mar 21, 2023

For consistency, I ended up calling the implicit method __cast_cpp__, but that's open to change. An example of use, that we can use to flesh out the details, is here: https://github.com/wlav/cppyy/blob/master/test/test_advancedcpp.py#L901

This is in repo master. I cut 3.0.0 yesterday, b/c Windows and now also Mac no longer support Clang9, but it's been a bit of a rush job and I've already found some issues, so I hope to cut 3.0.1 soon.

Thanks, @wlav ! Everything works with cppyy built from source.

@jpivarski - I've restricted the cppyy version to 3.0.1 - the minimum required for Awkward-cppyy GIT. I will write documentation tomorrow. Unless you'd like me to add more tests to this PR it can go in.

@agoose77 - please, have a look if the comments are understandable. Thanks!

@ianna ianna temporarily deployed to docs-preview March 21, 2023 18:48 — with GitHub Actions Inactive
Copy link
Collaborator Author

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpivarski - everything works as expected!

@ianna ianna requested review from jpivarski and agoose77 March 21, 2023 18:55
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After committing the suggested changes, I think it's ready to go.

Yes, documentation next would be great, thanks!

src/awkward/cppyy.py Outdated Show resolved Hide resolved
src/awkward/cppyy.py Outdated Show resolved Hide resolved
tests/test_2306_cppyy_git.py Show resolved Hide resolved
ianna and others added 2 commits March 21, 2023 20:34
Co-authored-by: Jim Pivarski <jpivarski@users.noreply.github.com>
Co-authored-by: Jim Pivarski <jpivarski@users.noreply.github.com>
@ianna ianna temporarily deployed to docs-preview March 21, 2023 19:45 — with GitHub Actions Inactive
@ianna ianna merged commit c44a622 into main Mar 21, 2023
@ianna ianna deleted the ianna/cppyy-connect-array-view branch March 21, 2023 20:12
@agoose77
Copy link
Collaborator

@ianna cool! This will be so useful going forward 😎

Sorry only get these comments in after merging, I was eating!

Just to check — this is a pretty niche method; most Array's won't be using cppyy. I wonder whether we should exclusively move this to an ak. function, so that we don't add any more methods to Array's namespace? I'd be similiarly tempted to do the same for numba_type in the long run. @jpivarski you might also have thoughts on this proposal.

Secondly, could we rename this to cpp_type, to mirror numba_type?

I can make a PR if people agree on any of these suggestions.

@jpivarski
Copy link
Member

@agoose77 Actually, I recommended the public property in ak.Array's namespace, since type and numbatype were already there—although I had missed the fact that the Numba one now has an underscore. Moving it from cpptype to cpp_type before the next release is a good idea.

In general, we try to keep the ak.Array namespace available for subclass method and record-field names, but L1 types are the sort of things that you'd want to have quick access to. Also, the one that is using the best real estate is type: it's quite likely that a record array would have a field named "type", and now it will have to be accessed with square brackets. However, type is also the most prominent (least-niche) one.

Although the fraction of physicists using cppyy directly may be small because of the interference of ROOT1, there's no such issue for general-science use-cases. Depending on how we manage to promote it, this interface may not be so niche. Most of the general-science users of Awkward so far have found the Numba bridge to be an essential feature, and there are things you can't do with Numba, such as connect to libraries that use C++ types in the interfaces. (Also, dealing with specialized types in Numba, such as convincing it to use 32-bit floats, is cumbersome.)

If you want to add the underscore, so that we have numba_type and cpp_type, please do. @ianna will be documenting next, so it's just a matter of using the new name.

Footnotes

  1. As a reminder, I'd like to try to take this up. Most PyROOT users don't know that the library they're using is called cppyy; if one module name has to move, it seems reasonable to me that it be the one in ROOT. Even changing it to ROOT.cppyy would solve the issue, and it's more tidy for a singly installable package to have only one top-level module name, anyway. I should ask in a ROOT meeting if that's something they're open to, even as an external PR.

@agoose77
Copy link
Collaborator

@agoose77 Actually, I recommended the public property in ak.Array's namespace, since type and numbatype were already there

Fine! For posterity, my motivations to remove this from the namespace are:

  • it's niche; most users aren't likely to use cppyy, or at least, aren't likely to be calling .cpp_type very often. ak.cpp_type(arr) is not much extra typing
  • it sets a precedent for future decisions

And probably some other ones ... can't remember!

I consider, however, that we're probably already on the same page; in your view, the pros outweigh the cons?

As such, I'll make a PR just to rename the attribute! :)

Thanks team.

@ianna
Copy link
Collaborator Author

ianna commented Mar 21, 2023

@agoose77 - I think, I prefer a method because we have to keep a lookup lifetime connected to an array lifetime.

@agoose77 agoose77 mentioned this pull request Mar 21, 2023
@agoose77
Copy link
Collaborator

@agoose77 - I think, I prefer a method because we have to keep a lookup lifetime connected to an array lifetime.

Hmm, yes, good point. In this case the solution would be to only hide the public cpp_type getter; i.e. only have ak.Array._cpp_type. I like this less, though, because there's no clear suggestion that this is different to the existing private variable (which would have to be renamed).

We have a similar L3-L4 problem with contents, but in that case we have a protocol system (ak._do) that makes it clear which private methods are truly private.

Given how niche a name cpp_type is (whoever is going to want to use that for an analysis?), I think making this a public method is the right call.

@jpivarski
Copy link
Member

I forgot that it also has to change the state of the ak.Array (for the lookup lifetime). The same is true of numba_type.

If you remember, I kind-of also wanted to get rid of the ak.type(...) function and leave only the property. It's not much more typing (with a keyboard), it's just unnatural when you're working interactively. It's often the case that you've built up an expression and then think, "Wait—I should check to see what type this is." Doing that by adding ".type" at the end is more natural than having to wrap the whole expression in parentheses to make the function call. Right after checking its type, the next thing you'd likely do is remove the ".type" and get the array itself.

This is the same argument for having a to_list() (and tolist()) method, rather than relying only on the ak.to_list(...) function. And the show() method doesn't even have a functional counterpart. This argument doesn't apply to everything, but there are a few things you want an object oriented interface to when working interactively. Types are basic, and even though cpp_type and numba_type are less used, it makes sense, organizationally, to find them in the same place as type.

@agoose77
Copy link
Collaborator

Doing that by adding ".type" at the end is more natural than having to wrap the whole expression in parentheses to make the function call.

Agreed. In the phase space of "how convenient would this be — how often would users do this — how valuable a name is this", .type is fairly far from the origin. Meanwhile, .numba_type is much less frequently used, even in analyses with Numba, yet it is also less useful as a name for users to overload.

I think we will need to make decisions that appear inconsistent if you look at only one of these axes, but it's the holistic picture that we should be considering. On balance, the convenience of self._cpp_type = ... is a deciding factor for me here! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants