Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: set allow_lazy to True to avoid touching potential placeholders #3379

Merged

Conversation

pfackeldey
Copy link
Collaborator

@pfackeldey pfackeldey commented Jan 23, 2025

(Finally) fixes: scikit-hep/coffea#1231. This issue has causing me multiple headaches.

This PR is a follow-up of #2524 for IndexedOptionArrays. The problem here is that a RecordArray may be wrapped in a IndexedOptionArray after a ak.with_field/__setitem__ operation. This has been fixed for IndexedArrays in #2524, but not for option types.

By setting allow_lazy=True in the _carry function we don't apply a slice to each content of a RecordArray, which causes errors for PlaceholderArrays. Instead this wraps them in an IndexedArray and thus follows the fix provided in #2524 by "pushing" the IndexedArray down into each field of the RecordArray in the next recursion during broadcasting.

Adding the same "pushing" fix to IndexedOptionArrays led to multiple issues, which is why I gave up on it. If you have an idea how to implement this, I can give it another try. I'm not 100% sure if my/this solution is what we want.

(PS: I think, setting allow_lazy=True here should also benefit in the scope of the current VirtualArray implementation (#3364) as it delays materialization of slicing RecordArrays. The same is likely true for IndexedArrays?)

@pfackeldey
Copy link
Collaborator Author

(sorry for pinging the three of you, but I feel only partly convinced that this is the correct fix. Given that this has been fixed in #2524 I asked for a review from all participating parties of this PR.)

Copy link
Collaborator

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pfackeldey - Thanks! I agree with your arguments. Also, if it fixes coffea, we shoud get it out asap. Thanks!

@agoose77
Copy link
Collaborator

agoose77 commented Jan 23, 2025

The obvious consequence of this change is that operations that expect project() to return a particular type no longer have this guarantee. I'd want to check uses of project() internally to confirm the impact.

I don't think the performance for non-dask cases will be hugely affected, which is nice; we're already creating the index buffer that is being used, so it's not exactly more memory. It does mean that carrying can have multiple times, though, if successive operations operate on this layout. In any case, I think a trend towards laziness is probably a good move.

Jim will have thoughts; we have tweaked these defaults for other node types before.

@jpivarski
Copy link
Member

The application of allow_lazy in the various places where carry is called can lead to infinite recursion errors if you're not careful. If this were an IndexedArray.project passing allow_lazy=True, it would definitely cause infinite recursion because the two-level IndexedArray + X structure wouldn't get unpacked to X, it would remain IndexedArray + X, and the broadcasting code that's trying to unpack it would probably call it over and over and never get anywhere. But this is IndexedOptionArray.project, so IndexedOptionArray + X first gets unpacked to IndexedArray + X, and then another step unpacks it to just X. That's apparently fine because the tests passed.

@agoose77, I don't think that there's any code that expects the output of project to have a particular layout/form, just a particular type (the same type for an IndexedArray or the type without "option" for an IndexedOptionArray). That is, unless you count the issue I described in the previous paragraph, in which broadcasting code expects to make a set of inputs simpler (i.e. "unpacked") in each descending step. We do assume that calling project doesn't give you exactly the same layout/form you had before, but a simpler one. However, that's preserved by this code change.

This is motivated by correctness, not performance, right? Getting these allow_lazy arguments right was tricky and had long-range consequences (though they've always been revealed by the tests, and while Uproot is often losing tests due to data servers going down, Awkward Array isn't). If the motivation is performance, or at least if it's anything less than a factor of 5 or 10 or so, I'd be nervous about accepting it right away. If it's breaking code with PlaceholderArrays, then that's a good argument to take this step.

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, the above was a review. I expressed a conditional, but I'm going to press "approve" here.

@pfackeldey
Copy link
Collaborator Author

This is motivated by correctness, not performance, right?

This PR is purely motivated by correctness. (I don't have a good feeling what implications this would have for performance.)
If you have a different idea how to achieve this correctness I can also give it a try, but I couldn't come up with anything so far.

@pfackeldey
Copy link
Collaborator Author

pfackeldey commented Jan 23, 2025

As this PR touches a critical part of awkward's internals, I run in addition (locally) the test suites of vector, dask-awkward, and coffea against this branch and got:

  • vector: ✅
  • dask-awkward: ✅
  • coffea: ⛔

coffea needs #3383 and #3384 to go in, and then a little tweak in one of coffea's test to adopt to the new .mask usage (PR is here: scikit-hep/coffea#1253) with weak-refs as introduced by #3347. Then coffea's tests pass aswell.

This PR is not responsible for the failing test in coffea, so I think we can go ahead and merge this one.

@ianna ianna merged commit 36bbc01 into main Jan 24, 2025
39 checks passed
@ianna ianna deleted the pfackeldey/fix_IndexedOptionArray_broadcasting_with_placeholders branch January 24, 2025 12:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants