-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support conditional expressions in vectorized map #5017
Conversation
bd0516a
to
e84f58b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks great! Just adding comments from our dive into IsValid
vs IsNull
.
@OfTheDelmer thanks for your notes. I'll do another pass on the null handling, hopefully with some new tests to verify that:
I think the blocks called out in the review may simplify further with this contract in mind. |
740a622
to
2afaa99
Compare
Tests added in 2afaa99 for null inputs in each of the three vectors appear to be passing as-is, so that's something. Working to simplify the checks now. |
0da2222
to
65ac960
Compare
@OfTheDelmer I've simplified the null checks in 65ac960 -- if these changes seem right to you, can you resolve the remarks in your review? |
Updates the Go side to support vectorized conditionals. Adds new template-generated code to handle the arrow array building code as well as updates to the `values` package which acts as an entry point to help delegate to the correct typed API based on the MonoType of the incoming vectors.
Currently the Go unit tests that actually process inputs are failing due to a type conflict on the `test` portion of the expression, ex: `bool != v[bool]`. Not sure why I didn't bump into this earlier -- manual testing via scripts all seemed to work fine as is. Watching in the debugger showed execution hitting all the expected (vectorized) code paths. The "Logical Ops" PR added a `typ` field to the expression node to allow us to alter the monotype during vectorization, so I'll do the same and see if that fixes things.
Fixes an issue where the tests for vectorized conditionals broke with a type conflict on the `test` part of the expression (`bool` vs `v[bool]`). For whatever reason, it seems like we may have been running the row-based version of the function prior to this diff. Now, when a test case sets `vectorizable` to `true`, we replace the function expression with the vectorized version before passing it to `Compile`.
The test cases try to cover the range of types we support, while also targeting constants in the consequent/alternate positions to verify vec repeat plays well. Right now, the tests exercising vec repeat fail! Fixes pending.
Already covered were cases where `test` was const, as well as when EITHER `consequent` OR `alternate` were const, but not BOTH. For cases where all 3 are const, the receiver of of the return value of the conditional expression will get the vec repeat from whichever branch was selected (this case was already handled). The panic from the type conflict between Time/Int is still to resolve.
Adds a new flux feature flag, propagating it down into the rust analyzer code so we can use it to activate or disable the new code path. Tests have been updated to enable the flag, but it is off by default.
Lightly modified version of work done in this (currently draft) PR: #5015
65ac960
to
5fae644
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This update looks great
@@ -81,47 +87,71 @@ func newVectorFromSlice(values []Value, typ semantic.MonoType, mem memory.Alloca | |||
case semantic.BasicInt: | |||
b := arrow.NewIntBuilder(mem) | |||
for _, v := range values { | |||
b.Append(v.Int()) | |||
if v.IsNull() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
Initial work for vectorized conditionals neglected to cover cases where the inputs for `test`, `consequent`, and `alternate` could be invalid (ie, referencing fields which could not be statically verified and missing at runtime). This diff aims to handle these cases by: - treating a null `test` the same as "all false" by returning `alternate` directly. - changing the constant (vec repeat) code paths to accept pointers to the various primitive types so they can be optionally null. - short circuiting for cases where both `consequent` and `alternate` are null (since the output will be null, regardless of the values in `test`). The choice to adjust the constant handling code path to also handle the null case was largely motivated by wanting to avoid adding yet another factor to drive combinations of function signatures. This would have required a large expansion of functions in the `array` package, as well as higher complexity to be able to delegate to the new functions from the `values` side. We stated with needing (per type): - vec, const - const, vec - vec, vec - const, const By going this way, we avoid adding (per type): - vec, null - null, vec - const, null - null, const Furthermore, the logic required for nulls in either position is more or less identical to the const pattern, with the addition of a null check of the pointer. Refs: - <#4601> - <#5017>
Initial work for vectorized conditionals neglected to cover cases where the inputs for `test`, `consequent`, and `alternate` could be invalid (ie, referencing fields which could not be statically verified and missing at runtime). This diff aims to handle these cases by: - treating a null `test` the same as "all false" by returning `alternate` directly. - changing the constant (vec repeat) code paths to accept pointers to the various primitive types so they can be optionally null. - short circuiting for cases where both `consequent` and `alternate` are null (since the output will be null, regardless of the values in `test`). The choice to adjust the constant handling code path to also handle the null case was largely motivated by wanting to avoid adding yet another factor to drive combinations of function signatures. This would have required a large expansion of functions in the `array` package, as well as higher complexity to be able to delegate to the new functions from the `values` side. We stated with needing (per type): - vec, const - const, vec - vec, vec - const, const By going this way, we avoid adding (per type): - vec, null - null, vec - const, null - null, const Furthermore, the logic required for nulls in either position is more or less identical to the const pattern, with the addition of a null check of the pointer. Refs: - <#4601> - <#5017>
…est, consequent or alternate (#5118) * test: check for nulls in vectorized conditionals Nulls need special care for vectorized conditionals, similar to how we manage constants. If the test, consequent or alternate reference fields that don't exist then we need to act as if we have a "constant null" for the operation. * fix: handle null test, consequent, alternate in vectorized conditional Initial work for vectorized conditionals neglected to cover cases where the inputs for `test`, `consequent`, and `alternate` could be invalid (ie, referencing fields which could not be statically verified and missing at runtime). This diff aims to handle these cases by: - treating a null `test` the same as "all false" by returning `alternate` directly. - changing the constant (vec repeat) code paths to accept pointers to the various primitive types so they can be optionally null. - short circuiting for cases where both `consequent` and `alternate` are null (since the output will be null, regardless of the values in `test`). The choice to adjust the constant handling code path to also handle the null case was largely motivated by wanting to avoid adding yet another factor to drive combinations of function signatures. This would have required a large expansion of functions in the `array` package, as well as higher complexity to be able to delegate to the new functions from the `values` side. We stated with needing (per type): - vec, const - const, vec - vec, vec - const, const By going this way, we avoid adding (per type): - vec, null - null, vec - const, null - null, const Furthermore, the logic required for nulls in either position is more or less identical to the const pattern, with the addition of a null check of the pointer. Refs: - <#4601> - <#5017> * chore: use sentinel Null instead of New(nil) * test: add planner assertion to new vectorized tests * chore: make generate
Fixes #4601
Adds support for vectorized functions containing conditional expressions.
This new code path is guarded by a feature flag called
vectorizedConditionals
, enabled in test but off by default.In cases where the
test
(ie the condition part of the expression) is all one value or another, we can skip a lot of the new code by returning either the theconsequent
oralternate
vector directly.For other situations, both the
consequent
andalternate
vectors must be evaluated in their entirety after which values are selected for return based on thetest
value for the given index.N.b. this could give an inconsistent outcome for cases where these expressions have some sort of side-effect when the when compared to the non-vectorized version. At this time, such expressions would prevent the function from being vectorized in the first place, but we may need to revisit this detail in the future.
The flux acceptance tests pass with or without the flag enabled, but in the future when #4739 lands, these testcases should be updated to assert the vectorized code path is actually being hit.
Go unit tests verify the behavior in a more isolated way.
Done checklist