Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support conditional expressions in vectorized map #5017

Merged
merged 16 commits into from
Jul 29, 2022

Conversation

onelson
Copy link
Contributor

@onelson onelson commented Jul 23, 2022

Fixes #4601

Adds support for vectorized functions containing conditional expressions.
This new code path is guarded by a feature flag called vectorizedConditionals, enabled in test but off by default.

In cases where the test (ie the condition part of the expression) is all one value or another, we can skip a lot of the new code by returning either the the consequent or alternate vector directly.

For other situations, both the consequent and alternate vectors must be evaluated in their entirety after which values are selected for return based on the test value for the given index.
N.b. this could give an inconsistent outcome for cases where these expressions have some sort of side-effect when the when compared to the non-vectorized version. At this time, such expressions would prevent the function from being vectorized in the first place, but we may need to revisit this detail in the future.

The flux acceptance tests pass with or without the flag enabled, but in the future when #4739 lands, these testcases should be updated to assert the vectorized code path is actually being hit.

Go unit tests verify the behavior in a more isolated way.

Done checklist

  • docs/SPEC.md updated N/A
  • Test cases written

@onelson onelson force-pushed the onelson/feat/vectorize-conditional-exprs branch 2 times, most recently from bd0516a to e84f58b Compare July 25, 2022 18:49
@onelson onelson changed the title Onelson/feat/vectorize conditional exprs feat: support conditional expressions in vectorized map Jul 25, 2022
@onelson onelson marked this pull request as ready for review July 26, 2022 19:39
@onelson onelson requested a review from a team as a code owner July 26, 2022 19:39
@onelson onelson requested review from OfTheDelmer and removed request for a team July 26, 2022 19:39
Copy link
Contributor

@OfTheDelmer OfTheDelmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks great! Just adding comments from our dive into IsValid vs IsNull.

array/conditional.gen.go.tmpl Outdated Show resolved Hide resolved
array/conditional.gen.go.tmpl Outdated Show resolved Hide resolved
array/conditional.gen.go.tmpl Outdated Show resolved Hide resolved
@onelson
Copy link
Contributor Author

onelson commented Jul 28, 2022

@OfTheDelmer thanks for your notes. I'll do another pass on the null handling, hopefully with some new tests to verify that:

  • when test is null it is considered false
  • when either consequent or alternate are null
    • ... and they are the selected branch
    • we produce a null in the output vector

I think the blocks called out in the review may simplify further with this contract in mind.

@onelson onelson force-pushed the onelson/feat/vectorize-conditional-exprs branch from 740a622 to 2afaa99 Compare July 28, 2022 18:36
@onelson
Copy link
Contributor Author

onelson commented Jul 28, 2022

Tests added in 2afaa99 for null inputs in each of the three vectors appear to be passing as-is, so that's something.

Working to simplify the checks now.

@onelson onelson force-pushed the onelson/feat/vectorize-conditional-exprs branch from 0da2222 to 65ac960 Compare July 28, 2022 19:21
@onelson
Copy link
Contributor Author

onelson commented Jul 28, 2022

@OfTheDelmer I've simplified the null checks in 65ac960 -- if these changes seem right to you, can you resolve the remarks in your review?

@onelson onelson requested a review from OfTheDelmer July 28, 2022 19:25
Owen Nelson added 15 commits July 28, 2022 13:20
Updates the Go side to support vectorized conditionals.
Adds new template-generated code to handle the arrow array building code
as well as updates to the `values` package which acts as an entry point
to help delegate to the correct typed API based on the MonoType of the
incoming vectors.
Currently the Go unit tests that actually process inputs are failing due
to a type conflict on the `test` portion of the expression,
ex: `bool != v[bool]`.

Not sure why I didn't bump into this earlier -- manual testing via
scripts all seemed to work fine as is. Watching in the debugger showed
execution hitting all the expected (vectorized) code paths.

The "Logical Ops" PR added a `typ` field to the expression node to allow
us to alter the monotype during vectorization, so I'll do the same and
see if that fixes things.
Fixes an issue where the tests for vectorized conditionals broke with a
type conflict on the `test` part of the expression (`bool` vs
`v[bool]`).

For whatever reason, it seems like we may have been running the
row-based version of the function prior to this diff.

Now, when a test case sets `vectorizable` to `true`, we replace the
function expression with the vectorized version before passing it to
`Compile`.
The test cases try to cover the range of types we support, while also
targeting constants in the consequent/alternate positions to verify vec
repeat plays well.

Right now, the tests exercising vec repeat fail! Fixes pending.
Already covered were cases where `test` was const, as well as when
EITHER `consequent` OR `alternate` were const, but not BOTH.

For cases where all 3 are const, the receiver of of the return value of
the conditional expression will get the vec repeat from whichever branch
was selected (this case was already handled).

The panic from the type conflict between Time/Int is still to resolve.
Adds a new flux feature flag, propagating it down into the rust analyzer
code so we can use it to activate or disable the new code path.

Tests have been updated to enable the flag, but it is off by default.
Lightly modified version of work done in this (currently draft) PR:
#5015
@onelson onelson force-pushed the onelson/feat/vectorize-conditional-exprs branch from 65ac960 to 5fae644 Compare July 28, 2022 20:20
Copy link
Contributor

@OfTheDelmer OfTheDelmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This update looks great

array/conditional.gen.go.tmpl Show resolved Hide resolved
@@ -81,47 +87,71 @@ func newVectorFromSlice(values []Value, typ semantic.MonoType, mem memory.Alloca
case semantic.BasicInt:
b := arrow.NewIntBuilder(mem)
for _, v := range values {
b.Append(v.Int())
if v.IsNull() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

@onelson onelson merged commit 1715d9d into master Jul 29, 2022
@onelson onelson deleted the onelson/feat/vectorize-conditional-exprs branch July 29, 2022 20:22
onelson pushed a commit that referenced this pull request Aug 20, 2022
Initial work for vectorized conditionals neglected to cover cases where
the inputs for `test`, `consequent`, and `alternate` could be invalid
(ie, referencing fields which could not be statically verified and
missing at runtime).

This diff aims to handle these cases by:
- treating a null `test` the same as "all false" by returning
  `alternate` directly.
- changing the constant (vec repeat) code paths to accept pointers to
  the various primitive types so they can be optionally null.
- short circuiting for cases where both `consequent` and `alternate` are
  null (since the output will be null, regardless of the values in `test`).

The choice to adjust the constant handling code path to also handle the
null case was largely motivated by wanting to avoid adding yet another
factor to drive combinations of function signatures.

This would have required a large expansion of functions in the `array`
package, as well as higher complexity to be able to delegate to the new
functions from the `values` side.

We stated with needing (per type):
- vec, const
- const, vec
- vec, vec
- const, const
By going this way, we avoid adding (per type):
- vec, null
- null, vec
- const, null
- null, const

Furthermore, the logic required for nulls in either position is more or
less identical to the const pattern, with the addition of a null check
of the pointer.

Refs:
- <#4601>
- <#5017>
onelson pushed a commit that referenced this pull request Aug 22, 2022
Initial work for vectorized conditionals neglected to cover cases where
the inputs for `test`, `consequent`, and `alternate` could be invalid
(ie, referencing fields which could not be statically verified and
missing at runtime).

This diff aims to handle these cases by:
- treating a null `test` the same as "all false" by returning
  `alternate` directly.
- changing the constant (vec repeat) code paths to accept pointers to
  the various primitive types so they can be optionally null.
- short circuiting for cases where both `consequent` and `alternate` are
  null (since the output will be null, regardless of the values in `test`).

The choice to adjust the constant handling code path to also handle the
null case was largely motivated by wanting to avoid adding yet another
factor to drive combinations of function signatures.

This would have required a large expansion of functions in the `array`
package, as well as higher complexity to be able to delegate to the new
functions from the `values` side.

We stated with needing (per type):
- vec, const
- const, vec
- vec, vec
- const, const
By going this way, we avoid adding (per type):
- vec, null
- null, vec
- const, null
- null, const

Furthermore, the logic required for nulls in either position is more or
less identical to the const pattern, with the addition of a null check
of the pointer.

Refs:
- <#4601>
- <#5017>
onelson pushed a commit that referenced this pull request Aug 22, 2022
…est, consequent or alternate (#5118)

* test: check for nulls in vectorized conditionals

Nulls need special care for vectorized conditionals, similar to how we
manage constants. If the test, consequent or alternate reference fields
that don't exist then we need to act as if we have a "constant null" for
the operation.

* fix: handle null test, consequent, alternate in vectorized conditional

Initial work for vectorized conditionals neglected to cover cases where
the inputs for `test`, `consequent`, and `alternate` could be invalid
(ie, referencing fields which could not be statically verified and
missing at runtime).

This diff aims to handle these cases by:
- treating a null `test` the same as "all false" by returning
  `alternate` directly.
- changing the constant (vec repeat) code paths to accept pointers to
  the various primitive types so they can be optionally null.
- short circuiting for cases where both `consequent` and `alternate` are
  null (since the output will be null, regardless of the values in `test`).

The choice to adjust the constant handling code path to also handle the
null case was largely motivated by wanting to avoid adding yet another
factor to drive combinations of function signatures.

This would have required a large expansion of functions in the `array`
package, as well as higher complexity to be able to delegate to the new
functions from the `values` side.

We stated with needing (per type):
- vec, const
- const, vec
- vec, vec
- const, const
By going this way, we avoid adding (per type):
- vec, null
- null, vec
- const, null
- null, const

Furthermore, the logic required for nulls in either position is more or
less identical to the const pattern, with the addition of a null check
of the pointer.

Refs:
- <#4601>
- <#5017>

* chore: use sentinel Null instead of New(nil)

* test: add planner assertion to new vectorized tests

* chore: make generate
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Vectorize conditional expressions if/then/else
2 participants