You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The purpose of this epic is to realize performance gains from doing operations column-at-a-time instead of row-at-a-time, especially for common Flux functions that evaluate a function like map and filter. Since our internal representation of tables is columnar, this yields a big benefit. A proof of concept that demonstrates these performance gains is described here: #3867 (comment)
The working design for this feature has changed a bit recently (2021-10-26) to work like this:
During the "normal" analysis phase (parsing to AST, conversion to semantic graph, type inference), perform a vectorization pass on any FunctionExpression whose signature matches the pattern used by map, that is a single parameter called r of Record type. If the pass is successful, a new field in FunctionExpression will be populated with vectorized function (also a FunctionExpression). This vectorization pass can be performed after type inference is complete.
To start with the vectorization pass can just be minimally viable and only work with very simple examples like (r) => ({r with a: r.a}). Over time we can expand the kinds of functions that can be vectorized as we get a better understanding of the problem space and as the capabilities are added to the Compiler.
At runtime, as a call to map is executed, the function's vectorizedFn field can be checked and if it is not nil, then map can opt to execute the vectorized version of the function instead of the normal one. This logic can be controlled via a feature flag so if we need to revert to the old row-based evaluation in production, we can do so.
After a minimally viable end-to-end version of the feature is complete, we can move on to vectorizing more kinds operations. At this point we will want to add a vector_repeat semantic graph node that would allow expressions like r.a + 1 to be executed as r.a + vector_repeat(1).
The advantage of this design is that it paves the way for vectorizing lots Flux programs, not just the lambdas evalauted by map, and it also avoids making a another trip into Rust code at runtime. No changes to the Go/Rust interface will be required.
A previous iteration of this design is here (note that much of this description is out of date): #3993 (comment)
nathanielc
changed the title
Vectorize execution of map, filter and other Flux functions
EPIC: Vectorize execution of map, filter and other Flux functions
Jan 10, 2022
Did some research work in #4580 to determine where the biggest impact might be for next steps (building on the earlier work for map(), ref: #4186).
Vectorization of if/then/else sounded to be higher complexity than many other items in the list, and is a prerequisite for comparison ops (eq, neq, lt, and so on).
Arithmetic operators seem to have big impact while not necessarily depending on any other item being optimized first. For example, it's very common for map() to add new fields derived from other fields using arithmetic.
New issues will be filed for each of add, subtract, multiply, divide and we can tackle them as appetite allows.
The purpose of this epic is to realize performance gains from doing operations column-at-a-time instead of row-at-a-time, especially for common Flux functions that evaluate a function like
map
andfilter
. Since our internal representation of tables is columnar, this yields a big benefit. A proof of concept that demonstrates these performance gains is described here:#3867 (comment)
The working design for this feature has changed a bit recently (2021-10-26) to work like this:
During the "normal" analysis phase (parsing to AST, conversion to semantic graph, type inference), perform a vectorization pass on any
FunctionExpression
whose signature matches the pattern used bymap
, that is a single parameter calledr
of Record type. If the pass is successful, a new field inFunctionExpression
will be populated with vectorized function (also aFunctionExpression
). This vectorization pass can be performed after type inference is complete.To start with the vectorization pass can just be minimally viable and only work with very simple examples like
(r) => ({r with a: r.a})
. Over time we can expand the kinds of functions that can be vectorized as we get a better understanding of the problem space and as the capabilities are added to theCompiler
.At runtime, as a call to
map
is executed, the function'svectorizedFn
field can be checked and if it is notnil
, thenmap
can opt to execute the vectorized version of the function instead of the normal one. This logic can be controlled via a feature flag so if we need to revert to the old row-based evaluation in production, we can do so.After a minimally viable end-to-end version of the feature is complete, we can move on to vectorizing more kinds operations. At this point we will want to add a
vector_repeat
semantic graph node that would allow expressions liker.a + 1
to be executed asr.a + vector_repeat(1)
.The advantage of this design is that it paves the way for vectorizing lots Flux programs, not just the lambdas evalauted by
map
, and it also avoids making a another trip into Rust code at runtime. No changes to the Go/Rust interface will be required.A previous iteration of this design is here (note that much of this description is out of date):
#3993 (comment)
Quest Tracking:
map
function #4186VectorRepeatValue
support for Logical Expressions #4996VectorRepeatValue
when they point to literals #5037and
,or
) #4599and
,or
) #4600if/then/else
#4601if/then/else
) #4602float()
function #4603float()
function #4604uint()
withVectorRepeatValue
support #5036The text was updated successfully, but these errors were encountered: