EPIC: Vectorize execution of map, filter and other Flux functions #4028

wolffcm · 2021-09-10T20:48:21Z

The purpose of this epic is to realize performance gains from doing operations column-at-a-time instead of row-at-a-time, especially for common Flux functions that evaluate a function like map and filter. Since our internal representation of tables is columnar, this yields a big benefit. A proof of concept that demonstrates these performance gains is described here:
#3867 (comment)

The working design for this feature has changed a bit recently (2021-10-26) to work like this:

During the "normal" analysis phase (parsing to AST, conversion to semantic graph, type inference), perform a vectorization pass on any FunctionExpression whose signature matches the pattern used by map, that is a single parameter called r of Record type. If the pass is successful, a new field in FunctionExpression will be populated with vectorized function (also a FunctionExpression). This vectorization pass can be performed after type inference is complete.

To start with the vectorization pass can just be minimally viable and only work with very simple examples like (r) => ({r with a: r.a}). Over time we can expand the kinds of functions that can be vectorized as we get a better understanding of the problem space and as the capabilities are added to the Compiler.

At runtime, as a call to map is executed, the function's vectorizedFn field can be checked and if it is not nil, then map can opt to execute the vectorized version of the function instead of the normal one. This logic can be controlled via a feature flag so if we need to revert to the old row-based evaluation in production, we can do so.

After a minimally viable end-to-end version of the feature is complete, we can move on to vectorizing more kinds operations. At this point we will want to add a vector_repeat semantic graph node that would allow expressions like r.a + 1 to be executed as r.a + vector_repeat(1).

The advantage of this design is that it paves the way for vectorizing lots Flux programs, not just the lambdas evalauted by map, and it also avoids making a another trip into Rust code at runtime. No changes to the Go/Rust interface will be required.

A previous iteration of this design is here (note that much of this description is out of date):
#3993 (comment)

Quest Tracking:

The text was updated successfully, but these errors were encountered:

onelson · 2022-03-22T22:00:05Z

Did some research work in #4580 to determine where the biggest impact might be for next steps (building on the earlier work for map(), ref: #4186).

Vectorization of if/then/else sounded to be higher complexity than many other items in the list, and is a prerequisite for comparison ops (eq, neq, lt, and so on).

Arithmetic operators seem to have big impact while not necessarily depending on any other item being optimized first. For example, it's very common for map() to add new fields derived from other fields using arithmetic.

New issues will be filed for each of add, subtract, multiply, divide and we can tackle them as appetite allows.

github-actions · 2023-08-13T01:37:13Z

This issue has had no recent activity and will be closed soon.

wolffcm added team/query Epic labels Sep 10, 2021

wolffcm mentioned this issue Sep 10, 2021

[Timebox] POC: semantic graph vectorize transformation #3993

Closed

This was referenced Oct 26, 2021

Create a vector_repeat operator in semantic graph #4031

Closed

Add a vectorized field to semantic graph FunctionExpr #4183

Closed

Add a vectorization pass to analysis in Rust #4184

Closed

nathanielc changed the title ~~Vectorize execution of map, filter and other Flux functions~~ EPIC: Vectorize execution of map, filter and other Flux functions Jan 10, 2022

nathanielc assigned onelson Mar 14, 2022

onelson mentioned this issue Mar 21, 2022

Create Demo for Vectorize Map #4573

Closed

onelson mentioned this issue Mar 23, 2022

Figure out next areas to vectorize #4580

Closed

github-actions bot added the no-issue-activity label Aug 13, 2023

github-actions bot closed this as completed Aug 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EPIC: Vectorize execution of map, filter and other Flux functions #4028

EPIC: Vectorize execution of map, filter and other Flux functions #4028

wolffcm commented Sep 10, 2021 •

edited by onelson

Loading

onelson commented Mar 22, 2022 •

edited

Loading

github-actions bot commented Aug 13, 2023

EPIC: Vectorize execution of map, filter and other Flux functions #4028

EPIC: Vectorize execution of map, filter and other Flux functions #4028

Comments

wolffcm commented Sep 10, 2021 • edited by onelson Loading

onelson commented Mar 22, 2022 • edited Loading

github-actions bot commented Aug 13, 2023

wolffcm commented Sep 10, 2021 •

edited by onelson

Loading

onelson commented Mar 22, 2022 •

edited

Loading