Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EPIC: Vectorize execution of map, filter and other Flux functions #4028

Closed
wolffcm opened this issue Sep 10, 2021 · 2 comments
Closed

EPIC: Vectorize execution of map, filter and other Flux functions #4028

wolffcm opened this issue Sep 10, 2021 · 2 comments

Comments

@wolffcm
Copy link

wolffcm commented Sep 10, 2021

The purpose of this epic is to realize performance gains from doing operations column-at-a-time instead of row-at-a-time, especially for common Flux functions that evaluate a function like map and filter. Since our internal representation of tables is columnar, this yields a big benefit. A proof of concept that demonstrates these performance gains is described here:
#3867 (comment)

The working design for this feature has changed a bit recently (2021-10-26) to work like this:

During the "normal" analysis phase (parsing to AST, conversion to semantic graph, type inference), perform a vectorization pass on any FunctionExpression whose signature matches the pattern used by map, that is a single parameter called r of Record type. If the pass is successful, a new field in FunctionExpression will be populated with vectorized function (also a FunctionExpression). This vectorization pass can be performed after type inference is complete.

To start with the vectorization pass can just be minimally viable and only work with very simple examples like (r) => ({r with a: r.a}). Over time we can expand the kinds of functions that can be vectorized as we get a better understanding of the problem space and as the capabilities are added to the Compiler.

At runtime, as a call to map is executed, the function's vectorizedFn field can be checked and if it is not nil, then map can opt to execute the vectorized version of the function instead of the normal one. This logic can be controlled via a feature flag so if we need to revert to the old row-based evaluation in production, we can do so.

After a minimally viable end-to-end version of the feature is complete, we can move on to vectorizing more kinds operations. At this point we will want to add a vector_repeat semantic graph node that would allow expressions like r.a + 1 to be executed as r.a + vector_repeat(1).

The advantage of this design is that it paves the way for vectorizing lots Flux programs, not just the lambdas evalauted by map, and it also avoids making a another trip into Rust code at runtime. No changes to the Go/Rust interface will be required.


A previous iteration of this design is here (note that much of this description is out of date):
#3993 (comment)


Quest Tracking:

@nathanielc nathanielc changed the title Vectorize execution of map, filter and other Flux functions EPIC: Vectorize execution of map, filter and other Flux functions Jan 10, 2022
@onelson
Copy link
Contributor

onelson commented Mar 22, 2022

Did some research work in #4580 to determine where the biggest impact might be for next steps (building on the earlier work for map(), ref: #4186).

Vectorization of if/then/else sounded to be higher complexity than many other items in the list, and is a prerequisite for comparison ops (eq, neq, lt, and so on).

Arithmetic operators seem to have big impact while not necessarily depending on any other item being optimized first. For example, it's very common for map() to add new fields derived from other fields using arithmetic.

New issues will be filed for each of add, subtract, multiply, divide and we can tackle them as appetite allows.

@github-actions
Copy link

This issue has had no recent activity and will be closed soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants