feat(stdlib): create a version of map that is columnar and supports vectorization #4329

jsternberg · 2021-12-08T19:00:42Z

The columnar version of map runs map over an entire table chunk
rather than evaluating it row by row and appending each result. This
version of map should be more efficient in the common cases. It also
supports running vectorized functions when vectorization is available.

Fixes #4186.
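
To illustrate the difference described above, here is a minimal sketch in plain Go. It does not use the Flux or Arrow APIs: `Column`, `mapPerRow`, `mapChunk`, and `mapVectorized` are hypothetical names, and a slice of floats stands in for an Arrow array.

```go
package main

// Column is a stand-in for an Arrow array: one slice of values per column.
// All names in this sketch are hypothetical and not taken from the Flux codebase.
type Column []float64

// mapPerRow evaluates fn once per row and appends each result individually,
// which mirrors the row-at-a-time approach.
func mapPerRow(in Column, fn func(float64) float64) Column {
	var out Column
	for _, v := range in {
		out = append(out, fn(v))
	}
	return out
}

// mapChunk processes a whole chunk at once: the output column is allocated
// a single time and then filled in place.
func mapChunk(in Column, fn func(float64) float64) Column {
	out := make(Column, len(in))
	for i, v := range in {
		out[i] = fn(v)
	}
	return out
}

// mapVectorized applies a function that takes the entire column as its
// argument, standing in for the vectorized path.
func mapVectorized(in Column, fn func(Column) Column) Column {
	return fn(in)
}
```

All three functions produce the same values for the same input; they differ only in how often the output is grown and in whether the user function sees one value or a whole column.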

Done checklist

docs/SPEC.md updated
Test cases written

jsternberg · 2021-12-08T19:09:52Z

stdlib/universe/map2.go

+// It will return the new columns after the function was executed.
+// These columns are not organized by group keys and may require regrouping
+// if a group key column no longer contains consistent values.
+func (m *mapTransformation2) execute(chunk table.Chunk, fn *execute.RowMapPreparedFn, mem memory.Allocator) ([]flux.ColMeta, []array.Interface, error) {


This should be the only function that has to get modified for the vectorized version. The return value is exactly what we need and the parts around this all rely on the same output, but this function would need to be updated to also execute the vectorized version.

I wonder if there's value in putting this in as an XXX comment (or TODO or whatever; I have my preferences, but whatever you're good with).

So if the function that we have here has a type like (r: {a: v[int], b: v[int]}) => {c: v[int]}, could we expect that the above code in createSchema will just work?

Also, can createSchema be a stand-alone function rather than a method?

It kind of depends on the type that gets returned. We may have to teach createSchema to look at the element type if the function is vectorized.

It would be possible for createSchema to be a standalone function, but I usually try to scope these by making them method calls so I don't have to worry about name conflicts and using overly generic names.
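
The doc comment on execute in the diff above says the returned columns may need regrouping if a group key column no longer contains consistent values. A rough sketch of that kind of check follows, assuming string columns held in plain slices. `isConstant` and `needsRegroup` are hypothetical helpers and are not claimed to match the code in this PR.

```go
package main

// isConstant reports whether every value in a column equals the first one.
// An empty column is treated as constant.
func isConstant(col []string) bool {
	for i := 1; i < len(col); i++ {
		if col[i] != col[0] {
			return false
		}
	}
	return true
}

// needsRegroup reports whether any group key column named in keyCols
// holds more than one distinct value after the map has run.
// Key columns that are absent from cols are ignored in this sketch.
func needsRegroup(cols map[string][]string, keyCols []string) bool {
	for _, name := range keyCols {
		if col, ok := cols[name]; ok && !isConstant(col) {
			return true
		}
	}
	return false
}
```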

rockstar · 2021-12-08T19:13:58Z

stdlib/universe/map2.go

+// It will return the new columns after the function was executed.
+// These columns are not organized by group keys and may require regrouping
+// if a group key column no longer contains consistent values.
+func (m *mapTransformation2) execute(chunk table.Chunk, fn *execute.RowMapPreparedFn, mem memory.Allocator) ([]flux.ColMeta, []array.Interface, error) {


I wonder if there's value in putting this in as an XXX comment (or TODO or whatever; I have my preferences, but whatever you're good with).

scbrickley

Overall this looks good. I had a few questions that I'd like to discuss before this merges.

scbrickley · 2022-02-08T20:11:38Z

stdlib/universe/map2.go

+	// if spec.Fn.Fn.Vectorized != nil {
+	// 	fn = &mapVectorFunc{
+	// 		fn: execute.NewVectorMapFn(
+	// 			spec.Fn.Fn.Vectorized,
+	// 			compiler.ToScope(spec.Fn.Scope),
+	// 		),
+	// 	}
+	// } else {


This seems like an important check. Is there a reason we're skipping it?

scbrickley · 2022-02-08T20:21:37Z

execute/row_fn.go

@@ -206,7 +214,7 @@ func NewTablePredicateFn(fn *semantic.FunctionExpression, scope compiler.Scope)
 }

 func (f *TablePredicateFn) Prepare(tbl flux.Table) (*TablePredicatePreparedFn, error) {
-	fn, err := f.prepare(tbl.Key().Cols(), nil)
+	fn, err := f.prepare(tbl.Key().Cols(), nil, false)


I'm confused about why we're passing bool literals as arguments to prepare. I thought we would be checking whether the function expression has a non-nil value for the Vectorized field, and setting the argument for vectorized based on that check.

This is after the check for vectorization. We've now either passed the vectorized function or not. It's mostly because, if it's vectorized, then the input columns need to be wrapped in vector types. Rather than expose that code to the map transformation, I chose to just add a boolean to this private helper struct.
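
A minimal sketch of the design described in this reply, assuming a private helper that takes the flag once the caller has already decided whether a vectorized function is in use. `InputCol` and `preparer` are hypothetical names for illustration and do not reflect the real types in execute/row_fn.go.

```go
package main

// InputCol is a hypothetical description of one input column as the
// function will see it.
type InputCol struct {
	Label  string
	Vector bool // true when the column is presented as a vector type
}

// preparer is a hypothetical stand-in for the private helper struct.
type preparer struct{}

// prepare builds the input description for the function. When vectorized
// is true, every input column is wrapped as a vector type. The caller
// decides the flag beforehand, so the wrapping logic stays inside the helper.
func (preparer) prepare(cols []string, vectorized bool) []InputCol {
	out := make([]InputCol, len(cols))
	for i, label := range cols {
		out[i] = InputCol{Label: label, Vector: vectorized}
	}
	return out
}
```

Callers that never vectorize, such as the table predicate in the diff above, would pass false unconditionally.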

wolffcm

I had some comments and questions, but nothing to keep you from merging.

Really nice to see this come together!

wolffcm · 2022-02-08T20:11:34Z

stdlib/universe/meta_query_keys_test.flux

@@ -81,6 +81,7 @@ t_meta_query_keys = (table=<-) => {
            |> range(start: 2018-05-22T19:53:26Z)
            |> filter(fn: (r) => r._measurement == "cpu")
            |> keys()
+            |> sort()


I'm curious why this had to be added, can you explain?

The old map ordered the columns by sorting the properties. This isn't necessarily correct. The order is really supposed to be the one the type system tells us, and the new map uses that order. These tests produce values that depend on the column order, which, in my opinion, should be treated as non-deterministic, since the only reason the order differed was that a map was inserted.
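
The same idea can be shown in Go: if column order is not guaranteed, a comparison should normalize it first, which is what adding sort() does for the keys() output in the test. This is only an illustrative sketch. `sameLabels` is a hypothetical helper and not part of the Flux test tooling.

```go
package main

import "sort"

// sameLabels reports whether two label lists contain the same labels,
// ignoring order. The inputs are copied so the callers' slices keep
// their original order.
func sameLabels(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	as := append([]string(nil), a...)
	bs := append([]string(nil), b...)
	sort.Strings(as)
	sort.Strings(bs)
	for i := range as {
		if as[i] != bs[i] {
			return false
		}
	}
	return true
}
```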

wolffcm · 2022-02-08T20:21:45Z

stdlib/universe/map2.gen.go

+	}
+	return true
+
+}


nit: Can these be stand-alone functions instead of methods of mapTransformation2?

They could, but this is a very large package and I didn't want to add free functions to it. Maybe I should refactor this to arrowutil so it's in a common location?

wolffcm · 2022-02-08T20:24:35Z

execute/vector_fn.go

+	// Map the return object to the expected order from type inference.
+	// The compiler should have done this by itself, but it doesn't at the moment.
+	// When the compiler gets refactored so it returns records in the same order
+	// as type inference, we can remove this and just do a copy by index.


Would it make sense to update the compiler in this PR if it's not hard to do? Otherwise, can you link to an issue here?

This would probably be too difficult. I can make an issue, but my thought is that it's not really worth doing except as part of a larger overhaul of the compiler runtime.
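
For reference, the reordering that the code comment describes amounts to a lookup by label. The sketch below assumes integer values and plain slices; `reorder` is a hypothetical function, not the code in execute/vector_fn.go.

```go
package main

import "fmt"

// reorder arranges values, which are currently labeled by got, into the
// order given by want (the order expected from type inference).
func reorder(want, got []string, values []int) ([]int, error) {
	if len(got) != len(values) {
		return nil, fmt.Errorf("have %d labels but %d values", len(got), len(values))
	}
	// Index each returned label by its current position.
	index := make(map[string]int, len(got))
	for i, label := range got {
		index[label] = i
	}
	out := make([]int, len(want))
	for i, label := range want {
		j, ok := index[label]
		if !ok {
			return nil, fmt.Errorf("missing column %q", label)
		}
		out[i] = values[j]
	}
	return out, nil
}
```

If the compiler returned records in the same order as type inference, the map lookup would be unnecessary and the loop would reduce to a copy by index, as the code comment notes.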

jsternberg force-pushed the feat/vectorized-map branch 3 times, most recently from 2af604f to a99745c Compare December 8, 2021 19:04

jsternberg commented Dec 8, 2021

jsternberg requested review from a team and rockstar and removed request for a team December 8, 2021 19:09

jsternberg force-pushed the feat/vectorized-map branch 2 times, most recently from 2941a67 to 348928b Compare December 13, 2021 21:58

rockstar approved these changes Dec 14, 2021

jsternberg force-pushed the feat/vectorized-map branch from 348928b to 62eac34 Compare December 14, 2021 17:18

scbrickley mentioned this pull request Jan 14, 2022

feat: Compiler supports vectorized functions #4356

Closed

jsternberg force-pushed the feat/vectorized-map branch 3 times, most recently from ee833a8 to 899e919 Compare February 8, 2022 19:48

jsternberg marked this pull request as ready for review February 8, 2022 19:48

jsternberg requested review from a team as code owners February 8, 2022 19:48

jsternberg requested review from noramullen1, wolffcm and scbrickley and removed request for a team February 8, 2022 19:48

jsternberg force-pushed the feat/vectorized-map branch from 899e919 to 3d455fd Compare February 8, 2022 19:50

jsternberg changed the title ~~feat(stdlib): create a version of map that is columnar~~ feat(stdlib): create a version of map that is columnar and supports vectorization Feb 8, 2022

scbrickley reviewed Feb 8, 2022

jsternberg force-pushed the feat/vectorized-map branch from 3d455fd to d9773fd Compare February 8, 2022 20:32

scbrickley approved these changes Feb 8, 2022

wolffcm approved these changes Feb 8, 2022

jsternberg force-pushed the feat/vectorized-map branch from d9773fd to 6585322 Compare February 8, 2022 22:20

jsternberg force-pushed the feat/vectorized-map branch from 6585322 to 2c1ffa5 Compare February 8, 2022 22:51

jsternberg merged commit 72b7382 into master Feb 9, 2022

jsternberg deleted the feat/vectorized-map branch February 9, 2022 14:26

onelson mentioned this pull request Mar 22, 2022

EPIC: Vectorize execution of map, filter and other Flux functions #4028

Closed
