-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(stdlib): create a version of map that is columnar and supports vectorization #4329
Conversation
2af604f
to
a99745c
Compare
stdlib/universe/map2.go
Outdated
// It will return the new columns after the function was executed. | ||
// These columns are not organized by group keys and may require regrouping | ||
// if a group key column no longer contains consistent values. | ||
func (m *mapTransformation2) execute(chunk table.Chunk, fn *execute.RowMapPreparedFn, mem memory.Allocator) ([]flux.ColMeta, []array.Interface, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be the only function that has to get modified for the vectorized version. The return value is exactly what we need and the parts around this all rely on the same output, but this function would need to be updated to also execute the vectorized version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there's value in putting this in as an XXX
comment (or TODO
or whatever; I have my preferences, but whatever you're good with).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if the function that we have here has a type like (r: {a: v[int], b: v[int]}) => {c: v[int]}
, could we expect that the above code in createSchema
will just work?
Also, can createSchema
be a stand-alone function rather than a method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It kind of depends on the type that gets returned. We may have to teach create schema to look at the element type if the function is vectorized.
It would be possible for createSchema
to be a standalone function, but I usually try to scope these by making them method calls so I don't have to worry about name conflicts and using overly generic names.
2941a67
to
348928b
Compare
stdlib/universe/map2.go
Outdated
// It will return the new columns after the function was executed. | ||
// These columns are not organized by group keys and may require regrouping | ||
// if a group key column no longer contains consistent values. | ||
func (m *mapTransformation2) execute(chunk table.Chunk, fn *execute.RowMapPreparedFn, mem memory.Allocator) ([]flux.ColMeta, []array.Interface, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there's value in putting this in as an XXX
comment (or TODO
or whatever; I have my preferences, but whatever you're good with).
348928b
to
62eac34
Compare
ee833a8
to
899e919
Compare
899e919
to
3d455fd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good. I had a few questions that I'd like to discuss before this merges.
stdlib/universe/map2.go
Outdated
// if spec.Fn.Fn.Vectorized != nil { | ||
// fn = &mapVectorFunc{ | ||
// fn: execute.NewVectorMapFn( | ||
// spec.Fn.Fn.Vectorized, | ||
// compiler.ToScope(spec.Fn.Scope), | ||
// ), | ||
// } | ||
// } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like an important check. Is there a reason we're skipping it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops.
@@ -206,7 +214,7 @@ func NewTablePredicateFn(fn *semantic.FunctionExpression, scope compiler.Scope) | |||
} | |||
|
|||
func (f *TablePredicateFn) Prepare(tbl flux.Table) (*TablePredicatePreparedFn, error) { | |||
fn, err := f.prepare(tbl.Key().Cols(), nil) | |||
fn, err := f.prepare(tbl.Key().Cols(), nil, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused about why we're passing bool literals as arguments to prepare
. I thought we would be checking whether the function expression has a non-nil value for the Vectorized
field, and setting the argument for vectorized
based on that check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is after the check for vectorization. We've now either passed the vectorized function or not. It's mostly because, if it's vectorized, then the input columns need to be wrapped in vector types. Rather than expose that code to the map transformation, I chose to just add a boolean to this private helper struct.
3d455fd
to
d9773fd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some comments and questions, but nothing to keep you from merging.
Really nice to see this come together!
@@ -81,6 +81,7 @@ t_meta_query_keys = (table=<-) => { | |||
|> range(start: 2018-05-22T19:53:26Z) | |||
|> filter(fn: (r) => r._measurement == "cpu") | |||
|> keys() | |||
|> sort() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why this had to be added, can you explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old map ordered the columns by using the sorted properties. This isn't necessarily correct. It's really supposed to be the order that the type system tells us. The new map used what the type system tells us about the order. These tests produce values depending on the column order which, in my opinion, should be treated as non-deterministic since the only reason why the order differed was because a map was inserted.
stdlib/universe/map2.gen.go
Outdated
} | ||
return true | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit Can these be stand-alone functions instead of methods of mapTransformation2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They could, but this is a very large package and I didn't want to add free functions to it. Maybe I should refactor this to arrowutil
so it's in a common location?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// Map the return object to the expected order from type inference. | ||
// The compiler should have done this by itself, but it doesn't at the moment. | ||
// When the compiler gets refactored so it returns records in the same order | ||
// as type inference, we can remove this and just do a copy by index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to update the compiler in this PR if it's not hard to do? Otherwise, can you link to an issue here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would probably be too difficult. I can make an issue, but my thought is that it's not really worth doing except as part of a larger overhaul of the compiler runtime.
d9773fd
to
6585322
Compare
…ectorization The columnar version of map will run map over an entire table chunk rather than running it and then appending per row. This version of map should be more efficient in the normal cases. In addition, it also supports running vectorized functions when vectorization is available.
6585322
to
2c1ffa5
Compare
The columnar version of map will run map over an entire table chunk
rather than running it and then appending per row. This version of map
should be more efficient in the normal cases. In addition, it also
supports running vectorized functions when vectorization is available.
Fixes #4186.
Done checklist