-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v0 Datafusion with late materialization #414
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's unfinished so I tried to leave high level comments only
b4fb99c
to
f17cb70
Compare
Right now we don't run the filters on compressed data which would probably be the thing to fix. Anyway, this seems fixable |
I agree. I'm going to address the last few comments of your original review and then convert this to "Ready for review" |
e5d82a8
to
8cd7f8f
Compare
994e73b
to
df06c5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some style nits to avoid clones. Should figure out if we really need to validate filter column references
vortex-datafusion/src/lib.rs
Outdated
let filter_exprs: Option<Vec<Expr>> = if filters.is_empty() { | ||
None | ||
} else { | ||
Some(filters.to_vec()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my earlier comment meant that you can get rid of this to_vec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
vortex-datafusion/src/lib.rs
Outdated
@@ -261,7 +483,7 @@ impl ExecutionPlan for VortexMemoryExec { | |||
self.array.clone() | |||
}; | |||
|
|||
Self::execute_single_chunk(chunk, &self.projection, context) | |||
execute_unfiltered(&chunk, &self.scan_projection) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why this change, you immediately clone the chunk in the function. How about we also change the scan_projection to &[usize]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point
vortex-datafusion/src/lib.rs
Outdated
/// Check if the given expression tree can be pushed down into the scan. | ||
fn can_be_pushed_down(expr: &Expr, schema_columns: &HashSet<String>) -> DFResult<bool> { | ||
// If the filter references a column not known to our schema, we reject the filter for pushdown. | ||
// TODO(aduffy): is this necessary? Under what conditions would this happen? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the docs I don't think this check is necessary. TableProvider returns the schema so datafusion will know if the query is valid or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i just removed it, if we end up needing to add it back later for some reason we can reference this PR
Unfinished, just opening this as I continue to get things working.
This PR augments the original Vortex connection for Datafusion, with an implementation of filter pushdown that allows us to perform late materialization on as many columns as possible.
Pushdown support will be able to get flagged on/off so we can run benchmarks testing different strategies.
I'm hoping to have an initial version of this with a benchmark harness tonight.