refactor!: split column expression into unresolved and resolved types #3804
Open: kevinzwang wants to merge 9 commits into `main` from `kevin/unresolved-col`
CodSpeed Performance Report: Merging #3804 will not alter performance.
BREAKING CHANGE: Python struct-get syntactic sugar is no longer allowed (e.g. `col("a.b")` must now be written as `col("a").struct.get("b")`). I did not think it was worth continuing to maintain what is essentially a small compiler for an undocumented and somewhat unclear feature. Getting it to work with unresolved columns would have required significant code changes, so I decided to remove it. In the future I'd like to support something like `col("a")["b"]` instead.

This PR is the first step toward non-equality joins in Daft. We introduce an
`UnresolvedColumn` type so that we can resolve it in the logical plan builder into a `ResolvedColumn`, a `JoinSideColumn` (for joins), or an `OuterReferenceColumn` (for subqueries). In addition, we introduce an `Alias` logical op that is used to determine the sources for `UnresolvedColumn`s with `plan_id` set.

This PR also includes a refactor of our SQL planner. The
`Alias` op is now used to encode naming and scoping information, so the `Relation` struct is no longer necessary.

Additionally, I discovered that we can significantly simplify
`SQLPlanner::plan_non_agg_query` and `SQLPlanner::plan_aggregate_query`. I believe the prior complexity arose because many clauses (GROUP BY, ORDER BY, etc.) can reference both columns in the input tables and aliases in the select clause. However, the planner's `bound_columns` already solves this issue.

This may yield slightly less performant plans due to redundant expressions, but that should be handled by the optimizer rather than the SQL planner. Moreover, I was able to remove a sort from
`plan_aggregate_query`, which may improve both performance and correctness. The prior code assumed that a sort can be split into two, but that may lead to different results depending on the sort order (`ORDER BY a, b` != `ORDER BY b, a`) and on whether our sort implementation is stable or unstable.
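
To illustrate the sort-splitting concern, here is a minimal sketch in plain Python. It is not Daft or planner code: the rows are made up, and `tie_reversing_sort` is a hypothetical helper that stands in for an unstable sort by deliberately reordering ties. It shows that splitting `ORDER BY a, b` into two single-key sorts only reproduces the combined sort when the keys are applied in the right order and the second sort is stable.

```python
# Made-up rows as (a, b) tuples; this is plain Python, not Daft code.
rows = [(2, 1), (1, 2), (1, 1), (2, 2)]


def by_a(row):
    return row[0]


def by_b(row):
    return row[1]


# Reference result: a single sort on the compound key, i.e. ORDER BY a, b.
expected = sorted(rows, key=lambda row: (row[0], row[1]))

# Split into two sorts: minor key (b) first, then a STABLE sort on the major
# key (a). Python's sorted() is stable, so this matches the compound sort.
split_stable = sorted(sorted(rows, key=by_b), key=by_a)

# Same two sorts applied in the other order: this is ORDER BY b, a instead.
split_wrong_order = sorted(sorted(rows, key=by_a), key=by_b)


def tie_reversing_sort(items, key):
    """Hypothetical stand-in for an unstable sort.

    The output is correctly ordered by `key`, but rows with equal keys come
    out in reversed input order, which an unstable sort is allowed to do.
    """
    return sorted(reversed(items), key=key)


# Right key order, but the second sort does not preserve ties: the ordering
# on b established by the first sort is lost.
split_unstable = tie_reversing_sort(sorted(rows, key=by_b), key=by_a)

print("ORDER BY a, b       :", expected)
print("split, stable       :", split_stable)
print("split, wrong order  :", split_wrong_order)
print("split, unstable     :", split_unstable)
```

Only `split_stable` agrees with the compound sort here, which is why pushing the whole sort into one operator is the safer choice when stability is not guaranteed.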