This repository has been archived by the owner on Jan 28, 2021. It is now read-only.

analyzer: add rule to prune unnecessary columns #572

Merged
merged 1 commit into from
Dec 12, 2018


erizocosmico
Contributor

Closes #570

Also fixes a previously unnoticed bug in reorder projection, where resolution was not done correctly.

@erizocosmico erizocosmico requested a review from a team December 12, 2018 12:45
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
Contributor

@ajnavarro ajnavarro left a comment

Why do we need subqueryBarrier on top of subqueries? Why not just apply the rule to the subqueries until the tree coming out of the Analyzer is the same as the tree going in?

@erizocosmico
Contributor Author

Since TransformUp goes from the innermost node to the topmost one, all nodes inside the subquery will already have been transformed by the time it reaches the SubqueryAlias. Without the barrier node you cannot prune columns inside a subquery, because the outer query refers to the columns coming from the subquery with the subquery alias as their table, whereas inside the subquery they keep their original table.

So, if we have:

SELECT t.foo FROM (SELECT t1.foo, t1.bar FROM t1) t

We gather the used columns, which in this case is just t.foo. The first node transformed is t1, followed by the project inside the subquery, which will not find t1.foo or t1.bar in the used columns and so removes them both.

The subquery barrier is there to prevent this from happening, so that the subquery is analysed correctly as an independent unit with the correct table references in its used columns.
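
To make the failure mode concrete, here is a minimal sketch in Go of the example above. It is a toy model, not the code in this PR: `Node`, `Col`, `prune` and `example` are hypothetical names, the walk is plain recursion rather than TransformUp (so the barrier is a flag instead of a node), and it assumes the top node of a subquery is a project and that only projected columns count as used.

```go
package main

import "fmt"

// Col is a column qualified by the table (or subquery alias) it comes from.
type Col struct{ Table, Name string }

// Node is a toy plan node (hypothetical, not go-mysql-server's sql.Node).
type Node struct {
	Kind     string // "Table", "Project" or "SubqueryAlias"
	Name     string // table name or subquery alias
	Cols     []Col  // projected columns (Project only)
	Children []*Node
}

// prune rebuilds the tree bottom-up, dropping projected columns that are
// not in used. With barrier set, a SubqueryAlias gets its own used set:
// alias-qualified columns (t.foo) are translated to the columns of the
// subquery's top project (t1.foo) before descending.
func prune(n *Node, used map[Col]bool, barrier bool) *Node {
	out := &Node{Kind: n.Kind, Name: n.Name}
	childUsed := used
	if barrier && n.Kind == "SubqueryAlias" {
		childUsed = map[Col]bool{}
		// Assumption: the subquery's only child is a Project.
		for _, c := range n.Children[0].Cols {
			if used[Col{n.Name, c.Name}] {
				childUsed[c] = true
			}
		}
	}
	for _, c := range n.Children {
		out.Children = append(out.Children, prune(c, childUsed, barrier))
	}
	for _, c := range n.Cols {
		if used[c] {
			out.Cols = append(out.Cols, c)
		}
	}
	return out
}

// example builds SELECT t.foo FROM (SELECT t1.foo, t1.bar FROM t1) t.
func example() *Node {
	t1 := &Node{Kind: "Table", Name: "t1"}
	inner := &Node{Kind: "Project", Cols: []Col{{"t1", "foo"}, {"t1", "bar"}}, Children: []*Node{t1}}
	sub := &Node{Kind: "SubqueryAlias", Name: "t", Children: []*Node{inner}}
	return &Node{Kind: "Project", Cols: []Col{{"t", "foo"}}, Children: []*Node{sub}}
}

func main() {
	used := map[Col]bool{{"t", "foo"}: true}

	naive := prune(example(), used, false)
	fmt.Println(naive.Children[0].Children[0].Cols) // inner project, no barrier: []

	scoped := prune(example(), used, true)
	fmt.Println(scoped.Children[0].Children[0].Cols) // inner project, with barrier: [{t1 foo}]
}
```

Without the barrier the inner project loses both columns, because the used set only knows about t.foo. With it, t1.foo survives and t1.bar is pruned.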

@ajnavarro
Contributor

Oh, I see, thanks for the explanation.

Could this be avoided by implementing transform downs too?

@erizocosmico
Contributor Author

Not only transform downs. It would require a TransformDown and a way to stop transforms, so that when you get to a SubqueryAlias, you can stop the transformation and transform that node and its children on your own. That is because a subquery needs to be transformed with a different set of used columns than the parent.
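
A rough sketch in Go of what such a stoppable top-down transform could look like. None of this exists in the codebase: `transformDownStop`, `TransformFunc`, `annotate` and the `Scope` field are hypothetical, and the scope string merely stands in for the set of used columns.

```go
package main

import "fmt"

// Node is a toy plan node (hypothetical); Scope records which context
// the node was transformed in.
type Node struct {
	Kind     string
	Name     string
	Scope    string
	Children []*Node
}

// TransformFunc returns the replacement for n and whether the walk
// should continue into the replacement's children.
type TransformFunc func(n *Node) (out *Node, descend bool)

// transformDownStop applies f to n first and then, unless f asked to
// stop, to each child of the result.
func transformDownStop(n *Node, f TransformFunc) *Node {
	out, descend := f(n)
	if !descend {
		return out
	}
	res := *out
	res.Children = nil
	for _, c := range out.Children {
		res.Children = append(res.Children, transformDownStop(c, f))
	}
	return &res
}

// annotate tags nodes with scope. At a SubqueryAlias it stops the outer
// walk and transforms the children itself under a new scope.
func annotate(scope string) TransformFunc {
	return func(n *Node) (*Node, bool) {
		cp := *n
		cp.Scope = scope
		if n.Kind != "SubqueryAlias" {
			return &cp, true
		}
		cp.Children = nil
		for _, c := range n.Children {
			cp.Children = append(cp.Children, transformDownStop(c, annotate(n.Name)))
		}
		return &cp, false
	}
}

// sampleTree builds Project -> SubqueryAlias(t) -> Project -> Table(t1).
func sampleTree() *Node {
	t1 := &Node{Kind: "Table", Name: "t1"}
	inner := &Node{Kind: "Project", Children: []*Node{t1}}
	sub := &Node{Kind: "SubqueryAlias", Name: "t", Children: []*Node{inner}}
	return &Node{Kind: "Project", Children: []*Node{sub}}
}

func main() {
	out := transformDownStop(sampleTree(), annotate("outer"))
	for n := out; ; n = n.Children[0] {
		fmt.Printf("%s scope=%s\n", n.Kind, n.Scope)
		if len(n.Children) == 0 {
			break
		}
	}
}
```

The outer project and the SubqueryAlias are handled in the outer scope, while everything below the alias is handled in scope t, which is the separation a per-subquery set of used columns would need.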

@ajnavarro ajnavarro merged commit 7beeb8d into src-d:master Dec 12, 2018

Successfully merging this pull request may close these issues.

Rule to remove unused columns from projections and pushdown
3 participants