[SPARK-29721][SQL] Prune unnecessary nested fields from Generate without Project #26978
In the PR description:
"to prune necessary" -> "to prune unnecessary"
Thanks! @MaxGekk
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
Test build #115649 has finished for PR 26978 at commit
btw, is there any reason not to support the other project-like plans (e.g., aggregate) for nested column pruning?
@maropu I think it is because nested column pruning is a new feature, so support for some plans is not done yet. We can add more later. I thought about it before, but haven't worked on it yet.
ah, ok. thanks for the info.
Test build #115791 has finished for PR 26978 at commit
Shall we hold off on this PR for a bit until the bug in #24637 is identified and resolved?
@dongjoon-hyun Yes, I think so too. Let's see if we can get more details from @cloud-fan.
Seems like it was a merge conflict from when we synced with upstream; please go ahead, and don't get blocked by me.
Oh. Thank you for updating, @cloud-fan!
Thanks @cloud-fan!
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
Test build #116670 has finished for PR 26978 at commit
Test build #116690 has started for PR 26978 at commit
retest this please
Test build #116719 has finished for PR 26978 at commit
Also, cc @dbtsai
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
Test build #116968 has finished for PR 26978 at commit
Test build #117378 has finished for PR 26978 at commit
@@ -301,6 +301,38 @@ abstract class SchemaPruningSuite
    checkAnswer(query, Row("Y.", 1) :: Row("X.", 1) :: Row(null, 2) :: Row(null, 2) :: Nil)
  }

  testSchemaPruning("select explode of nested field of array of struct") {
    // Config combinations
    val configs = Seq((true, true), (true, false), (false, true), (false, false))
Thanks!
Test build #117381 has finished for PR 26978 at commit
+1, LGTM. Thank you so much, @viirya, @cloud-fan, @maropu, @MaxGekk.
(cc @gatorsmile and @dbtsai)
Try the above example. @dongjoon-hyun @viirya
@@ -301,6 +301,38 @@ abstract class SchemaPruningSuite
    checkAnswer(query, Row("Y.", 1) :: Row("X.", 1) :: Row(null, 2) :: Row(null, 2) :: Nil)
  }

  testSchemaPruning("select explode of nested field of array of struct") {
I think the reason we did not catch the bug is that our tests were not well designed and reviewed.
We have to be very careful when we review tests; it then becomes much easier to find bugs.
Thanks for catching it and pinging me. Let me look at it.
Opened #27503 to fix it.
…ate without Project
This reverts commit a0e63b6.
### What changes were proposed in this pull request?
This reverts the patch at #26978 based on gatorsmile's suggestion.
### Why are the changes needed?
Original patch #26978 has not considered a corner case. We may need to put more time on ensuring we can cover all cases.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Unit test.
Closes #27504 from viirya/revert-SPARK-29721.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
Thank you, @gatorsmile. I'll be more careful.
…out Project
### What changes were proposed in this pull request?
This patch proposes to prune unnecessary nested fields from Generate which has no Project on top of it.
### Why are the changes needed?
In Optimizer, we can prune nested columns from Project(projectList, Generate). However, unnecessary columns could still possibly be read in Generate, if no Project on top of it. We should prune it too.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Unit test.
Closes apache#26978 from viirya/SPARK-29721.
Lead-authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?
This patch proposes to prune unnecessary nested fields from a Generate that has no Project on top of it.
Why are the changes needed?
In the optimizer, we can already prune nested columns from Project(projectList, Generate). However, when there is no Project on top of a Generate, unnecessary nested columns may still be read. We should prune them in that case too.
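To make the idea concrete, here is a minimal sketch in plain Scala. It is a toy model, not the code in this patch: `Plan`, `Scan`, `Project`, `Generate`, and `ToyPruning` are hypothetical stand-ins for illustration, not Spark's Catalyst classes, and the toy `Generate` is assumed to output only the exploded values. The point it shows is that the scan under a Generate can be narrowed to the one nested path the generator consumes, whether or not a Project sits on top.

```scala
// Toy model only: hypothetical stand-in types, NOT Spark's Catalyst classes.
sealed trait Plan

// Reads the listed field paths (e.g. "friends.first") from a table.
final case class Scan(table: String, fields: Set[String]) extends Plan

// Keeps only the listed output names of its child.
final case class Project(outputs: Set[String], child: Plan) extends Plan

// Explodes one array-valued field path. Assumption: it outputs only the
// exploded values, so nothing else from the child is needed downstream.
final case class Generate(input: String, child: Plan) extends Plan

object ToyPruning {
  // Narrow the scan under a Generate to the single path the generator reads,
  // regardless of whether a Project sits on top of the Generate.
  def prune(plan: Plan): Plan = plan match {
    case Generate(input, Scan(table, fields)) =>
      Generate(input, Scan(table, fields.filter(_ == input)))
    case Generate(input, child) => Generate(input, prune(child))
    case Project(outputs, child) => Project(outputs, prune(child))
    case scan: Scan => scan
  }
}

object ToyPruningDemo {
  def main(args: Array[String]): Unit = {
    val allFields = Set("name", "friends.first", "friends.middle", "friends.last")
    // A Generate with no Project on top: the case this patch targets.
    val bare = Generate("friends.first", Scan("contacts", allFields))
    println(ToyPruning.prune(bare))
  }
}
```

In this sketch the bare `Generate` and the `Project`-wrapped one end up over the same narrowed scan, which is the behavior the description above asks for. The real rule works on Catalyst expressions and attributes rather than string paths.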
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit test.