[SPARK-31958][SQL] normalize special floating numbers in subquery #28785
@@ -56,10 +56,6 @@ import org.apache.spark.sql.types._
object NormalizeFloatingNumbers extends Rule[LogicalPlan] {

  def apply(plan: LogicalPlan): LogicalPlan = plan match {
    // A subquery will be rewritten into join later, and will go through this rule
    // eventually. Here we skip subquery, as we only need to run this rule once.
    case _: Subquery => plan
How about adding tests for the subquery case in `NormalizeFloatingPointNumbersSuite`, too?
No, we can't. This fix relies on the rule `OptimizeSubqueries`, which is an inner object of the class `Optimizer` because it needs to rerun the entire optimizer for the subquery. So we can't use `OptimizeSubqueries` in `NormalizeFloatingPointNumbersSuite`.
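A rough way to picture the reply above is the toy model below. It is only a sketch in plain Scala: `ToyOptimizer`, its plan classes, and the `skipSubquery` flag are hypothetical names made up for illustration and are not Spark's classes or API. It assumes that the normalization rule does not descend into plans embedded in expressions, so a subquery plan is only reached when a separate step re-runs the rules on it.

```scala
// Toy model only: none of these names are Spark's real classes.
object ToyOptimizer {
  sealed trait Plan
  case class Scan(table: String) extends Plan
  case class Join(left: Plan, right: Plan, keysNormalized: Boolean) extends Plan
  // Marks the root of a subquery plan while it is being optimized.
  case class Subquery(child: Plan) extends Plan
  // A filter whose condition embeds a (non-correlated) subquery plan.
  case class FilterOnSubquery(child: Plan, subquery: Plan) extends Plan

  // Stand-in for the normalization rule: marks every join it visits.
  // skipSubquery = true mimics the removed `case _: Subquery => plan`.
  def normalize(plan: Plan, skipSubquery: Boolean): Plan = plan match {
    case Subquery(_) if skipSubquery => plan
    case Subquery(child) => Subquery(normalize(child, skipSubquery))
    case Join(l, r, _) =>
      Join(normalize(l, skipSubquery), normalize(r, skipSubquery), keysNormalized = true)
    // The rule walks plan children only, not plans embedded in expressions.
    case FilterOnSubquery(child, sub) =>
      FilterOnSubquery(normalize(child, skipSubquery), sub)
    case s: Scan => s
  }

  // Stand-in for re-running the optimizer on each embedded subquery plan:
  // wrap it in Subquery, apply the rules, then unwrap the result.
  def optimizeSubqueries(plan: Plan, skipSubquery: Boolean): Plan = plan match {
    case FilterOnSubquery(child, sub) =>
      val optimized = normalize(Subquery(sub), skipSubquery) match {
        case Subquery(inner) => inner
        case other => other
      }
      FilterOnSubquery(optimizeSubqueries(child, skipSubquery), optimized)
    case other => other
  }
}
```

In this model, with `skipSubquery = true` the join inside the embedded subquery is never marked as normalized, while with `skipSubquery = false` it is. The real rule and the real optimizer batches are considerably more involved.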
Test build #123771 has finished for PR 28785 at commit
@@ -56,10 +56,6 @@ import org.apache.spark.sql.types._
object NormalizeFloatingNumbers extends Rule[LogicalPlan] {
Does it also mean that the comment "This batch must be executed after the `RewriteSubquery` batch, which creates joins." is no longer definitely true?
It's still true: a correlated subquery becomes a join, and may have new join keys.
I see. Makes sense.
Thanks for the review, merging to master/3.0!
### What changes were proposed in this pull request?

This is a followup of #23388 .

#23388 has an issue: it doesn't handle subquery expressions and assumes they will be turned into joins. However, this is not true for non-correlated subquery expressions.

This PR fixes this issue. It now doesn't skip `Subquery`, and subquery expressions will be handled by `OptimizeSubqueries`, which runs the optimizer with the subquery.

Note that correlated subquery expressions will be handled twice: once in `OptimizeSubqueries`, once later when it becomes join. This is OK as `NormalizeFloatingNumbers` is idempotent now.

### Why are the changes needed?

fix a bug

### Does this PR introduce _any_ user-facing change?

yes, see the newly added test.

### How was this patch tested?

new test

Closes #28785 from cloud-fan/normalize.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 6fb9c80)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?

This is a followup of #23388 .

#23388 has an issue: it doesn't handle subquery expressions and assumes they will be turned into joins. However, this is not true for non-correlated subquery expressions.

This PR fixes this issue. It now doesn't skip `Subquery`, and subquery expressions will be handled by `OptimizeSubqueries`, which runs the optimizer with the subquery.

Note that correlated subquery expressions will be handled twice: once in `OptimizeSubqueries`, once later when it becomes join. This is OK as `NormalizeFloatingNumbers` is idempotent now.

### Why are the changes needed?

fix a bug

### Does this PR introduce any user-facing change?

yes, see the newly added test.

### How was this patch tested?

new test
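
To make the idempotence remark in the description concrete, here is a minimal sketch in plain Scala of what normalizing special floating-point values means. It is not the code of `NormalizeFloatingNumbers`; the object `FloatNormalization` and its two helpers are hypothetical names used only for illustration. It assumes that keys are compared by their bit patterns, in which case `-0.0` and `0.0` would not match unless they are first mapped to a single representation.

```scala
// Illustration only: not Spark's implementation.
object FloatNormalization {
  // Map both zeros to +0.0 and any NaN to the canonical NaN constant.
  def normalize(d: Double): Double =
    if (d.isNaN) Double.NaN
    else if (d == 0.0d) 0.0d // -0.0 == 0.0 is true, so both zeros land here
    else d

  // Raw bit pattern, standing in for a binary comparison of keys.
  def bits(d: Double): Long = java.lang.Double.doubleToRawLongBits(d)
}
```

Under this sketch, `bits(-0.0d)` and `bits(0.0d)` differ, but they agree after `normalize`. Applying `normalize` a second time leaves the value unchanged, which is the property that makes handling a correlated subquery twice harmless.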