[SPARK-31958][SQL] normalize special floating numbers in subquery #28785

cloud-fan · 2020-06-10T15:24:36Z

What changes were proposed in this pull request?

This is a follow-up to #23388.

#23388 has an issue: it doesn't handle subquery expressions and assumes they will be turned into joins. However, this is not true for non-correlated subquery expressions.

This PR fixes the issue. `NormalizeFloatingNumbers` no longer skips `Subquery`, so subquery expressions are handled by `OptimizeSubqueries`, which runs the optimizer on the subquery plan.

Note that correlated subquery expressions will be handled twice: once in `OptimizeSubqueries`, and once later when they become joins. This is OK because `NormalizeFloatingNumbers` is now idempotent.
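The idempotence argument can be illustrated with a minimal sketch in plain Scala. It is not Spark's implementation: `NormalizeSketch`, `normalize`, and `sameBits` are hypothetical names, and the sketch assumes that normalization means mapping every NaN to the canonical NaN and `-0.0` to `0.0`. Under that assumption, applying the function a second time changes nothing, which is why handling a correlated subquery twice is harmless.

```scala
// Hypothetical stand-in for per-value normalization; not Spark's actual code.
object NormalizeSketch {
  // Map every NaN to the canonical NaN and -0.0 to 0.0; leave other values alone.
  def normalize(d: Double): Double =
    if (d.isNaN) Double.NaN
    else if (d == 0.0d) 0.0d // -0.0 == 0.0 is true under IEEE 754 comparison
    else d

  // Compare by raw bits, the way a binary join key or a hash would see the value.
  def sameBits(a: Double, b: Double): Boolean =
    java.lang.Double.doubleToRawLongBits(a) == java.lang.Double.doubleToRawLongBits(b)

  def main(args: Array[String]): Unit = {
    val samples = Seq(-0.0d, 0.0d, Double.NaN, 1.5d, Double.NegativeInfinity)
    // Applying the function twice yields the same bits as applying it once.
    samples.foreach { d =>
      val once = normalize(d)
      val twice = normalize(once)
      assert(sameBits(once, twice))
    }
    // -0.0 and 0.0 differ bitwise before normalization but agree after it.
    assert(!sameBits(-0.0d, 0.0d))
    assert(sameBits(normalize(-0.0d), normalize(0.0d)))
  }
}
```

The bitwise comparison is the point of the sketch: `-0.0 == 0.0` already holds as a numeric comparison, but the two values have different bit patterns until they are normalized.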

Why are the changes needed?

This fixes a bug: special floating-point values in non-correlated subquery expressions were not normalized.

Does this PR introduce any user-facing change?

Yes, see the newly added test.

How was this patch tested?

A new test.

cloud-fan · 2020-06-10T15:25:49Z

cc @hvanhovell @maropu @viirya

maropu · 2020-06-10T16:07:38Z

...talyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala

@@ -56,10 +56,6 @@ import org.apache.spark.sql.types._
 object NormalizeFloatingNumbers extends Rule[LogicalPlan] {

  def apply(plan: LogicalPlan): LogicalPlan = plan match {
-    // A subquery will be rewritten into join later, and will go through this rule
-    // eventually. Here we skip subquery, as we only need to run this rule once.
-    case _: Subquery => plan


How about adding tests for the subquery case in NormalizeFloatingPointNumbersSuite, too?

No, we can't.

This fix relies on the rule `OptimizeSubqueries`, which is an inner object of the class `Optimizer` because it needs to rerun the entire optimizer on the subquery. So we can't use `OptimizeSubqueries` in `NormalizeFloatingPointNumbersSuite`.
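The constraint described in this reply comes from how Scala scopes an object declared inside a class. The toy sketch below is not Spark code: `ToyOptimizer`, its string-based "plans", and the parenthesis-based subquery detection are all assumptions made for illustration. It shows only that an inner object closes over its enclosing instance, so it can rerun every rule of that instance, and that it is reachable only through such an instance.

```scala
// Toy names mirroring the discussion; this is not Spark's Optimizer.
class ToyOptimizer(rules: Seq[String => String]) {
  // Runs every rule once over the whole "plan" (a plain String here).
  def execute(plan: String): String = rules.foldLeft(plan)((p, r) => r(p))

  // Inner object: it closes over the enclosing instance, so it can rerun
  // the entire optimizer on the subquery text between the parentheses.
  object OptimizeSubqueries {
    def apply(plan: String): String = {
      val open = plan.indexOf('(')
      val close = plan.lastIndexOf(')')
      if (open < 0 || close < open) plan
      else plan.substring(0, open + 1) +
        execute(plan.substring(open + 1, close)) +
        plan.substring(close)
    }
  }
}

object InnerObjectSketch {
  def main(args: Array[String]): Unit = {
    val normalize: String => String = _.replace("-0.0", "0.0")
    val optimizer = new ToyOptimizer(Seq(normalize))
    // The rule is reachable only through an optimizer instance.
    val result = optimizer.OptimizeSubqueries("SELECT (SELECT -0.0)")
    assert(result == "SELECT (SELECT 0.0)")
  }
}
```

A test suite that assembles its own small list of rules has no such enclosing instance to hand, which is the situation the reply describes.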

SparkQA · 2020-06-10T21:16:43Z

Test build #123771 has finished for PR 28785 at commit 4dc6413.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2020-06-11T06:27:58Z

...talyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala

@@ -56,10 +56,6 @@ import org.apache.spark.sql.types._
 object NormalizeFloatingNumbers extends Rule[LogicalPlan] {


Does this also mean that the comment "This batch must be executed after the `RewriteSubquery` batch, which creates joins." is no longer necessarily true?

It's still true: a correlated subquery becomes a join and may have new join keys.

I see. Makes sense.

cloud-fan · 2020-06-11T06:38:56Z

Thanks for the review, merging to master/3.0!

### What changes were proposed in this pull request?

This is a followup of #23388 .

#23388 has an issue: it doesn't handle subquery expressions and assumes they will be turned into joins. However, this is not true for non-correlated subquery expressions.

This PR fixes this issue. It now doesn't skip `Subquery`, and subquery expressions will be handled by `OptimizeSubqueries`, which runs the optimizer with the subquery.

Note that, correlated subquery expressions will be handled twice: once in `OptimizeSubqueries`, once later when it becomes join. This is OK as `NormalizeFloatingNumbers` is idempotent now.

### Why are the changes needed?

fix a bug

### Does this PR introduce _any_ user-facing change?

yes, see the newly added test.

### How was this patch tested?

new test

Closes #28785 from cloud-fan/normalize.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 6fb9c80)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

normalize special floating numbers in subquery

4dc6413

probot-autolabeler bot added the SQL label Jun 10, 2020

maropu reviewed Jun 10, 2020

viirya approved these changes Jun 11, 2020

viirya reviewed Jun 11, 2020

cloud-fan closed this in 6fb9c80 Jun 11, 2020
