[SPARK-50091][SQL] Handle case of aggregates in left-hand operand of IN-subquery #48627

bersprockets · 2024-10-23T19:22:08Z

What changes were proposed in this pull request?

This PR adds code to RewritePredicateSubquery#apply to explicitly handle the case where an Aggregate node contains an aggregate expression in the left-hand operand of an IN-subquery expression. The explicit handler moves the IN-subquery expressions out of the Aggregate and into a parent Project node. The Aggregate will continue to perform the aggregations that were used as an operand to the IN-subquery expression, but will not include the IN-subquery expression itself. After pulling up IN-subquery expressions into a Project node, RewritePredicateSubquery#apply is called again to handle the Project as a UnaryNode. The Join will now be inserted between the Project and the Aggregate node, and the join condition will use an attribute rather than an aggregate expression, e.g.:

Project [col1#32, exists#42 AS (sum(col2) IN (listquery()))#40]
+- Join ExistenceJoin(exists#42), (sum(col2)#41L = c2#39L)
   :- Aggregate [col1#32], [col1#32, sum(col2#33) AS sum(col2)#41L]
   :  +- LocalRelation [col1#32, col2#33]
   +- LocalRelation [c2#39L]

sum(col2)#41L in the above join condition, despite how it looks, is the name of the attribute, not an aggregate expression.
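Below is a minimal sketch of the pull-up described above. It uses hypothetical toy types (`Attr`, `Sum`, `InSubquery`, `Alias`, `Relation`, `Aggregate`, `Project`) in place of Catalyst's real classes, so treat it as an illustration under those assumptions, not the code in this PR. An aggregate-output expression whose IN-subquery left operand contains an aggregate moves into a parent `Project`. The aggregate itself stays in the `Aggregate` under an alias, and the IN-subquery in the `Project` refers to that alias as a plain attribute.

```scala
// Toy stand-ins for Catalyst nodes (hypothetical; not Spark's classes).
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Sum(child: Expr) extends Expr
final case class InSubquery(value: Expr, subquery: String) extends Expr
final case class Alias(child: Expr, name: String) extends Expr

sealed trait Plan
final case class Relation(output: Seq[String]) extends Plan
final case class Aggregate(grouping: Seq[Expr], aggExprs: Seq[Expr], child: Plan) extends Plan
final case class Project(projectList: Seq[Expr], child: Plan) extends Plan

object PullUpInSubquery {
  // Display name used for the alias of a pulled-out aggregate.
  def render(e: Expr): String = e match {
    case Attr(n)          => n
    case Sum(c)           => s"sum(${render(c)})"
    case InSubquery(v, q) => s"${render(v)} IN ($q)"
    case Alias(_, n)      => n
  }

  // All aggregate expressions appearing anywhere in `e`.
  def collectAggs(e: Expr): Seq[Sum] = e match {
    case s: Sum           => Seq(s)
    case Attr(_)          => Nil
    case InSubquery(v, _) => collectAggs(v)
    case Alias(c, _)      => collectAggs(c)
  }

  // True if `e` holds an IN-subquery whose left-hand operand contains an aggregate.
  def hasAggInSubquery(e: Expr): Boolean = e match {
    case InSubquery(v, _) => collectAggs(v).nonEmpty
    case Alias(c, _)      => hasAggInSubquery(c)
    case _                => false
  }

  // Replace each aggregate with an attribute named after it.
  def replaceAggs(e: Expr): Expr = e match {
    case s: Sum           => Attr(render(s))
    case a: Attr          => a
    case InSubquery(v, q) => InSubquery(replaceAggs(v), q)
    case Alias(c, n)      => Alias(replaceAggs(c), n)
  }

  def apply(agg: Aggregate): Plan = {
    val (withIn, plain) = agg.aggExprs.partition(hasAggInSubquery)
    if (withIn.isEmpty) agg
    else {
      // The Aggregate keeps computing the aggregates, now under aliases.
      val pulled = withIn.flatMap(collectAggs).distinct.map(s => Alias(s, render(s)))
      val newAgg = agg.copy(aggExprs = plain ++ pulled)
      // The Project evaluates the IN-subquery against the aliased attributes.
      val projectList = agg.aggExprs.map { e =>
        if (hasAggInSubquery(e)) replaceAggs(e) else Attr(render(e))
      }
      Project(projectList, newAgg)
    }
  }
}
```

Applied to a toy form of `select col1, sum(col2) in (select c2 from v1) from v2 group by col1`, the result is a `Project` whose IN-subquery operand is `Attr("sum(col2)")`, analogous to `sum(col2)#41L` in the plan above. A join condition built from it therefore references an attribute rather than an aggregate.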

Why are the changes needed?

The following query fails:

create or replace temp view v1(c1, c2) as values (1, 2), (1, 3), (2, 2), (3, 7), (3, 1);
create or replace temp view v2(col1, col2) as values (1, 2), (1, 3), (2, 2), (3, 7), (3, 1);

select col1, sum(col2) in (select c2 from v1)
from v2 group by col1;

It fails with this error:

[INTERNAL_ERROR] Cannot generate code for expression: sum(input[1, int, false]) SQLSTATE: XX000

With SPARK_TESTING=1, it fails with this error:

[PLAN_VALIDATION_FAILED_RULE_IN_BATCH] Rule org.apache.spark.sql.catalyst.optimizer.RewritePredicateSubquery in batch RewriteSubquery generated an invalid plan: Special expressions are placed in the wrong plan:
Aggregate [col1#11], [col1#11, first(exists#20, false) AS (sum(col2) IN (listquery()))#19]
+- Join ExistenceJoin(exists#20), (sum(col2#12) = c2#18L)
   :- LocalRelation [col1#11, col2#12]
   +- LocalRelation [c2#18L]

The issue is that RewritePredicateSubquery builds a Join operator where the join condition contains an aggregate expression.

The bug is in the handler for UnaryNode in RewritePredicateSubquery#apply, which adds a Join below the Aggregate and assumes that the left-hand operand of IN-subquery can be used in the join condition. This works fine for most cases, but not when the left-hand operand is an aggregate expression.

This PR moves the offending IN-subqueries to a Project node, with the aggregates replaced by attributes referring to the aggregate expressions. The resulting join condition now uses those attributes rather than the actual aggregate expressions.
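The following small sketch shows the invariant that the old `UnaryNode` handler broke. The types and the `isValidJoinCondition` helper are hypothetical toys, not Spark's plan validation, which checks more than this. Building the join condition straight from the IN-subquery's left-hand operand gives `sum(col2) = c2`, which contains an aggregate. Building it from the attribute the `Aggregate` produces does not.

```scala
// Hypothetical toy expression types; not Catalyst's classes.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Sum(child: Expr) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr

object JoinConditionCheck {
  // True if an aggregate function appears anywhere in `e`.
  def containsAggregate(e: Expr): Boolean = e match {
    case _: Sum        => true
    case Attr(_)       => false
    case EqualTo(l, r) => containsAggregate(l) || containsAggregate(r)
  }

  // Models the condition built for the ExistenceJoin: the IN-subquery's
  // left-hand operand compared with the subquery's output column.
  def joinCondition(inValue: Expr, subqueryOutput: Attr): Expr =
    EqualTo(inValue, subqueryOutput)

  // Hypothetical check: a join condition must not contain aggregates.
  def isValidJoinCondition(cond: Expr): Boolean = !containsAggregate(cond)
}
```

With `joinCondition(Sum(Attr("col2")), Attr("c2"))`, `isValidJoinCondition` returns false. With `Attr("sum(col2)")` in place of the aggregate, it returns true.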

Does this PR introduce any user-facing change?

No, other than allowing this type of query to succeed.

How was this patch tested?

New unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

bersprockets · 2024-11-15T15:38:47Z

cc @cloud-fan

dtenedor

Thanks for the fix!!

dtenedor · 2024-11-21T00:17:17Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala

@@ -245,6 +266,55 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
            condition = Some(newCondition)))
        }
      }
+    case a: Aggregate if exprsContainsAggregateInSubquery(a.aggregateExpressions) =>


This file is already over 1000 lines long, can we move this logic to a helper object in another file to improve the code health?

@dtenedor

I could move the entire new handler into a helper function in another file.

On the other hand, this file contains 6 rules, all related to subqueries in one way or another. They could be split up (in a separate refactor, not by this PR).

dtenedor · 2024-11-21T00:18:19Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala

@@ -245,6 +266,55 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
            condition = Some(newCondition)))
        }
      }
+    case a: Aggregate if exprsContainsAggregateInSubquery(a.aggregateExpressions) =>
+      // find expressions with an IN-subquery whose left-hand operand contains aggregates


please express the comments as full sentences (imperative is OK) starting with capital letters and ending in punctuation.

dtenedor · 2024-11-21T00:19:26Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala

+      }
+
+      val inSubqueryMap = inSubqueryMapping.toMap
+      // get all aggregate expressions found in left-hand operands of IN-subqueries


It's a bit hard to follow this logic in the code. Can you add a comment with a brief example, showing the query plan and the steps performed here?

dtenedor · 2024-11-21T00:19:55Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala

+        case ae: Expression if inSubqueryMap.contains(ae) =>
+          // replace the expression with an aliased aggregate expression
+          inSubqueryMap(ae).map(aggregateExprAliasMap(_))
+        case ae @ _ => Seq(ae)


Suggested change

case ae @ _ => Seq(ae)

case ae => Seq(ae)

dtenedor · 2024-11-21T00:20:22Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala

+            // patch any aggregate expression with its corresponding attribute
+            case a: AggregateExpression => aggregateExprAttrMap(a)
+          }.asInstanceOf[NamedExpression]
+        case ae @ _ => ae.toAttribute


Suggested change

case ae @ _ => ae.toAttribute

case ae => ae.toAttribute

cloud-fan · 2024-11-21T13:49:57Z

cc @agubichev @andylam-db

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala

attilapiros

LGTM (just a tiny typo in the comments), but let's wait for a committer who is more familiar with this area.

attilapiros · 2024-11-26T22:14:09Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala

+    //   +- LocalRelation [col1#28, col2#29]
+    //
+    // Note that the Aggregate node contains the IN-subquery and the left-hand
+    // side of the IN-subquery is an aggregate expression (sum(col2#28)).


Suggested change

// side of the IN-subquery is an aggregate expression (sum(col2#28)).

// side of the IN-subquery is an aggregate expression (sum(col2#29)).

attilapiros · 2024-11-27T01:13:50Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala

+    //
+    // The transformation pulled the IN-subquery up into a Project. The left-hand side of the
+    // IN-subquery is now an attribute (sum(col2)#36L) that refers to the actual aggregation
+    // which is still performed in the Aggregate node (sum(col2#28) AS sum(col2)#36L). The Unary


Suggested change

// which is still performed in the Aggregate node (sum(col2#28) AS sum(col2)#36L). The Unary

// which is still performed in the Aggregate node (sum(col2#29) AS sum(col2)#36L). The Unary

Good catch!

cloud-fan · 2025-01-03T04:51:59Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala

+
+      // Reapply this rule, but now with all interesting expressions
+      // from Aggregate.aggregateExpressions pulled up into a Project node.
+      apply(newProj)


This reminds me of the rule RewriteWithExpression, which also needs to rewrite Aggregate first. We should not call apply here in the middle of plan traversal, as apply transforms the plan again, which leads to O(n^2) complexity. Instead, we should also add a util function that rewrites a UnaryNode (rather than transforming the full tree) and call it both here and in the original case match for UnaryNode.
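To see where the O(n^2) comes from, here is a toy model. It assumes a linear chain of hypothetical `ChainNode`s that all match the rule and a simplified bottom-up `transformUp`, which is not Catalyst's `TreeNode` API. If the handler re-runs the full transform on the rewritten subtree, as calling `apply(newProj)` would, total visits are n + n(n+1)/2. A handler that rewrites only the matched node visits each node once.

```scala
// Hypothetical toy plan node; not Catalyst's TreeNode.
final case class ChainNode(needsRewrite: Boolean, child: Option[ChainNode])

object TraversalCost {
  private var visits = 0

  // Simplified bottom-up transform: transform the child first, then apply `rule`.
  def transformUp(n: ChainNode)(rule: ChainNode => ChainNode): ChainNode = {
    visits += 1
    val withNewChild = n.copy(child = n.child.map(c => transformUp(c)(rule)))
    rule(withNewChild)
  }

  // Re-entrant style: after rewriting a node, run the full transform on its subtree again.
  def reentrantRule(n: ChainNode): ChainNode =
    if (n.needsRewrite) transformUp(n.copy(needsRewrite = false))(reentrantRule) else n

  // Local style: rewrite only the matched node.
  def localRule(n: ChainNode): ChainNode =
    if (n.needsRewrite) n.copy(needsRewrite = false) else n

  // A chain of `size` nodes, every one of which needs the rewrite.
  def chain(size: Int): ChainNode = {
    require(size >= 1, "size must be positive")
    (2 to size).foldLeft(ChainNode(needsRewrite = true, None)) { (c, _) =>
      ChainNode(needsRewrite = true, Some(c))
    }
  }

  // Number of node visits when transforming `root` with `rule`.
  def countVisits(root: ChainNode)(rule: ChainNode => ChainNode): Int = {
    visits = 0
    transformUp(root)(rule)
    visits
  }
}
```

For a 100-node chain, the re-entrant style performs 5,150 visits versus 100 for the local rewrite.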

cloud-fan · 2025-01-06T04:11:47Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala

+      // withInSubquery will be a List containing a single Alias expression:
+      //
+      //   List(sum(col2#12) IN (list#8 []) AS (...)#19)
+      val withInSubquery = a.aggregateExpressions.filter(exprContainsAggregateInSubquery(_))


Once we detect such InSubquery, I think it's much simpler to normalize the Aggregate node to pull up the full result projection to a new Project node, instead of only rewriting the problematic InSubquery. This is also how RewriteWithExpression does it and the code is much simpler and less error-prone. We can even create a util function to reuse the code in RewriteWithExpression.

BTW I think it's better to always build the query plan tree with this normalized form (Aggregate should only do grouping and aggregating, projection should always happen in Project), but this is a much bigger topic.

I am delayed in responding to review comments: I'm not around my laptop much until next week.

I didn't create a util function because the PhysicalAggregation extractor does almost all the heavy lifting, and the version of the code in RewriteWithExpression called applyInternal on the new Aggregate node before making it a child of the new Project node.

bersprockets · 2025-01-21T05:06:53Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala

@@ -79,4 +80,20 @@ class RewriteSubquerySuite extends PlanTest {
    Optimize.executeAndTrack(query.analyze, tracker)
    assert(tracker.rules(RewritePredicateSubquery.ruleName).numEffectiveInvocations == 0)
  }
+
+  test("SPARK-50091: Don't put aggregate expression in join condition") {


I also updated this test to check the whole optimized plan rather than simply testing that the join condition does not have an aggregate expression.

cloud-fan · 2025-01-21T09:53:39Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala

+    // which are still performed in the Aggregate node (sum(col2#18) and sum(col3#19)).
+    case p @ PhysicalAggregation(
+        groupingExpressions, aggregateExpressions, resultExpressions, child)
+        if exprsContainsAggregateInSubquery(p.expressions) =>


Suggested change

if exprsContainsAggregateInSubquery(p.expressions) =>

if exprsContainsAggregateInSubquery(resultExpressions) =>

This rewrite only pulls out subquery expressions for Aggregate#aggregateExpressions, not grouping expressions.

Re: if exprsContainsAggregateInSubquery(resultExpressions) =>.

That won't work with exprsContainsAggregateInSubquery as it currently stands, since that function looks for in-subqueries with aggregate expressions in the left-hand operand. resultExpressions has the aggregate expressions replaced with attributes, so exprsContainsAggregateInSubquery would never trigger.

Alternatively, I could do

if exprsContainsAggregateInSubquery(p.asInstanceOf[Aggregate].aggregateExpressions) =>

which is kind of ugly, but does the trick.

Another alternative: I'm the only one calling exprsContainsAggregateInSubquery, so I could change it to return true if there are any in-subqueries at all with no regard to characteristics of the left-hand operand. We would end up rewriting some cases that wouldn't otherwise cause trouble.

ah OK, let's keep it as it is
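Why a check on `resultExpressions` would never fire can be sketched with toy types. The `extractAggregates` helper below is a hypothetical stand-in for the behavior described in this thread (aggregates in the result expressions replaced by attributes). It is not the real `PhysicalAggregation` extractor. The same predicate returns true on the original expression and false on the rewritten one.

```scala
// Hypothetical toy expression types; not Catalyst's classes.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Sum(child: Expr) extends Expr
final case class InSubquery(value: Expr, subquery: String) extends Expr

object ResultExpressionsCheck {
  def containsAggregate(e: Expr): Boolean = e match {
    case _: Sum           => true
    case Attr(_)          => false
    case InSubquery(v, _) => containsAggregate(v)
  }

  // Mirrors the predicate discussed above: an IN-subquery whose left operand has an aggregate.
  def hasAggInSubquery(e: Expr): Boolean = e match {
    case InSubquery(v, _) => containsAggregate(v)
    case _                => false
  }

  // Hypothetical rewrite: every aggregate becomes a reference to its result attribute.
  def extractAggregates(e: Expr): Expr = e match {
    case _: Sum           => Attr("aggResult")
    case a: Attr          => a
    case InSubquery(v, q) => InSubquery(extractAggregates(v), q)
  }
}
```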

cloud-fan · 2025-01-21T09:57:19Z

sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala

@@ -2800,4 +2800,32 @@ class SubquerySuite extends QueryTest
      checkAnswer(df3, Row(7))
    }
  }
+
+  test("SPARK-50091: Handle aggregates in left-hand operand of IN-subquery") {
+    withTable("v1", "v2") {


Suggested change

withTable("v1", "v2") {

withTempView("v1", "v2") {

cloud-fan · 2025-01-21T09:57:57Z

sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala

+  test("SPARK-50091: Handle aggregates in left-hand operand of IN-subquery") {
+    withTable("v1", "v2") {
+      sql("""CREATE OR REPLACE TEMP VIEW v1 (c1, c2, c3) AS VALUES
+            |(1, 2, 2), (1, 5, 3), (2, 0, 4), (3, 7, 7), (3, 8, 8)""".stripMargin)


nit: Seq((1, 2, 2), (1, 5, 3), ...).toDF("c1", "c2", "c3").createTempView

cloud-fan · 2025-01-23T03:02:33Z

thanks, merging to master/4.0!

…IN-subquery

Closes #48627 from bersprockets/aggregate_in_set_issue.

Authored-by: Bruce Robbins <bersprockets@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e02ff1c)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

cloud-fan · 2025-01-23T03:04:39Z

@bersprockets feel free to open a 3.5 backport if it's also an issue there.


…d of IN-subquery

This is a back-port of #48627.

Closes #49663 from bersprockets/aggregate_in_set_issue_br35.

Authored-by: Bruce Robbins <bersprockets@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

github-actions bot added the SQL label Oct 23, 2024

bersprockets force-pushed the aggregate_in_set_issue branch from 840748d to b073289 October 29, 2024 17:44

bersprockets force-pushed the aggregate_in_set_issue branch from b073289 to 7328f31 November 6, 2024 21:56

bersprockets force-pushed the aggregate_in_set_issue branch from 7328f31 to 0d31cdb November 15, 2024 02:31

bersprockets force-pushed the aggregate_in_set_issue branch from 0d31cdb to 3319192 November 20, 2024 19:15

dtenedor reviewed Nov 21, 2024

attilapiros reviewed Nov 21, 2024

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala Outdated

bersprockets commented Nov 24, 2024

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala

bersprockets force-pushed the aggregate_in_set_issue branch from a5206a6 to 9e6a688 November 26, 2024 17:21

attilapiros reviewed Nov 27, 2024

bersprockets force-pushed the aggregate_in_set_issue branch from ba42748 to ccf7302 December 3, 2024 01:13

bersprockets force-pushed the aggregate_in_set_issue branch from ccf7302 to a9434ea January 1, 2025 22:18

cloud-fan reviewed Jan 3, 2025

cloud-fan reviewed Jan 6, 2025

bersprockets added 14 commits January 20, 2025 16:14

Some testing (79b5089)
update (7fe2a08)
Small cleanup (c96af36)
Update (424d803)
Add catalyst test (2b1a376)
Fix names (9c443b0)
Clean up some comments (46d43fd)
Cleanup (ca4dba8)
Rename tests (e0fc82f)
Update (3e52a12)
Review updates (1db5316)
Comment update (f6aa964)
Address review comments (cc6384b)
Move unary node handler to its own utility method (93d98e7)

bersprockets added 2 commits January 20, 2025 16:14

Respond to review comments (cb4066a)
Make test more explicit (b5ee466)

bersprockets force-pushed the aggregate_in_set_issue branch from a866ebe to b5ee466 January 21, 2025 00:21

bersprockets commented Jan 21, 2025

cloud-fan reviewed Jan 21, 2025

cloud-fan approved these changes Jan 21, 2025

Test updates

0e1c170

cloud-fan closed this in e02ff1c Jan 23, 2025

bersprockets mentioned this pull request Jan 25, 2025

[SPARK-50091][SQL][3.5] Handle case of aggregates in left-hand operand of IN-subquery #49663

Closed



