-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-31334][SQL] Don't ResolveReference/ResolveMissingReference when Filter condition with aggregate expression #28107
Conversation
thanks for reporting this bug! It's fragile to rely on rule order, can we fix it more completely? My proposal: Do not resolve |
Yea, I know it's fragile to change order, but don't have other good ideal, Thanks for you suggestions, I will test follow your suggestion. |
Changed |
ok to test |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Show resolved
Hide resolved
@@ -1391,11 +1391,30 @@ class Analyzer( | |||
notMatchedActions = newNotMatchedActions) | |||
} | |||
|
|||
case f @ Filter(cond, _) if containsAggregate(cond) => f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a blank line between cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a blank line between cases.
Done
q.mapExpressions(resolveExpressionTopDown(_, q)) | ||
} | ||
|
||
def containsAggregate(e: Expression): Boolean = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't we reuse ResolveAggregateFunctions.containsAggregate
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't we reuse
ResolveAggregateFunctions.containsAggregate
?
Since here function is still UnresolvedFunction, we can't just reuse this.
Test build #120779 has finished for PR 28107 at commit
|
Test build #120778 has finished for PR 28107 at commit
|
Test build #120781 has finished for PR 28107 at commit
|
Test build #120776 has finished for PR 28107 at commit
|
Test build #120792 has finished for PR 28107 at commit
|
Test build #120798 has finished for PR 28107 at commit
|
For this test, failed because filter's cond 's expr is unresolved so lookup registered udf failed |
Test build #120799 has finished for PR 28107 at commit
|
@cloud-fan These makes future job more complex, maybe when beginning, it's better to use a new Class present havingClause and convert it to Filter in Analyzer after handing it. |
Test build #120803 has finished for PR 28107 at commit
|
(6, 9) | ||
).toDF("a", "b").createOrReplaceTempView("testData1") | ||
|
||
checkAnswer(sql( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this test fail before your patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this test fail before your patch?
No, it's won't failed, here is for contrast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this test is not qualified to reproduce the bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this test is not qualified to reproduce the bug?
The first SQL is used for comparison, and the second can reproduce bugs.
If don't need, we can just delete first one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's delete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's delete
Done
Test build #120868 has finished for PR 28107 at commit
|
ping @cloud-fan any more need to update? |
This PR exposes a more serious problem, that we rely on rules order to resolve HAVING. We can easily get correctness issues because of it. #28294 proposes a new solution to fix this problem completely, can you take a look? |
The pr you mentioned is similar like I have said in #28107 (comment) It's better to to like this, I will notice that pr and check if that pr will solve this pr's problem, if not, I will change based on his pr. |
@cloud-fan |
What changes were proposed in this pull request?
As I have show in https://issues.apache.org/jira/browse/SPARK-31334 's description, same type of sql, when one column type is different as string, catalyst can't analyze it right.
For test sql
When analyze having clause's condition by ResolveAggregateFunctions
Since
a
is StringType type and then aggregation'sagg
expression is unresolved (Because Sum. checkInputDataTypes() need NumericType, buta
is StringType), so ResolveAggregateFunctions won't make a change on above LogicalPlan, thensum(a)
in Filter condition will be resolved in ResolveReference and this a will be resolved as aggregation's output column a#184 , then error happened .Why are the changes needed?
Fix bug in analyzer
Does this PR introduce any user-facing change?
NO
How was this patch tested?
Added UT