-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-21726][SQL] Check for structural integrity of the plan in Optimzer in test mode. #18956
Conversation
@@ -37,6 +37,12 @@ import org.apache.spark.sql.types._ | |||
abstract class Optimizer(sessionCatalog: SessionCatalog) | |||
extends RuleExecutor[LogicalPlan] { | |||
|
|||
// Check for structural integrity of the plan in test mode. Currently we only check if a plan is | |||
// still resolved after the execution of each rule. | |||
override protected def planChecker: Option[LogicalPlan => Boolean] = Some( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move the checking of whether this is a test in here, then this method simply returns boolean, and by default it returns true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I will update it.
Test build #80715 has finished for PR 18956 at commit
|
Test build #80717 has finished for PR 18956 at commit
|
Test build #80718 has finished for PR 18956 at commit
|
Interesting, existing |
The reason The query causing the problem in
The optimized plan looks like:
Before
Currently the After the rule, the subquery looks like:
Notice
Because The unresolved By modifying |
The PR going to fix the issue described in #18956 (comment) is submitted at #18968. |
retest this please. |
#18968 is merged. This should pass the tests now. |
Test build #81079 has finished for PR 18956 at commit
|
Seems there are other issues caused by |
I submitted #19050 to fix it. |
#19050 is merged now. Let's see if there still is any rule can fail this structural integrity check. |
retest this please. |
Test build #81460 has finished for PR 18956 at commit
|
Test build #81496 has finished for PR 18956 at commit
|
retest this please. |
Test build #81504 has finished for PR 18956 at commit
|
Test build #81518 has finished for PR 18956 at commit
|
if (!planChecker(result)) { | ||
val message = s"After applying rule ${rule.ruleName} in batch ${batch.name}, " + | ||
"the structural integrity of the plan is broken." | ||
throw new TreeNodeException(result, message, null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move the exception throwing logics into the planChecker
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm. The message also has rule and batch names.
import org.apache.spark.sql.internal.SQLConf | ||
|
||
|
||
class OptimizerSICheckerkSuite extends PlanTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-> OptimizerStructuralIntegrityCheckerkSuite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
* `Optimizer`, so we can catch rules that return invalid plans. The check function will returns | ||
* `false` if the given plan doesn't pass the structural integrity check. | ||
*/ | ||
protected def planChecker(plan: TreeType): Boolean = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
planChecker
-> isPlanIntegral
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
@@ -64,6 +64,14 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging { | |||
protected def batches: Seq[Batch] | |||
|
|||
/** | |||
* Defines a check function which checks for structural integrity of the plan after the execution | |||
* of each rule. For example, we can check whether a plan is still resolved after each rule in | |||
* `Optimizer`, so we can catch rules that return invalid plans. The check function will returns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will returns
-> returns
@@ -64,6 +64,14 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging { | |||
protected def batches: Seq[Batch] | |||
|
|||
/** | |||
* Defines a check function which checks for structural integrity of the plan after the execution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which
-> that
LGTM except two minor comments |
Test build #81529 has finished for PR 18956 at commit
|
Test build #81531 has finished for PR 18956 at commit
|
Thanks! Merged to master. |
Thanks @rxin @gatorsmile |
What changes were proposed in this pull request?
We have many optimization rules now in
Optimzer
. Right now we don't have any checks in the optimizer to check for the structural integrity of the plan (e.g. resolved). When debugging, it is difficult to identify which rules return invalid plans.It would be great if in test mode, we can check whether a plan is still resolved after the execution of each rule, so we can catch rules that return invalid plans.
How was this patch tested?
Added tests.