-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Spark-21807][SQL]Override ++ operation in ExpressionSet to reduce clone time #19022
Conversation
@@ -59,6 +59,12 @@ class ExpressionSet protected( | |||
} | |||
} | |||
|
|||
def addMultiExpressions(elems: Set[Expression]): ExpressionSet = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GenTraversableOnce
instead of Set
?
@@ -59,6 +59,12 @@ class ExpressionSet protected( | |||
} | |||
} | |||
|
|||
def addMultiExpressions(elems: Set[Expression]): ExpressionSet = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Override ++
instead of addMultiExpressions
? As it is a Set
, we better follow Set's semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, Modified it, thanks.
projectList.foreach { | ||
case a @ Alias(e, _) => | ||
// For every alias in `projectList`, replace the reference in constraints by its attribute. | ||
allConstraints ++= allConstraints.map(_ transform { | ||
val replacedElement = allConstraints.map(_ transform { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By overriding ++
, we don't need to change those.
@@ -59,6 +59,12 @@ class ExpressionSet protected( | |||
} | |||
} | |||
|
|||
def addMultiExpressions(elems: Set[Expression]): ExpressionSet = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a related test in ExpressionSetSuite
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, Added, thanks.
|
||
val expressions = aggPlan.validConstraints | ||
// The size of Aliased expression is n * (n - 1) / 2 + n | ||
assert( expressions.size === expressionNum * (expressionNum - 1) / 2 + expressionNum) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relying on the size of validConstraints
may be fragile. I think we only need to test if the new API of ExpressionSet
works. There should be other constraints test to make sure we produce correct constraints.
@@ -210,4 +210,13 @@ class ExpressionSetSuite extends SparkFunSuite { | |||
assert((initialSet - (aLower + 1)).size == 0) | |||
|
|||
} | |||
|
|||
test("add multiple elements to set") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case doesn't fail without the above code, does it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yap, previous ++
works without problem. The override version goes to reduce the time cost. This test is just used to make sure we don't mess up behavior of ++
.
@@ -17,7 +17,7 @@ | |||
|
|||
package org.apache.spark.sql.catalyst.expressions | |||
|
|||
import scala.collection.mutable | |||
import scala.collection.{GenTraversableOnce, mutable} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do style check by running dev/scalastyle
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eatoncys This breaks scala style check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, modified it, thanks.
We can edit the PR title and description to show the changes clearly. |
elems.foreach(newSet.add) | ||
newSet | ||
} | ||
|
||
override def -(elem: Expression): ExpressionSet = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we don't use --
frequently or in hot code path. We may add it when we need it.
cc @gatorsmile @HyukjinKwon May you help to trigger Jenkins tests? Thanks. |
ok to test |
Test build #81035 has finished for PR 19022 at commit
|
add to whitelist |
Test build #81039 has finished for PR 19022 at commit
|
test("add multiple elements to set") { | ||
val initialSet = ExpressionSet(aUpper + 1 :: Nil) | ||
val setToAddWithSameExpression = ExpressionSet(aUpper + 1 :: aUpper + 2 :: Nil) | ||
val setToAdd = ExpressionSet(aUpper + 2 :: aUpper + 3 :: Nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not very sure what you want to test in the second one. Are you want to test the behavior of adding a set of expressions that do not exist in the initial set?
-> ExpressionSet(aUpper + 3 :: aUpper + 4 :: Nil)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have modified the name to setToAddWithOutSameExpression.
Test build #81057 has finished for PR 19022 at commit
|
test("add multiple elements to set") { | ||
val initialSet = ExpressionSet(aUpper + 1 :: Nil) | ||
val setToAddWithSameExpression = ExpressionSet(aUpper + 1 :: aUpper + 2 :: Nil) | ||
val setToAddWithOutSameExpression = ExpressionSet(aUpper + 3 :: aUpper + 4 :: Nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: WithOut
-> Without
LGTM |
Thanks! Merging to master. You can fix this in your future PRs. |
What changes were proposed in this pull request?
The getAliasedConstraints fuction in LogicalPlan.scala will clone the expression set when an element added,
and it will take a long time. This PR add a function to add multiple elements at once to reduce the clone time.
Before modified, the cost of getAliasedConstraints is:
100 expressions: 41 seconds
150 expressions: 466 seconds
After modified, the cost of getAliasedConstraints is:
100 expressions: 1.8 seconds
150 expressions: 6.5 seconds
The test is like this:
test("getAliasedConstraints") {
val expressionNum = 150
val aggExpression = (1 to expressionNum).map(i => Alias(Count(Literal(1)), s"cnt$i")())
val aggPlan = Aggregate(Nil, aggExpression, LocalRelation())
}
(Please fill in changes proposed in this fix)
How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
Run new added test.
Please review http://spark.apache.org/contributing.html before opening a pull request.