Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Spark-21807][SQL]Override ++ operation in ExpressionSet to reduce clone time #19022

Closed
wants to merge 10 commits into from

Conversation

eatoncys
Copy link
Contributor

What changes were proposed in this pull request?

The getAliasedConstraints fuction in LogicalPlan.scala will clone the expression set when an element added,
and it will take a long time. This PR add a function to add multiple elements at once to reduce the clone time.

Before modified, the cost of getAliasedConstraints is:
100 expressions: 41 seconds
150 expressions: 466 seconds

After modified, the cost of getAliasedConstraints is:
100 expressions: 1.8 seconds
150 expressions: 6.5 seconds

The test is like this:
test("getAliasedConstraints") {
val expressionNum = 150
val aggExpression = (1 to expressionNum).map(i => Alias(Count(Literal(1)), s"cnt$i")())
val aggPlan = Aggregate(Nil, aggExpression, LocalRelation())

val beginTime = System.currentTimeMillis()
val expressions = aggPlan.validConstraints
println(s"validConstraints cost: ${System.currentTimeMillis() - beginTime}ms")
// The size of Aliased expression is n * (n - 1) / 2 + n
assert( expressions.size === expressionNum * (expressionNum - 1) / 2 + expressionNum)

}

(Please fill in changes proposed in this fix)

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Run new added test.

Please review http://spark.apache.org/contributing.html before opening a pull request.

@@ -59,6 +59,12 @@ class ExpressionSet protected(
}
}

def addMultiExpressions(elems: Set[Expression]): ExpressionSet = {
Copy link
Member

@viirya viirya Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GenTraversableOnce instead of Set?

@@ -59,6 +59,12 @@ class ExpressionSet protected(
}
}

def addMultiExpressions(elems: Set[Expression]): ExpressionSet = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Override ++ instead of addMultiExpressions? As it is a Set, we better follow Set's semantics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, Modified it, thanks.

projectList.foreach {
case a @ Alias(e, _) =>
// For every alias in `projectList`, replace the reference in constraints by its attribute.
allConstraints ++= allConstraints.map(_ transform {
val replacedElement = allConstraints.map(_ transform {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By overriding ++, we don't need to change those.

@@ -59,6 +59,12 @@ class ExpressionSet protected(
}
}

def addMultiExpressions(elems: Set[Expression]): ExpressionSet = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a related test in ExpressionSetSuite.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, Added, thanks.


val expressions = aggPlan.validConstraints
// The size of Aliased expression is n * (n - 1) / 2 + n
assert( expressions.size === expressionNum * (expressionNum - 1) / 2 + expressionNum)
Copy link
Member

@viirya viirya Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relying on the size of validConstraints may be fragile. I think we only need to test if the new API of ExpressionSet works. There should be other constraints test to make sure we produce correct constraints.

@@ -210,4 +210,13 @@ class ExpressionSetSuite extends SparkFunSuite {
assert((initialSet - (aLower + 1)).size == 0)

}

test("add multiple elements to set") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case doesn't fail without the above code, does it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yap, previous ++ works without problem. The override version goes to reduce the time cost. This test is just used to make sure we don't mess up behavior of ++.

@@ -17,7 +17,7 @@

package org.apache.spark.sql.catalyst.expressions

import scala.collection.mutable
import scala.collection.{GenTraversableOnce, mutable}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do style check by running dev/scalastyle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eatoncys This breaks scala style check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, modified it, thanks.

@viirya
Copy link
Member

viirya commented Aug 23, 2017

We can edit the PR title and description to show the changes clearly.

elems.foreach(newSet.add)
newSet
}

override def -(elem: Expression): ExpressionSet = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we don't use -- frequently or in hot code path. We may add it when we need it.

@viirya
Copy link
Member

viirya commented Aug 23, 2017

cc @gatorsmile @HyukjinKwon May you help to trigger Jenkins tests? Thanks.

@HyukjinKwon
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Aug 23, 2017

Test build #81035 has finished for PR 19022 at commit 57e8a3d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

add to whitelist

@SparkQA
Copy link

SparkQA commented Aug 23, 2017

Test build #81039 has finished for PR 19022 at commit 57e8a3d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

test("add multiple elements to set") {
val initialSet = ExpressionSet(aUpper + 1 :: Nil)
val setToAddWithSameExpression = ExpressionSet(aUpper + 1 :: aUpper + 2 :: Nil)
val setToAdd = ExpressionSet(aUpper + 2 :: aUpper + 3 :: Nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not very sure what you want to test in the second one. Are you want to test the behavior of adding a set of expressions that do not exist in the initial set?
-> ExpressionSet(aUpper + 3 :: aUpper + 4 :: Nil)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have modified the name to setToAddWithOutSameExpression.

@eatoncys eatoncys changed the title [Spark-21807][SQL]The getAliasedConstraints function in LogicalPlan will take a long time when number of expressions is greater than 100 [Spark-21807][SQL]Override ++ operation in ExpressionSet to reduce clone time Aug 24, 2017
@SparkQA
Copy link

SparkQA commented Aug 24, 2017

Test build #81057 has finished for PR 19022 at commit 0762840.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

test("add multiple elements to set") {
val initialSet = ExpressionSet(aUpper + 1 :: Nil)
val setToAddWithSameExpression = ExpressionSet(aUpper + 1 :: aUpper + 2 :: Nil)
val setToAddWithOutSameExpression = ExpressionSet(aUpper + 3 :: aUpper + 4 :: Nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: WithOut -> Without

@gatorsmile
Copy link
Member

LGTM

@gatorsmile
Copy link
Member

Thanks! Merging to master.

You can fix this in your future PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants