
[SPARK-21717][SQL] Decouple consume functions of physical operators in whole-stage codegen #18931

Closed
wants to merge 27 commits into apache:master from viirya:SPARK-21717

Conversation

viirya
Member

@viirya viirya commented Aug 13, 2017

What changes were proposed in this pull request?

It has been observed in SPARK-21603 that whole-stage codegen suffers performance degradation when the generated functions are too long to be optimized by the JIT compiler.

We currently produce a single function that incorporates the generated code of all physical operators in a whole-stage plan. The generated function can therefore grow beyond the size threshold at which the JIT stops optimizing it.

This patch decouples the row-consuming logic of each physical operator into its own function, so that no single giant function has to process all the rows.
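To illustrate the idea, here is a minimal, hypothetical sketch (the helper name and simplifications are assumed, not the actual patch) of wrapping an operator's consume code into its own generated method and emitting only a call to it:

import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}

// Hypothetical, simplified sketch: instead of inlining the operator's consume code at
// the call site, register it as a separate method and return a one-line call to it,
// keeping the outer processNext() small enough for JIT compilation. The real patch
// additionally handles non-nullable columns and variables that cannot be parameterized.
def wrapConsumeInFunction(
    ctx: CodegenContext,
    output: Seq[Attribute],
    inputVars: Seq[ExprCode],
    consumeCode: String): String = {
  // freshName prepends the operator's variable prefix, e.g. "agg_doConsume".
  val funcName = ctx.freshName("doConsume")
  // Pass each already-evaluated column as (value, isNull) parameters.
  val params = output.zip(inputVars).flatMap { case (attr, ev) =>
    Seq(s"${ctx.javaType(attr.dataType)} ${ev.value}", s"boolean ${ev.isNull}")
  }
  val args = inputVars.flatMap(ev => Seq(ev.value, ev.isNull))
  val funcCode =
    s"""
       |private void $funcName(${params.mkString(", ")}) throws java.io.IOException {
       |  $consumeCode
       |}
     """.stripMargin
  // addNewFunction returns the (possibly qualified) name to call.
  s"${ctx.addNewFunction(funcName, funcCode)}(${args.mkString(", ")});"
}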

How was this patch tested?

Added tests.

@SparkQA

SparkQA commented Aug 13, 2017

Test build #80581 has finished for PR 18931 at commit 05274e7.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -149,14 +149,65 @@ trait CodegenSupport extends SparkPlan {

ctx.freshNamePrefix = parent.variablePrefix
val evaluated = evaluateRequiredVariables(output, inputVars, parent.usedInputs)

// Under certain conditions, we can put the logic to consume the rows of this operator into
Member

@kiszk kiszk Aug 13, 2017

Could you elaborate on the "certain conditions" in the comment if you have time?

Member Author

Added more comments to elaborate on the idea.

@SparkQA

SparkQA commented Aug 13, 2017

Test build #80584 has finished for PR 18931 at commit e0e7a6e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 13, 2017

Test build #80589 has finished for PR 18931 at commit 413707d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 13, 2017

Test build #80590 has finished for PR 18931 at commit 0bb8c0e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya viirya closed this Aug 13, 2017
@viirya viirya reopened this Aug 13, 2017
@viirya
Member Author

viirya commented Aug 13, 2017

Ran the same benchmark as in #18810.

Before this patch:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.36-moby
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
max function length of wholestagecodegen: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F                                    572 /  733          1.1         873.5       1.0X
codegen = T                                   2022 / 2039          0.3        3086.0       0.3X

After this patch:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.36-moby
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
max function length of wholestagecodegen: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F                                    548 /  740          1.2         836.6       1.0X
codegen = T                                    372 /  433          1.8         567.5       1.5X

This patch significantly mitigates the performance degradation seen when the generated function is too long to be optimized by the JIT.

@viirya
Member Author

viirya commented Aug 13, 2017

Ran the aggregate-with-randomized-keys benchmark in AggregateBenchmark to check whether this change introduces a performance regression in cases where the generated function is not overly long.

Before this patch:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.36-moby
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F                                   8817 / 8853          9.5         105.1       1.0X
codegen = T hashmap = F                       4904 / 4999         17.1          58.5       1.8X
codegen = T hashmap = T                       2040 / 2256         41.1          24.3       4.3X

After this patch:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.36-moby
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F                                   8347 / 8388         10.1          99.5       1.0X
codegen = T hashmap = F                       5112 / 5146         16.4          60.9       1.6X
codegen = T hashmap = T                       2008 / 2365         41.8          23.9       4.2X

@SparkQA

SparkQA commented Aug 13, 2017

Test build #80594 has finished for PR 18931 at commit 6d600d5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 13, 2017

Test build #80595 has finished for PR 18931 at commit 502139a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 14, 2017

Test build #80605 has started for PR 18931 at commit 5fe3762.

@SparkQA

SparkQA commented Aug 14, 2017

Test build #80607 has finished for PR 18931 at commit 4bef567.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 14, 2017

Test build #80612 has finished for PR 18931 at commit 1694c9b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Aug 14, 2017

retest this please.

@SparkQA

SparkQA commented Aug 14, 2017

Test build #80616 has finished for PR 18931 at commit 1694c9b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 15, 2017

Test build #80648 has finished for PR 18931 at commit 8f3b984.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 15, 2017

Test build #80654 has finished for PR 18931 at commit c04da15.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// 1. The config "SQLConf.DECOUPLE_OPERATOR_CONSUME_FUNCTIONS" is enabled.
// 2. The parent uses all variables in output. we can't defer variable evaluation when consume
// in another function.
// 3. The output variables are not empty. If it's empty, we don't bother to do that.
Contributor

@cloud-fan cloud-fan Jan 24, 2018

Why this? Logically an operator can still have a complex consume method even if it doesn't have any output.

Member Author

Sounds correct to me logically, although I don't have a clear idea of what such an operator would actually be.

// 2. The parent uses all variables in output. we can't defer variable evaluation when consume
// in another function.
// 3. The output variables are not empty. If it's empty, we don't bother to do that.
// 4. We don't use row variable. The construction of row uses deferred variable evaluation. We
Contributor

I think what we need is that inputVars are all materialized, which can be guaranteed by requireAllOutput and outputVars != null.

Member Author

Seems to me outputVars != null isn't necessary either. When it is null, row can't be null; inputVars will bind to the row's columns and be evaluated before calling the created method.

val consumeFunc =
if (SQLConf.get.decoupleOperatorConsumeFuncs && row == null && outputVars.nonEmpty &&
requireAllOutput && ctx.isValidParamLength(output)) {
constructDoConsumeFunction(ctx, inputVars)
Contributor

we should pass row to this function; if it's non-null, we can save a projection.

Contributor

maybe we should create a method for generating rowVar, so that we can use it in both consume and constructDoConsumeFunction

Member Author

Good point.
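For reference, a minimal sketch of such a shared helper (the name prepareRowVar and the simplifications are assumed; it reuses the same createCode call shown in the diff below):

import org.apache.spark.sql.catalyst.expressions.{Attribute, BoundReference}
import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode, GenerateUnsafeProjection}

// Hypothetical shared helper (name assumed): build the UnsafeRow variable for this
// operator's output once, so both consume() and the split-out consume function can
// call it instead of duplicating the projection code.
def prepareRowVar(ctx: CodegenContext, output: Seq[Attribute], colVars: Seq[ExprCode]): ExprCode = {
  if (colVars.nonEmpty) {
    val colExprs = output.zipWithIndex.map { case (attr, i) =>
      BoundReference(i, attr.dataType, attr.nullable)
    }
    // Bind the projection to the already-evaluated column variables.
    ctx.currentVars = colVars
    val ev = GenerateUnsafeProjection.createCode(ctx, colExprs, false)
    ExprCode(ev.code.trim, "false", ev.value)
  } else {
    // No output columns: reuse an empty UnsafeRow.
    ExprCode("", "false", "unsafeRow")
  }
}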

val ev = GenerateUnsafeProjection.createCode(ctx, colExprs, false)
val rowVar = ExprCode(ev.code.trim, "false", ev.value)

val doConsume = ctx.freshName("doConsume")
Contributor

shall we put the operator name in this function name?

Member Author

@viirya viirya Jan 25, 2018

The freshName here adds variablePrefix before doConsume, so it already carries the operator name, e.g., agg_doConsume.

private def constructDoConsumeFunction(
ctx: CodegenContext,
inputVars: Seq[ExprCode]): String = {
val (callingParams, arguList, inputVarsInFunc) =
Contributor

I feel it's cleaner to return paramNames, paramTypes, and paramVars; then we can simply do

void $doConsume(paramTypes.zip(paramNames).map(i => i._1 + " " + i._2).mkString(", "))

and

doConsumeFuncName(paramNames.mkString(", "))

inside constructConsumeParameters we can just create 3 mutable collections and go through variables to fill these collections.

Member Author

Sounds cleaner. I need to change it a little because the arguments and parameters are not the same: some variables cannot be parameterized, e.g., constants or statements.
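A rough sketch of that shape (names assumed, not the final patch); the real change additionally has to skip variables that cannot be parameterized, such as constants or code statements:

import scala.collection.mutable
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}

// Hypothetical sketch: collect parameter names and types in one pass, then build the
// method declaration and the call-site argument list from them.
def buildConsumeFunctionSignature(
    ctx: CodegenContext,
    attributes: Seq[Attribute],
    variables: Seq[ExprCode]): (String, String) = {
  val paramNames = mutable.ArrayBuffer[String]()
  val paramTypes = mutable.ArrayBuffer[String]()
  attributes.zip(variables).foreach { case (attr, ev) =>
    paramTypes += ctx.javaType(attr.dataType)
    paramNames += ev.value
    if (attr.nullable) {           // nullable columns also pass their isNull flag
      paramTypes += "boolean"
      paramNames += ev.isNull
    }
  }
  val declaration = paramTypes.zip(paramNames).map { case (t, n) => s"$t $n" }.mkString(", ")
  val callArgs = paramNames.mkString(", ")
  (declaration, callArgs)
}

// Usage at the call site (doConsumeFuncName assumed to come from ctx.freshName):
//   s"private void $doConsumeFuncName($declaration) throws java.io.IOException { ... }"
//   s"$doConsumeFuncName($callArgs);"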

@cloud-fan
Contributor

LGTM except a few comments

@SparkQA

SparkQA commented Jan 24, 2018

Test build #86580 has finished for PR 18931 at commit 6384aec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1263,6 +1271,8 @@ class SQLConf extends Serializable with Logging {

def hugeMethodLimit: Int = getConf(WHOLESTAGE_HUGE_METHOD_LIMIT)

def decoupleOperatorConsumeFuncs: Boolean = getConf(DECOUPLE_OPERATOR_CONSUME_FUNCTIONS)
Member

Add the wholeStage prefix for such flag names.

Member Author

Sure. Done.

@gatorsmile
Member

gatorsmile commented Jan 24, 2018

Before merging this PR, we need test cases.

  • Add test cases to ensure our future changes will not break this.
  • Add test cases to ensure the newly added conf works as expected
  • Add test cases for boundary cases. For example, the limit of method parameters.

@viirya
Member Author

viirya commented Jan 25, 2018

@gatorsmile Thanks. I will address the above code comments first in the next commit and add some test cases in a later commit.
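For example, a conf-toggle test could look roughly like the following sketch (the suite name and the conf key are assumed here, pending the rename discussed above):

import org.apache.spark.sql.{QueryTest, Row}
import org.apache.spark.sql.test.SharedSQLContext

// Hedged sketch of a conf-toggle test (suite name and conf key assumed): run the same
// aggregation with the split behaviour enabled and disabled and check the results agree.
class SplitConsumeFuncSuite extends QueryTest with SharedSQLContext {

  test("splitting consume functions by operator does not change results") {
    Seq("true", "false").foreach { enabled =>
      // Conf key assumed to be the one introduced by this PR after the rename.
      withSQLConf("spark.sql.codegen.splitConsumeFuncByOperator" -> enabled) {
        val df = spark.range(100).selectExpr("id % 10 AS k", "id AS v").groupBy("k").sum("v")
        // Each key k sums ids k, k + 10, ..., k + 90, i.e. 10 * k + 450.
        checkAnswer(df, (0 until 10).map(k => Row(k.toLong, 10L * k + 450L)))
      }
    }
  }
}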

"physical operator into individual methods, instead of a single big method. This can be " +
"used to avoid oversized function that can miss the opportunity of JIT optimization.")
.booleanConf
.createWithDefault(true)
Member Author

Set to true by default. If there is objection, I can change it to false.

val paramVars = mutable.ArrayBuffer[ExprCode]()

if (row != null) {
arguments += row
Contributor

we need to update ctx.isValidParamLength to count this

Contributor

We should probably have two methods: one for calculating the param length and one for checking the param length limitation.

Member Author

Added an extra unit for row if needed.
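A hedged sketch of that accounting (helper names assumed): the JVM caps a method at 255 parameter slots, `this` takes one slot, and long/double parameters take two slots each, so the row parameter, when passed, must be counted as one extra slot.

import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.types.{DoubleType, LongType}

// Hedged sketch (names assumed): count JVM parameter slots for the split-out consume
// method. `this` occupies one slot, long/double values occupy two slots each, nullable
// columns also pass a boolean isNull flag, and the row parameter (when passed) adds one.
def calculateParamLength(params: Seq[Attribute], includeRow: Boolean): Int = {
  val columnSlots = params.map { attr =>
    val valueSlots = attr.dataType match {
      case LongType | DoubleType => 2
      case _ => 1
    }
    valueSlots + (if (attr.nullable) 1 else 0)
  }.sum
  1 + (if (includeRow) 1 else 0) + columnSlots
}

// The JVM limit on parameter slots per method is 255 (including `this`).
def isValidParamLength(paramLength: Int): Boolean = paramLength <= 255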

// declaration.
val requireAllOutput = output.forall(parent.usedInputs.contains(_))
val consumeFunc =
if (SQLConf.get.wholeStageSplitConsumeFuncByOperator && requireAllOutput &&
Contributor

super nit:

val confEnabled = SQLConf.get.wholeStageSplitConsumeFuncByOperator
if (confEnabled && ...)

@SparkQA

SparkQA commented Jan 25, 2018

Test build #86615 has finished for PR 18931 at commit 0c4173e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 25, 2018

Test build #86621 has finished for PR 18931 at commit 2fdf6e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 25, 2018

Test build #86622 has finished for PR 18931 at commit c859d53.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

LGTM, pending jenkins

@SparkQA

SparkQA commented Jan 25, 2018

Test build #86624 has finished for PR 18931 at commit 11946e7.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Jan 25, 2018

retest this please.

@kiszk
Member

kiszk commented Jan 25, 2018

LGTM

@SparkQA

SparkQA commented Jan 25, 2018

Test build #86633 has finished for PR 18931 at commit 11946e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master/2.3!

asfgit pushed a commit that referenced this pull request Jan 25, 2018
[SPARK-21717][SQL] Decouple consume functions of physical operators in whole-stage codegen


Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #18931 from viirya/SPARK-21717.

(cherry picked from commit d20bbc2)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@asfgit asfgit closed this in d20bbc2 Jan 25, 2018
@viirya viirya deleted the SPARK-21717 branch December 27, 2023 18:21