
[SPARK-21717][SQL] Decouple consume functions of physical operators in whole-stage codegen #18931

Closed
wants to merge 27 commits into apache:master from viirya:SPARK-21717

Conversation

viirya
Member

@viirya viirya commented Aug 13, 2017

What changes were proposed in this pull request?

It has been observed in SPARK-21603 that whole-stage codegen suffers performance degradation when the generated functions are too long to be optimized by the JIT compiler.

We currently produce a single function that incorporates the generated code of all physical operators in a whole-stage plan. The generated function can therefore grow beyond the size threshold at which the JIT stops optimizing it.

This patch decouples the row-consuming logic of each physical operator into its own function, so that no single giant function has to process all the rows.
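To illustrate the idea, here is a minimal, hypothetical sketch (the helper name and simplifications are assumed, not the actual patch) of wrapping an operator's consume code into its own generated method and emitting only a call to it:

import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}

// Hypothetical, simplified sketch: instead of inlining the operator's consume code at
// the call site, register it as a separate method and return a one-line call to it,
// keeping the outer processNext() small enough for JIT compilation. The real patch
// additionally handles non-nullable columns and variables that cannot be parameterized.
def wrapConsumeInFunction(
    ctx: CodegenContext,
    output: Seq[Attribute],
    inputVars: Seq[ExprCode],
    consumeCode: String): String = {
  // freshName prepends the operator's variable prefix, e.g. "agg_doConsume".
  val funcName = ctx.freshName("doConsume")
  // Pass each already-evaluated column as (value, isNull) parameters.
  val params = output.zip(inputVars).flatMap { case (attr, ev) =>
    Seq(s"${ctx.javaType(attr.dataType)} ${ev.value}", s"boolean ${ev.isNull}")
  }
  val args = inputVars.flatMap(ev => Seq(ev.value, ev.isNull))
  val funcCode =
    s"""
       |private void $funcName(${params.mkString(", ")}) throws java.io.IOException {
       |  $consumeCode
       |}
     """.stripMargin
  // addNewFunction returns the (possibly qualified) name to call.
  s"${ctx.addNewFunction(funcName, funcCode)}(${args.mkString(", ")});"
}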

How was this patch tested?

Added tests.

@SparkQA

SparkQA commented Aug 13, 2017

Test build #80581 has finished for PR 18931 at commit 05274e7.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -149,14 +149,65 @@ trait CodegenSupport extends SparkPlan {

ctx.freshNamePrefix = parent.variablePrefix
val evaluated = evaluateRequiredVariables(output, inputVars, parent.usedInputs)

// Under certain conditions, we can put the logic to consume the rows of this operator into
Member

@kiszk kiszk Aug 13, 2017

Could you elaborate on the "certain conditions" in the comment if you have time?

Member Author

Added more comments to elaborate on the idea.

@SparkQA

SparkQA commented Aug 13, 2017

Test build #80584 has finished for PR 18931 at commit e0e7a6e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 13, 2017

Test build #80589 has finished for PR 18931 at commit 413707d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 13, 2017

Test build #80590 has finished for PR 18931 at commit 0bb8c0e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya viirya closed this Aug 13, 2017
@viirya viirya reopened this Aug 13, 2017
@viirya
Member Author

viirya commented Aug 13, 2017

Ran the same benchmark as in #18810.

Before this patch:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.36-moby
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
max function length of wholestagecodegen: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F                                    572 /  733          1.1         873.5       1.0X
codegen = T                                   2022 / 2039          0.3        3086.0       0.3X

After this patch:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.36-moby
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
max function length of wholestagecodegen: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F                                    548 /  740          1.2         836.6       1.0X
codegen = T                                    372 /  433          1.8         567.5       1.5X

This patch significantly mitigates the performance degradation seen when the generated function is too long to be optimized by the JIT.

@viirya
Member Author

viirya commented Aug 13, 2017

Ran the aggregate-with-randomized-keys benchmark in AggregateBenchmark to check whether this change introduces a performance regression in cases where the generated function is not overly long.

Before this patch:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.36-moby
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F                                   8817 / 8853          9.5         105.1       1.0X
codegen = T hashmap = F                       4904 / 4999         17.1          58.5       1.8X
codegen = T hashmap = T                       2040 / 2256         41.1          24.3       4.3X

After this patch:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.36-moby
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F                                   8347 / 8388         10.1          99.5       1.0X
codegen = T hashmap = F                       5112 / 5146         16.4          60.9       1.6X
codegen = T hashmap = T                       2008 / 2365         41.8          23.9       4.2X

@SparkQA

SparkQA commented Aug 13, 2017

Test build #80594 has finished for PR 18931 at commit 6d600d5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 13, 2017

Test build #80595 has finished for PR 18931 at commit 502139a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 14, 2017

Test build #80605 has started for PR 18931 at commit 5fe3762.

@SparkQA

SparkQA commented Aug 14, 2017

Test build #80607 has finished for PR 18931 at commit 4bef567.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 14, 2017

Test build #80612 has finished for PR 18931 at commit 1694c9b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Aug 14, 2017

retest this please.

@SparkQA

SparkQA commented Aug 14, 2017

Test build #80616 has finished for PR 18931 at commit 1694c9b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 15, 2017

Test build #80648 has finished for PR 18931 at commit 8f3b984.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 15, 2017

Test build #80654 has finished for PR 18931 at commit c04da15.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// 1. The config "SQLConf.DECOUPLE_OPERATOR_CONSUME_FUNCTIONS" is enabled.
// 2. The parent uses all variables in output. we can't defer variable evaluation when consume
// in another function.
// 3. The output variables are not empty. If it's empty, we don't bother to do that.
Contributor

@cloud-fan cloud-fan Jan 24, 2018

Why this? Logically an operator can still have a complex consume method even if it doesn't have any output.

Member Author

Sounds correct to me logically, although I don't have a clear idea of what such an operator would actually be.

// 2. The parent uses all variables in output. we can't defer variable evaluation when consume
// in another function.
// 3. The output variables are not empty. If it's empty, we don't bother to do that.
// 4. We don't use row variable. The construction of row uses deferred variable evaluation. We
Contributor

I think what we need is that inputVars are all materialized, which can be guaranteed by requireAllOutput and outputVars != null.

Member Author

Seems to me outputVars != null isn't necessary either. When it is null, row can't be null; inputVars will bind to the row's columns and be evaluated before calling the created method.

val consumeFunc =
if (SQLConf.get.decoupleOperatorConsumeFuncs && row == null && outputVars.nonEmpty &&
requireAllOutput && ctx.isValidParamLength(output)) {
constructDoConsumeFunction(ctx, inputVars)
Contributor

we should pass row to this function; if it's non-null, we can save a projection.

Contributor

maybe we should create a method for generating rowVar, so that we can use it in both consume and constructDoConsumeFunction

Member Author

Good point.
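For reference, a minimal sketch of such a shared helper (the name prepareRowVar and the simplifications are assumed; it reuses the same createCode call shown in the diff below):

import org.apache.spark.sql.catalyst.expressions.{Attribute, BoundReference}
import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode, GenerateUnsafeProjection}

// Hypothetical shared helper (name assumed): build the UnsafeRow variable for this
// operator's output once, so both consume() and the split-out consume function can
// call it instead of duplicating the projection code.
def prepareRowVar(ctx: CodegenContext, output: Seq[Attribute], colVars: Seq[ExprCode]): ExprCode = {
  if (colVars.nonEmpty) {
    val colExprs = output.zipWithIndex.map { case (attr, i) =>
      BoundReference(i, attr.dataType, attr.nullable)
    }
    // Bind the projection to the already-evaluated column variables.
    ctx.currentVars = colVars
    val ev = GenerateUnsafeProjection.createCode(ctx, colExprs, false)
    ExprCode(ev.code.trim, "false", ev.value)
  } else {
    // No output columns: reuse an empty UnsafeRow.
    ExprCode("", "false", "unsafeRow")
  }
}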

val ev = GenerateUnsafeProjection.createCode(ctx, colExprs, false)
val rowVar = ExprCode(ev.code.trim, "false", ev.value)

val doConsume = ctx.freshName("doConsume")
Contributor

shall we put the operator name in this function name?

Member Author

@viirya viirya Jan 25, 2018

The freshName here adds variablePrefix before doConsume, so it already carries the operator name, e.g., agg_doConsume.

private def constructDoConsumeFunction(
ctx: CodegenContext,
inputVars: Seq[ExprCode]): String = {
val (callingParams, arguList, inputVarsInFunc) =
Contributor

I feel it's cleaner to return paramNames, paramTypes, and paramVars; then we can simply do

void $doConsume(paramTypes.zip(paramNames).map(i => i._1 + " " + i._2).mkString(", "))

and

doConsumeFuncName(paramNames.mkString(", "))

inside constructConsumeParameters we can just create 3 mutable collections and go through variables to fill these collections.

Member Author

Sounds cleaner. I need to change it a little because the arguments and parameters are not the same: some variables cannot be parameterized, e.g., constants or statements.
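A rough sketch of that shape (names assumed, not the final patch); the real change additionally has to skip variables that cannot be parameterized, such as constants or code statements:

import scala.collection.mutable
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}

// Hypothetical sketch: collect parameter names and types in one pass, then build the
// method declaration and the call-site argument list from them.
def buildConsumeFunctionSignature(
    ctx: CodegenContext,
    attributes: Seq[Attribute],
    variables: Seq[ExprCode]): (String, String) = {
  val paramNames = mutable.ArrayBuffer[String]()
  val paramTypes = mutable.ArrayBuffer[String]()
  attributes.zip(variables).foreach { case (attr, ev) =>
    paramTypes += ctx.javaType(attr.dataType)
    paramNames += ev.value
    if (attr.nullable) {           // nullable columns also pass their isNull flag
      paramTypes += "boolean"
      paramNames += ev.isNull
    }
  }
  val declaration = paramTypes.zip(paramNames).map { case (t, n) => s"$t $n" }.mkString(", ")
  val callArgs = paramNames.mkString(", ")
  (declaration, callArgs)
}

// Usage at the call site (doConsumeFuncName assumed to come from ctx.freshName):
//   s"private void $doConsumeFuncName($declaration) throws java.io.IOException { ... }"
//   s"$doConsumeFuncName($callArgs);"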

@cloud-fan
Contributor

LGTM except a few comments

@SparkQA

SparkQA commented Jan 24, 2018

Test build #86580 has finished for PR 18931 at commit 6384aec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1263,6 +1271,8 @@ class SQLConf extends Serializable with Logging {

def hugeMethodLimit: Int = getConf(WHOLESTAGE_HUGE_METHOD_LIMIT)

def decoupleOperatorConsumeFuncs: Boolean = getConf(DECOUPLE_OPERATOR_CONSUME_FUNCTIONS)
Member

Add the wholeStage prefix for such flag names.

Member Author

Sure. Done.

@gatorsmile
Member

gatorsmile commented Jan 24, 2018

Before merging this PR, we need test cases.

  • Add test cases to ensure our future changes will not break this.
  • Add test cases to ensure the newly added conf works as expected
  • Add test cases for boundary cases. For example, the limit of method parameters.

@viirya
Member Author

viirya commented Jan 25, 2018

@gatorsmile Thanks. I will address the above code comments first in the next commit and add some test cases in a later commit.
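For example, a conf-toggle test could look roughly like the following sketch (the suite name and the conf key are assumed here, pending the rename discussed above):

import org.apache.spark.sql.{QueryTest, Row}
import org.apache.spark.sql.test.SharedSQLContext

// Hedged sketch of a conf-toggle test (suite name and conf key assumed): run the same
// aggregation with the split behaviour enabled and disabled and check the results agree.
class SplitConsumeFuncSuite extends QueryTest with SharedSQLContext {

  test("splitting consume functions by operator does not change results") {
    Seq("true", "false").foreach { enabled =>
      // Conf key assumed to be the one introduced by this PR after the rename.
      withSQLConf("spark.sql.codegen.splitConsumeFuncByOperator" -> enabled) {
        val df = spark.range(100).selectExpr("id % 10 AS k", "id AS v").groupBy("k").sum("v")
        // Each key k sums ids k, k + 10, ..., k + 90, i.e. 10 * k + 450.
        checkAnswer(df, (0 until 10).map(k => Row(k.toLong, 10L * k + 450L)))
      }
    }
  }
}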

"physical operator into individual methods, instead of a single big method. This can be " +
"used to avoid oversized function that can miss the opportunity of JIT optimization.")
.booleanConf
.createWithDefault(true)
Member Author

Set to true by default. If there is objection, I can change it to false.

val paramVars = mutable.ArrayBuffer[ExprCode]()

if (row != null) {
arguments += row
Contributor

we need to update ctx.isValidParamLength to count this

Contributor

We should probably have two methods: one for calculating the param length and one for checking the param length limitation.

Member Author

Added an extra unit for row if needed.
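A hedged sketch of that accounting (helper names assumed): the JVM caps a method at 255 parameter slots, `this` takes one slot, and long/double parameters take two slots each, so the row parameter, when passed, must be counted as one extra slot.

import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.types.{DoubleType, LongType}

// Hedged sketch (names assumed): count JVM parameter slots for the split-out consume
// method. `this` occupies one slot, long/double values occupy two slots each, nullable
// columns also pass a boolean isNull flag, and the row parameter (when passed) adds one.
def calculateParamLength(params: Seq[Attribute], includeRow: Boolean): Int = {
  val columnSlots = params.map { attr =>
    val valueSlots = attr.dataType match {
      case LongType | DoubleType => 2
      case _ => 1
    }
    valueSlots + (if (attr.nullable) 1 else 0)
  }.sum
  1 + (if (includeRow) 1 else 0) + columnSlots
}

// The JVM limit on parameter slots per method is 255 (including `this`).
def isValidParamLength(paramLength: Int): Boolean = paramLength <= 255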

// declaration.
val requireAllOutput = output.forall(parent.usedInputs.contains(_))
val consumeFunc =
if (SQLConf.get.wholeStageSplitConsumeFuncByOperator && requireAllOutput &&
Contributor

super nit:

val confEnabled = SQLConf.get.wholeStageSplitConsumeFuncByOperator
if (confEnabled && ...)

@SparkQA

SparkQA commented Jan 25, 2018

Test build #86615 has finished for PR 18931 at commit 0c4173e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 25, 2018

Test build #86621 has finished for PR 18931 at commit 2fdf6e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 25, 2018

Test build #86622 has finished for PR 18931 at commit c859d53.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

LGTM, pending jenkins

@SparkQA

SparkQA commented Jan 25, 2018

Test build #86624 has finished for PR 18931 at commit 11946e7.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Jan 25, 2018

retest this please.

@kiszk
Member

kiszk commented Jan 25, 2018

LGTM

@SparkQA

SparkQA commented Jan 25, 2018

Test build #86633 has finished for PR 18931 at commit 11946e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master/2.3!

asfgit pushed a commit that referenced this pull request Jan 25, 2018
[SPARK-21717][SQL] Decouple consume functions of physical operators in whole-stage codegen


Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #18931 from viirya/SPARK-21717.

(cherry picked from commit d20bbc2)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@asfgit asfgit closed this in d20bbc2 Jan 25, 2018
@viirya viirya deleted the SPARK-21717 branch December 27, 2023 18:21