[SPARK-19372][SQL] Fix throwing a Java exception at df.filter() due to 64KB bytecode size limit #17087

Closed · 15 commits

Conversation

@kiszk
Member

kiszk commented Feb 27, 2017

What changes were proposed in this pull request?

When an expression for df.filter() has many nodes (e.g. 400), the Java bytecode generated for it exceeds the JVM's 64KB-per-method limit. Compilation throws a Java exception and the execution fails.
This PR catches that exception, disables code generation for the predicate, and continues execution by calling Expression.eval() instead.

How was this patch tested?

Added a test case to DataFrameSuite.
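For context, a predicate of that size can be produced by ordinary DataFrame code. A minimal sketch of a reproducer (the session setup, column names, and node count are illustrative, not taken from the actual test):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object FilterCodegenRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("repro").getOrCreate()
    import spark.implicits._

    val df = Seq((1, 2), (3, 4)).toDF("a", "b")

    // Chain ~400 comparisons into a single expression tree. Before this patch,
    // the Java code generated for such a predicate could exceed the JVM's
    // 64KB-per-method bytecode limit and compilation would fail.
    val bigPredicate = (1 to 400).map(i => col("a") < i).reduce(_ && _)

    df.filter(bigPredicate).show()
    spark.stop()
  }
}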

@SparkQA

SparkQA commented Feb 27, 2017

Test build #73530 has finished for PR 17087 at commit 6f40a93.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member Author

kiszk commented Mar 6, 2017

@davies, could you please review this?

@srowen
Member

srowen commented Mar 6, 2017

Is it right to do this in just one code path? Why not all similar cases where the 64KB limit is exceeded?
Can the underlying problem be fixed or improved?
You're also catching all Exceptions; why not something more specific? Is that possible?
For these reasons, this feels like an odd change to make.

@kiszk
Member Author

kiszk commented Mar 6, 2017

I have already identified where the 64KB limit is exceeded, and I don't think the underlying issue is easy to fix.
The earlier issues came from independent blocks (in Seq[ExprCode]) that could be split into multiple methods. In contrast, an Expression with many nodes is a single tree (in ExprCode) that cannot easily be split; it is very hard to split one ExprCode into multiple correct ExprCodes.
I think we would have to redesign how the generated Java code for an Expression tree is kept.

Regarding the exception handling, you are right: I should catch only exceptions related to compilation errors. I will address this by handling nested exceptions.

@SparkQA

SparkQA commented Mar 6, 2017

Test build #74019 has finished for PR 17087 at commit 98cd961.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Contributor

marmbrus commented Mar 7, 2017

I agree with the general approach of having a fallback from code generation to interpreted evaluation, but I also agree that this feels too narrowly targeted. In particular, why do this in one operator rather than in newPredicate (or maybe even in codegen itself).

Another thing that maybe @davies can comment on: I thought we already had this fallback implemented? So I'm curious why it's not already handling this test case. Maybe there is an existing mechanism we just need to make more general.

}
} catch {
// JaninoRuntimeException is nested inside another exception when a Java compilation error occurs
case e: Exception if ExceptionUtils.getRootCause(e).isInstanceOf[JaninoRuntimeException] =>
Contributor

Rather than walk the exception tree, should we just make our wrapping more specific?

Member Author

Good catch. I will make this wrapping more specific.

@kiszk
Member Author

kiszk commented Mar 8, 2017

@marmbrus thank you for your comments.

Regarding the fallback mechanism, I believe you are referring to this one. That fallback works when whole-stage codegen is enabled.
In this case, however, whole-stage codegen is disabled because the number of fields is large, so these two variables are false.

I agree that we should prepare a more generic approach that applies to more cases, but that is not easy to implement right now; it would require some refactoring.
As a simple first step, newPredicate could return null or None if an exception occurs, and each caller of newPredicate would then check the return value to handle code-generation failures. That would keep the caller code simple.
More complex refactoring could also apply, but another PR should address it.

What do you think?

FYI: I have just noticed that this problem may also occur in CartesianProductExec and InMemoryTableScanExec. I will have to do the same thing in those two places.

@marmbrus
Contributor

marmbrus commented Mar 8, 2017

I don't think we need a complex refactoring. Why can't newPredicate catch the exception, log a warning and return an interpreted Predicate?

@kiszk
Member Author

kiszk commented Mar 9, 2017

I am refactoring newPredicate so that it catches the exception and logs a warning. However, I don't think newPredicate can return an interpreted result: code generation returns a Predicate, while BindReferences.bindReference(condition, child.output) returns an Expression, and the two classes live in different class hierarchies.
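To illustrate the mismatch with stub types (stand-ins only, not Spark's real classes):

object HierarchySketch {
  // Stub stand-ins for Spark's classes, to show the shape of the problem.
  trait Expression { def eval(row: Seq[Any]): Any }
  trait Predicate { def eval(row: Seq[Any]): Boolean }

  // The codegen path (GeneratePredicate.generate) hands back a Predicate...
  def generate(e: Expression): Predicate = new Predicate {
    def eval(row: Seq[Any]): Boolean = e.eval(row).asInstanceOf[Boolean]
  }

  // ...while BindReferences.bindReference only yields another Expression,
  // which cannot be returned where a Predicate is expected.
  def bindReference(e: Expression): Expression = e
}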

@kiszk
Member Author

kiszk commented Mar 9, 2017

@marmbrus I have just committed an intermediate version of the refactoring. Could you take a look?
If the approach is fine (return null and check it at the caller), I will update the other call sites of newPredicate.

@SparkQA

SparkQA commented Mar 9, 2017

Test build #74262 has finished for PR 17087 at commit 0e2bbe7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Contributor

marmbrus commented Mar 9, 2017

There appears to have been some code drift (GeneratePredicate and InterpretedPredicate both used to return a class that inherited from a common interface), but I don't think it's hard to isolate the fault handling in newPredicate.

Just fix InterpretedPredicate to actually return a Predicate rather than a bare lambda function. The code there already handles binding and evaluation.

@kiszk
Member Author

kiszk commented Mar 10, 2017

Thank you for pointing out InterpretedPredicate. newPredicate now always returns a Predicate that can be executed by calling eval().
This looks simpler and better than the first implementation.

@SparkQA

SparkQA commented Mar 10, 2017

Test build #74310 has finished for PR 17087 at commit 5fb413f.

  • This patch fails to build.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
  • class InterpretedPredicate(expression: Expression) extends GenPredicate

@SparkQA

SparkQA commented Mar 10, 2017

Test build #74311 has finished for PR 17087 at commit c02589c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 10, 2017

Test build #74313 has finished for PR 17087 at commit a2f85cd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class InterpretedPredicate(expression: Expression) extends GenPredicate

@SparkQA

SparkQA commented Mar 11, 2017

Test build #74374 has finished for PR 17087 at commit c5fc5f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class InterpretedPredicate(expression: Expression) extends GenPredicate

@@ -372,7 +374,7 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co
try {
CodeGenerator.compile(cleanedSource)
} catch {
-case e: Exception if !Utils.isTesting && sqlContext.conf.wholeStageFallback =>
+case e: JaninoRuntimeException if !Utils.isTesting && sqlContext.conf.wholeStageFallback =>
@viirya (Member) Mar 13, 2017

CodeGenerator.doCompile catches Exception and re-throws it as a plain Exception.

@viirya (Member) Mar 13, 2017

Can other exceptions be thrown during compilation?

If another kind of exception causes compilation to fail, I think we still need to fall back to non-whole-stage execution?

Member Author

Sure, reverted

@@ -951,10 +965,10 @@ object CodeGenerator extends Logging {
evaluator.cook("generated.java", code.body)
recordCompilationStats(evaluator)
} catch {
-case e: Exception =>
+case e: JaninoRuntimeException =>
Member

Looks like CompileException can be thrown from janino?

Member Author

Thank you for pointing that out. CompileException is now caught as well.
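A sketch of the resulting catch block in CodeGenerator.doCompile (paraphrased from the diff hunks in this thread; `formatted` is the pretty-printed source from the enclosing method, and the exact re-throw shape is refined further down in the review):

try {
  evaluator.cook("generated.java", code.body)
  recordCompilationStats(evaluator)
} catch {
  case e: JaninoRuntimeException =>
    // Codegen produced a method too large (or otherwise invalid) for the JVM.
    val msg = s"failed to compile: $e\n$formatted"
    logError(msg, e)
    throw new JaninoRuntimeException(msg, e)
  case e: CompileException =>
    // Janino can also report failures through CompileException.
    val msg = s"failed to compile: $e\n$formatted"
    logError(msg, e)
    throw new CompileException(msg, e.getLocation)
}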

@@ -355,7 +357,21 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ

protected def newPredicate(
expression: Expression, inputSchema: Seq[Attribute]): GenPredicate = {
-GeneratePredicate.generate(expression, inputSchema)
+try {
+  GeneratePredicate.generate(expression, inputSchema)
Member

Shall we only do this fallback if sqlContext.conf.wholeStageFallback is turned on?

Member Author

It would be good to control this with an option, although this is not part of whole-stage codegen.
Is it better to reuse sqlContext.conf.wholeStageFallback or to add a new sqlContext.conf.codegenFallback?
What do you think?

Member

I am wondering whether it makes sense for wholeStageFallback to be false while this new option is true, or vice versa.

Member Author

Sure; it now looks at sqlContext.conf.wholeStageFallback.

@@ -32,6 +32,7 @@ import org.apache.spark.sql.types.LongType
import org.apache.spark.util.ThreadUtils
import org.apache.spark.util.random.{BernoulliCellSampler, PoissonSampler}


Member

nit: Extra blank line.

Member Author

Good catch, done

}
logWarning(s"Codegen disabled for this expression:\n $logMessage")
InterpretedPredicate.create(expression, inputSchema)
case e: Exception =>
Member

This case can be removed.

Member Author

Yeah, done

}

class InterpretedPredicate(expression: Expression) extends GenPredicate {
Member

Is this change necessary?

Btw, having InterpretedPredicate extend GenPredicate looks logically weird.

Member Author

First, this change is necessary so that InterpretedPredicate returns a Predicate.

As for GenPredicate, I followed the naming convention already used in SparkPlan.scala. If we use Predicate, it will conflict with other names. What would be a better name than GenPredicate?

Member

BasePredicate?

Member Author

Thanks, looks good. Done
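For reference, the renamed hierarchy ends up looking roughly like this (paraphrased from the snippets reviewed below; the real definitions live in Spark's catalyst expressions package):

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.Expression

// Common interface shared by generated and interpreted predicates.
abstract class BasePredicate {
  def eval(r: InternalRow): Boolean
}

// Interpreted fallback: evaluate the bound expression directly.
case class InterpretedPredicate(expression: Expression) extends BasePredicate {
  override def eval(r: InternalRow): Boolean = expression.eval(r).asInstanceOf[Boolean]
}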

@SparkQA

SparkQA commented Mar 13, 2017

Test build #74428 has started for PR 17087 at commit 11f56d1.

@kiszk
Member Author

kiszk commented Mar 13, 2017

Jenkins, retest this please

@SparkQA

SparkQA commented Mar 13, 2017

Test build #74435 has finished for PR 17087 at commit 11f56d1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 13, 2017

Test build #74442 has finished for PR 17087 at commit 530f84f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class InterpretedPredicate(expression: Expression) extends BasePredicate

@SparkQA

SparkQA commented Apr 20, 2017

Test build #75999 has finished for PR 17087 at commit 8b6ba75.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member Author

kiszk commented Apr 21, 2017

@marmbrus could you please take a look?

@kiszk
Member Author

kiszk commented Apr 30, 2017

ping @marmbrus

}

class InterpretedPredicate(expression: Expression) extends BasePredicate {
def eval(r: InternalRow): Boolean = expression.eval(r).asInstanceOf[Boolean]
Member

nit: override def eval...

Member Author

Sure, done for both.

}

class InterpretedPredicate(expression: Expression) extends BasePredicate {
Member

nit: case class

// Cache.get() may wrap the original exception. See the following URL
// http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
// Cache.html#get(K,%20java.util.concurrent.Callable)
case e: UncheckedExecutionException =>
Member

You can use the following simpler code:

case e@(_: UncheckedExecutionException | _: ExecutionError) =>
    throw e.getCause

Member Author

I see, done. Thanks.

try {
GeneratePredicate.generate(expression, inputSchema)
} catch {
case e: JaninoRuntimeException if sqlContext == null || sqlContext.conf.wholeStageFallback =>
Member

nit: It's better to add a debug log that prints e. You can merge the two cases like this:

      case e@(_: JaninoRuntimeException | _: CompileException)
        if sqlContext == null || sqlContext.conf.wholeStageFallback =>
        logDebug(e.getMessage, e)
        genInterpretedPredicate(expression, inputSchema)

Member Author

Yes, done. Thank you.
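Putting the pieces together, newPredicate after this round looks roughly like the following (assembled from the diff hunks in this thread, not a verbatim copy):

protected def newPredicate(
    expression: Expression, inputSchema: Seq[Attribute]): GenPredicate = {
  try {
    GeneratePredicate.generate(expression, inputSchema)
  } catch {
    case e @ (_: JaninoRuntimeException | _: CompileException)
        if sqlContext == null || sqlContext.conf.wholeStageFallback =>
      // Codegen failed (e.g. the 64KB bytecode limit was hit): log the
      // failure and fall back to interpreted evaluation.
      logDebug(e.getMessage, e)
      genInterpretedPredicate(expression, inputSchema)
  }
}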

Member

Forgot this?

Member

@kiszk could you fix this as well?

Member Author

@zsxwing Sorry, I made a mistake. I have pushed it now.

@@ -353,9 +356,28 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ
GenerateMutableProjection.generate(expressions, inputSchema, useSubexprElimination)
}

private def genInterpretedPredicate(
expression: Expression, inputSchema: Seq[Attribute]): InterpretedPredicate = {
val str = expression.toString
Member

I think Expression.toString will truncate an expression that is too big, right?

Member Author

Yeah, Expression.toString does truncate some output, but it does not cover this case, so I did not change this part.
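For completeness, the fallback helper with its manual truncation looks roughly like this (paraphrased; the 256-character cutoff is illustrative):

private def genInterpretedPredicate(
    expression: Expression, inputSchema: Seq[Attribute]): InterpretedPredicate = {
  val str = expression.toString
  // Expression.toString can still be huge here, so truncate manually before logging.
  val logMessage = if (str.length > 256) {
    str.substring(0, 256 - 3) + "..."
  } else {
    str
  }
  logWarning(s"Codegen disabled for this expression:\n $logMessage")
  InterpretedPredicate.create(expression, inputSchema)
}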

case e: CompileException =>
val msg = s"failed to compile: $e\n$formatted"
logError(msg, e)
throw new CompileException(msg, e.asInstanceOf[CompileException].getLocation)
@zsxwing (Member) May 10, 2017

Please use throw new CompileException(msg, e.getLocation, e)

Member Author

Good catch, done.

@zsxwing (Member) left a comment

I made one pass. This looks good. Most of my comments are style issues.

@SparkQA

SparkQA commented May 12, 2017

Test build #76862 has finished for PR 17087 at commit 1f19c80.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class InterpretedPredicate(expression: Expression) extends BasePredicate

case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
val excChains = ExceptionUtils.getThrowables(e)
val exc = if (excChains.length == 1) excChains(0) else excChains(excChains.length - 2)
throw exc
Member

Why not use e.getCause?

Member Author

Good catch, done

@SparkQA

SparkQA commented May 16, 2017

Test build #76949 has finished for PR 17087 at commit 3868bf5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 16, 2017

Test build #76972 has finished for PR 17087 at commit a5fd465.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member

zsxwing commented May 16, 2017

LGTM. Thanks! Merging to master.

@asfgit asfgit closed this in 6f62e9d May 16, 2017
robert3005 pushed a commit to palantir/spark that referenced this pull request May 19, 2017
…o 64KB bytecode size limit

liyichao pushed a commit to liyichao/spark that referenced this pull request May 24, 2017
…o 64KB bytecode size limit

@dongjoon-hyun
Member

Hi, all. I'm wondering if it is too late to include this in Spark 2.2.0, or whether it is too risky for that.

asfgit pushed a commit that referenced this pull request May 26, 2017
…o 64KB bytecode size limit

poplav pushed a commit to poplav/spark that referenced this pull request Aug 14, 2017
…o 64KB bytecode size limit

@poplav

poplav commented Aug 15, 2017

Hi, all. I am trying to get this included in Spark 2.1.1. I have opened PR #18942.

poplav pushed a commit to poplav/spark that referenced this pull request Aug 17, 2017
…o 64KB bytecode size limit

GeneratePredicate.generate(expression, inputSchema)
} catch {
case e @ (_: JaninoRuntimeException | _: CompileException)
if sqlContext == null || sqlContext.conf.wholeStageFallback =>
Member

sqlContext.conf.wholeStageFallback is almost useless here, because in almost all cases this code runs on executors.

Member

Always falling back is pretty risky; it might hide bugs.

Member

Let me fix it in another PR.
