[SPARK-22668][SQL] Ensure no global variables in arguments of method split by CodegenContext.splitExpressions() #19865

Closed
wants to merge 2 commits into from

kiszk
Member

@kiszk kiszk commented Dec 2, 2017

What changes were proposed in this pull request?

This PR ensures that no global variables appear in the arguments of methods split by CodegenContext.splitExpressions(). It adds an assertion that checks this condition.

If a global variable appears in the arguments of a method split by CodegenContext.splitExpressions(), the following problem may occur. The argument is declared as a local variable (a parameter) of the split method, so an assignment that was meant to update the global variable updates only that local variable.

Each code generator has to ensure that no such case occurs.

class Test {
  int globalInt;

  void splittedFunction(int globalInt) {
    ...
    globalInt = 2;
  }

  void apply() {
    globalInt = 1;
    ...
    splittedFunction(globalInt);
    // globalInt should be 2 here, as it would be if the statements were not split,
    // but the parameter shadows the field, so the assignment never reaches the field
  }
}
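
The effect described above can be reproduced in plain Java. The following is a minimal sketch, not generated Spark code; the class and method names (`ShadowingDemo`, `splitWithParameter`, `splitWithoutParameter`) are hypothetical and exist only for illustration.

```java
class ShadowingDemo {
    // Plays the role of a "global variable" (mutable state) in generated code.
    private int globalInt;

    // The parameter has the same name as the field, so it shadows the field:
    // the assignment below only changes the local copy.
    private void splitWithParameter(int globalInt) {
        globalInt = 2;
    }

    // Without the parameter, the same assignment reaches the field.
    private void splitWithoutParameter() {
        globalInt = 2;
    }

    public int applyWithParameter() {
        globalInt = 1;
        splitWithParameter(globalInt);
        return globalInt;
    }

    public int applyWithoutParameter() {
        globalInt = 1;
        splitWithoutParameter();
        return globalInt;
    }

    public static void main(String[] args) {
        ShadowingDemo demo = new ShadowingDemo();
        System.out.println(demo.applyWithParameter());    // prints 1
        System.out.println(demo.applyWithoutParameter()); // prints 2
    }
}
```

The first call leaves the field at 1 because Java passes `int` by value and resolves the simple name to the parameter inside the method body.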

How was this patch tested?

Existing test suites

@SparkQA

SparkQA commented Dec 2, 2017

Test build #84386 has finished for PR 19865 at commit 3b25b5f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Contributor

mgaido91 commented Dec 2, 2017

@kiszk do you have any real example where this problem can happen now? If so, can we add an end-to-end test case?

@kiszk
Member Author

kiszk commented Dec 2, 2017

@mgaido91 Since I noticed this issue while working on #19811, I do not have a real example now.

@mgaido91
Contributor

mgaido91 commented Dec 2, 2017

Thanks for your answer, @kiszk. Maybe I have not understood correctly, so I apologize in advance if my guess is wrong, but if this is a problem introduced by #19811, why aren't we fixing it there?

@kiszk
Member Author

kiszk commented Dec 2, 2017

Thank you for your comment. This problem potentially exists regardless of #19811. As we usually do, I added a test with end-to-end code generation for this.

@mgaido91
Contributor

mgaido91 commented Dec 2, 2017

I see, my point is: I cannot understand how this can happen in the current codebase. Is there any place where this problem is present now? Because at the moment it looks to me more like a potential problem and not a real problem. Not sure if I was clear enough.

@SparkQA

SparkQA commented Dec 2, 2017

Test build #84387 has finished for PR 19865 at commit cf796fe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class CodeGenSuiteTest extends $
  • abstract class CodeGenSuiteAbstract

@kiszk
Member Author

kiszk commented Dec 2, 2017

It would be good to hear opinions of others.

@kiszk
Member Author

kiszk commented Dec 2, 2017

cc @cloud-fan @viirya @maropu @gatorsmile for review

@viirya
Member

viirya commented Dec 2, 2017

If the parameters to splitExpressions are given by us, I think normally we won't pass in a global variable?

@kiszk
Member Author

kiszk commented Dec 2, 2017

We think that we normally would not pass a global variable. My question is: Did we guarantee it? And, is it possible to guarantee it in the future?
If so, how about adding a comment and inserting an assertion to express our contract?

@mgaido91
Contributor

mgaido91 commented Dec 2, 2017

I think we have to guarantee this. The case you are dealing with is a wrong situation that should be avoided. Indeed, in such a case, it is not even clear to me which is the right thing to do: I mean, which was the intention? Was it to use the global variable or was it to use the passed argument?

Here you are assuming that the intention was to use the global variable, which is an interpretation I would question. In my view the right thing to do is to use the passed variable, as Java itself does: anybody creating this situation hopefully knows how Java works, and so might rely on that behavior.

@SparkQA

SparkQA commented Dec 2, 2017

Test build #84391 has finished for PR 19865 at commit d04544c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member Author

kiszk commented Dec 3, 2017

I would like to hear opinions from others, too.
WDYT? @cloud-fan @viirya @maropu @gatorsmile

@SparkQA

SparkQA commented Dec 3, 2017

Test build #84401 has finished for PR 19865 at commit ad86986.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 3, 2017

Test build #84402 has finished for PR 19865 at commit f63cd3a.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

cloud-fan commented Dec 3, 2017

Making a variable global needs to be done manually (by calling ctx.mutableState), and splitting the code into methods also needs to be done manually (by calling ctx.splitExpressions). If we hit a problem here, it's probably due to misuse. Can we just check the codebase, find these invalid cases, and fix them? We could also add documentation to ctx.splitExpressions stating that global variables should not be in the parameter list.

@kiszk
Member Author

kiszk commented Dec 3, 2017

@mgaido91 @viirya We see an assertion failure. Here is evidence that we pass a global variable as an argument of a split function.
In practice, we did not guarantee that we do not pass a global variable.

A value was declared as a global variable. Then it was passed as ExprCode.value. Finally, the value was passed as an argument in CodegenContext.splitExpressions. Fortunately, these expressions did not update the global variable, so the code was functionally correct.
In general, it is hard to ensure there is no update in the expressions. Of course, we do not like to use regular expressions to detect it.

As you said, how do we ensure that we do not pass a global variable?

**********************************************************************
File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/ml/feature.py", line 1205, in __main__.MinHashLSH
Failed example:
...
    Caused by: java.lang.AssertionError: assertion failed: smj_value16 in arguments should not be declared as a global variable
    	at scala.Predef$.assert(Predef.scala:170)
    	at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.org$apache$spark$sql$catalyst$expressions$codegen$CodegenContext$$isDeclaredMutableState(CodeGenerator.scala:226)
    	at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext$$anonfun$9.apply(CodeGenerator.scala:854)
    	at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext$$anonfun$9.apply(CodeGenerator.scala:854)
    	at scala.collection.TraversableLike$$anonfun$filterImpl$1.apply(TraversableLike.scala:248)
    	at scala.collection.immutable.List.foreach(List.scala:381)
    	at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
    	at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
    	at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
    	at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.splitExpressions(CodeGenerator.scala:853)
    	at org.apache.spark.sql.catalyst.expressions.HashExpression.genHashForStruct(hash.scala:395)
    	at org.apache.spark.sql.catalyst.expressions.HashExpression.computeHashWithTailRec(hash.scala:421)
    	at org.apache.spark.sql.catalyst.expressions.HashExpression.computeHash(hash.scala:429)
    	at org.apache.spark.sql.catalyst.expressions.HashExpression$$anonfun$1.apply(hash.scala:276)
    	at org.apache.spark.sql.catalyst.expressions.HashExpression$$anonfun$1.apply(hash.scala:273)
    	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    	at org.apache.spark.sql.catalyst.expressions.HashExpression.doGenCode(hash.scala:273)
    	at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:107)
    	at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
    	at scala.Option.getOrElse(Option.scala:121)
    	at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:104)
    	at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doConsumeWithKeys(HashAggregateExec.scala:772)
    	at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doConsume(HashAggregateExec.scala:173)
    	at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:162)
    	at org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:35)
    	at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:65)
    	at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:162)
    	at org.apache.spark.sql.execution.joins.SortMergeJoinExec.consume(SortMergeJoinExec.scala:36)
    	at org.apache.spark.sql.execution.joins.SortMergeJoinExec.doProduce(SortMergeJoinExec.scala:626)
    	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:85)
    	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:80)
    	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:141)
    	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:138)
    	at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:80)
    	at org.apache.spark.sql.execution.joins.SortMergeJoinExec.produce(SortMergeJoinExec.scala:36)
    	at org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:45)
    	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:85)
    	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:80)
    	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:141)
    	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:138)
    	at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:80)
    	at org.apache.spark.sql.execution.ProjectExec.produce(basicPhysicalOperators.scala:35)
    	at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduceWithKeys(HashAggregateExec.scala:647)
    	at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduce(HashAggregateExec.scala:165)
    	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:85)
    	at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:80)
    	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:141)
    	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:138)
    	at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:80)
    	at org.apache.spark.sql.execution.aggregate.HashAggregateExec.produce(HashAggregateExec.scala:39)
    	at org.apache.spark.sql.execution.WholeStageCodegenExec.doCodeGen(WholeStageCodegenExec.scala:374)
    	at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:422)
    	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
    	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:113)
    	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:141)
    	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:138)
    	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
    	at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.prepareShuffleDependency(ShuffleExchangeExec.scala:89)
    	at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:125)
    	at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:116)
    	at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
...

@viirya
Member

viirya commented Dec 3, 2017

I think for this case, shouldn't we fix it and not pass in a global variable into splitExpressions?

@kiszk
Member Author

kiszk commented Dec 3, 2017

@cloud-fan I see. As I pointed out, there are several places that set a global variable as ExprCode.value, which is then passed to successor operations.
Should we make the lifetime of a global variable local to an operation?

In addition to that, I will add documentation to ctx.splitExpressions and insert an assertion.

@mgaido91
Contributor

mgaido91 commented Dec 3, 2017

@kiszk I think that in the cases you hit, this might also have been done on purpose, relying on the way Java behaves, i.e. that the local variable is used and the global one is not used there. It may also be something that was designed like that. With this change you are instead forcing a behavior which is not the expected one.

I totally agree with @cloud-fan that we should fix the problems where they are created, if there are any. We can also decide that such a situation should be avoided for clarity, and change the places where you found this behavior. I am neutral on that. But I disagree with creating a situation which is counterintuitive.

@kiszk
Member Author

kiszk commented Dec 3, 2017

I am neutral on how to fix this problem in the current master. What I have been saying from the beginning is that this problem does not exist only in #19811, but also potentially in the current master.

I am happy to agree that we fix the invalid case in the current master.

@kiszk kiszk changed the title [SPARK-22668][SQL] Exclude global variables from arguments of method split by CodegenContext.splitExpressions() [SPARK-22668][SQL] Do not pass global variables to arguments of method split by CodegenContext.splitExpressions() Dec 4, 2017
@cloud-fan
Contributor

hopefully #19878 can fix the problem.

@kiszk
Member Author

kiszk commented Dec 4, 2017

Thank you. I think so for this case.
In general, making ev.value global or passing a global variable to consume() may cause this problem.
As @cloud-fan suggested, I identified these patterns and am fixing them.

ghost pushed a commit to dbtsai/spark that referenced this pull request Dec 5, 2017
…ables

## What changes were proposed in this pull request?

It turns out that `HashExpression` can pass around some values via parameter when splitting codes into methods, to save some global variable slots.

This can also prevent a weird case that global variable appears in parameter list, which is discovered by apache#19865

## How was this patch tested?

existing tests

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#19878 from cloud-fan/minor.
@kiszk
Member Author

kiszk commented Dec 5, 2017

Since other PRs address reducing the use of global variables in several operations, this PR will address SortMergeJoinExec and HashAggregateExec.

ghost pushed a commit to dbtsai/spark that referenced this pull request Dec 6, 2017
## What changes were proposed in this pull request?

This PR accomplishes the following two items.

1. Reduce # of global variables from two to one
2. Make lifetime of global variable local within an operation

Item 1 reduces the number of constant pool entries in a Java class. Item 2 ensures that a global variable is not passed as an argument to a method split by `CodegenContext.splitExpressions()`, which is addressed by apache#19865.

## How was this patch tested?

Added new test into `ArithmeticExpressionSuite`

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes apache#19899 from kiszk/SPARK-22704.
ghost pushed a commit to dbtsai/spark that referenced this pull request Dec 7, 2017
## What changes were proposed in this pull request?

This PR accomplishes the following two items.

1. Reduce # of global variables from two to one for generated code of `Case` and `Coalesce` and remove global variables for generated code of `In`.
2. Make lifetime of global variable local within an operation

Item 1 reduces the number of constant pool entries in a Java class. Item 2 ensures that a global variable is not passed as an argument to a method split by `CodegenContext.splitExpressions()`, which is addressed by apache#19865.

## How was this patch tested?

Added new tests into `PredicateSuite`, `NullExpressionsSuite`, and `ConditionalExpressionSuite`.

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes apache#19901 from kiszk/SPARK-22705.
@kiszk kiszk changed the title [SPARK-22668][SQL] Do not pass global variables to arguments of method split by CodegenContext.splitExpressions() [SPARK-22668][SQL] Assert to ensure no global variables in arguments of method split by CodegenContext.splitExpressions() Dec 9, 2017
@kiszk
Member Author

kiszk commented Dec 9, 2017

@cloud-fan unfortunately, #19878 did not fix this issue. #19937 will fix this issue.

@kiszk kiszk changed the title [SPARK-22668][SQL] Assert to ensure no global variables in arguments of method split by CodegenContext.splitExpressions() [SPARK-22668][SQL] Ensure no global variables in arguments of method split by CodegenContext.splitExpressions() Dec 9, 2017
add test case
@SparkQA

SparkQA commented Dec 9, 2017

Test build #84687 has finished for PR 19865 at commit 4e83f9f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 9, 2017

Test build #84688 has finished for PR 19865 at commit c37311f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member Author

kiszk commented Dec 10, 2017

I confirmed that this failure does not occur after merging #19937 in my environment.

asfgit pushed a commit that referenced this pull request Dec 11, 2017
…SortMergeJoin

## What changes were proposed in this pull request?

This PR reduces the number of global mutable variables in generated code of `SortMergeJoin`.

Before this PR, global mutable variables were used to extend the lifetime of variables in the nested loop. The same can be achieved by declaring each variable at the outermost loop level where it is used.
In the following example, `smj_value8`, `smj_isNull6`, and `smj_value9` are declared as local variables at lines 145-147 in `With this PR`.

This PR fixes a potential assertion error raised by #19865. Without this PR, a global mutable variable may be passed as an argument in the generated code of a split function.

Without this PR
```
/* 010 */   int smj_value8;
/* 011 */   boolean smj_isNull6;
/* 012 */   int smj_value9;
..
/* 143 */   protected void processNext() throws java.io.IOException {
/* 144 */     while (findNextInnerJoinRows(smj_leftInput, smj_rightInput)) {
/* 145 */       boolean smj_loaded = false;
/* 146 */       smj_isNull6 = smj_leftRow.isNullAt(1);
/* 147 */       smj_value9 = smj_isNull6 ? -1 : (smj_leftRow.getInt(1));
/* 148 */       scala.collection.Iterator<UnsafeRow> smj_iterator = smj_matches.generateIterator();
/* 149 */       while (smj_iterator.hasNext()) {
/* 150 */         InternalRow smj_rightRow1 = (InternalRow) smj_iterator.next();
/* 151 */         boolean smj_isNull8 = smj_rightRow1.isNullAt(1);
/* 152 */         int smj_value11 = smj_isNull8 ? -1 : (smj_rightRow1.getInt(1));
/* 153 */
/* 154 */         boolean smj_value12 = (smj_isNull6 && smj_isNull8) ||
/* 155 */         (!smj_isNull6 && !smj_isNull8 && smj_value9 == smj_value11);
/* 156 */         if (false || !smj_value12) continue;
/* 157 */         if (!smj_loaded) {
/* 158 */           smj_loaded = true;
/* 159 */           smj_value8 = smj_leftRow.getInt(0);
/* 160 */         }
/* 161 */         int smj_value10 = smj_rightRow1.getInt(0);
/* 162 */         smj_numOutputRows.add(1);
/* 163 */
/* 164 */         smj_rowWriter.zeroOutNullBytes();
/* 165 */
/* 166 */         smj_rowWriter.write(0, smj_value8);
/* 167 */
/* 168 */         if (smj_isNull6) {
/* 169 */           smj_rowWriter.setNullAt(1);
/* 170 */         } else {
/* 171 */           smj_rowWriter.write(1, smj_value9);
/* 172 */         }
/* 173 */
/* 174 */         smj_rowWriter.write(2, smj_value10);
/* 175 */
/* 176 */         if (smj_isNull8) {
/* 177 */           smj_rowWriter.setNullAt(3);
/* 178 */         } else {
/* 179 */           smj_rowWriter.write(3, smj_value11);
/* 180 */         }
/* 181 */         append(smj_result.copy());
/* 182 */
/* 183 */       }
/* 184 */       if (shouldStop()) return;
/* 185 */     }
/* 186 */   }
```

With this PR
```
/* 143 */   protected void processNext() throws java.io.IOException {
/* 144 */     while (findNextInnerJoinRows(smj_leftInput, smj_rightInput)) {
/* 145 */       int smj_value8 = -1;
/* 146 */       boolean smj_isNull6 = false;
/* 147 */       int smj_value9 = -1;
/* 148 */       boolean smj_loaded = false;
/* 149 */       smj_isNull6 = smj_leftRow.isNullAt(1);
/* 150 */       smj_value9 = smj_isNull6 ? -1 : (smj_leftRow.getInt(1));
/* 151 */       scala.collection.Iterator<UnsafeRow> smj_iterator = smj_matches.generateIterator();
/* 152 */       while (smj_iterator.hasNext()) {
/* 153 */         InternalRow smj_rightRow1 = (InternalRow) smj_iterator.next();
/* 154 */         boolean smj_isNull8 = smj_rightRow1.isNullAt(1);
/* 155 */         int smj_value11 = smj_isNull8 ? -1 : (smj_rightRow1.getInt(1));
/* 156 */
/* 157 */         boolean smj_value12 = (smj_isNull6 && smj_isNull8) ||
/* 158 */         (!smj_isNull6 && !smj_isNull8 && smj_value9 == smj_value11);
/* 159 */         if (false || !smj_value12) continue;
/* 160 */         if (!smj_loaded) {
/* 161 */           smj_loaded = true;
/* 162 */           smj_value8 = smj_leftRow.getInt(0);
/* 163 */         }
/* 164 */         int smj_value10 = smj_rightRow1.getInt(0);
/* 165 */         smj_numOutputRows.add(1);
/* 166 */
/* 167 */         smj_rowWriter.zeroOutNullBytes();
/* 168 */
/* 169 */         smj_rowWriter.write(0, smj_value8);
/* 170 */
/* 171 */         if (smj_isNull6) {
/* 172 */           smj_rowWriter.setNullAt(1);
/* 173 */         } else {
/* 174 */           smj_rowWriter.write(1, smj_value9);
/* 175 */         }
/* 176 */
/* 177 */         smj_rowWriter.write(2, smj_value10);
/* 178 */
/* 179 */         if (smj_isNull8) {
/* 180 */           smj_rowWriter.setNullAt(3);
/* 181 */         } else {
/* 182 */           smj_rowWriter.write(3, smj_value11);
/* 183 */         }
/* 184 */         append(smj_result.copy());
/* 185 */
/* 186 */       }
/* 187 */       if (shouldStop()) return;
/* 188 */     }
/* 189 */   }
```
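
The pattern in the listing above (a value loaded lazily inside the inner loop but needed across all inner iterations) can be sketched outside of Spark as follows. This is an illustrative sketch under assumed names (`LoopLocalDemo`, `join`, `leftValue`), not the generated code itself.

```java
import java.util.ArrayList;
import java.util.List;

class LoopLocalDemo {
    // Pairs each left key with every equal right key. `leftValue` must survive
    // across inner-loop iterations, so it is declared at the outer loop level
    // instead of being promoted to a field of the class.
    static List<String> join(int[] leftKeys, int[] rightKeys) {
        List<String> out = new ArrayList<>();
        for (int leftKey : leftKeys) {
            int leftValue = -1;       // local, but lives for the whole inner loop
            boolean loaded = false;
            for (int rightKey : rightKeys) {
                if (leftKey != rightKey) continue;
                if (!loaded) {
                    loaded = true;
                    leftValue = leftKey * 10; // stand-in for lazily reading a column
                }
                out.add(leftValue + ":" + rightKey);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [20:2, 20:2]
        System.out.println(join(new int[] {1, 2}, new int[] {2, 2, 3}));
    }
}
```

Because `leftValue` is a local of the enclosing method, it can no longer collide with a parameter name of a split method in the way a field can.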

## How was this patch tested?

Existing test cases

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes #19937 from kiszk/SPARK-22746.
@@ -842,7 +856,10 @@ class CodegenContext {
 blocks.head
 } else {
 val func = freshName(funcName)
-val argString = arguments.map { case (t, name) => s"$t $name" }.mkString(", ")
+val argString = arguments.map { case (t, name) =>
+assert(!isDeclaredMutableState(name),
Member

I'm neutral about this PR, though I feel this is not the best place for this assertion because it is not the only place where the described case can happen; e.g., my PR splits aggregate functions in HashAggregate.

Member Author

Got it. I did not know #19082 has its own split routine. While it is under review, there is no way to add an assertion for it in this PR.

Member

IMHO we'd better check the arguments of all functions registered via addFunction, if you want to check the case described in the description.

Member Author

Yeah, this problem occurs not only in split methods but also in normal methods. Let us check this in addNewFunction.

/**
* Return true if a given variable has been described as a global variable
*/
def isDeclaredMutableState(varName: String): Boolean = {
Contributor

Let's only enable this check in the test environment, in case it has bugs and breaks production jobs.

Member Author

Of course, this PR uses the method only in assert.

Member

hmm, even only in assert, it still can fail compilation, doesn't it?

Member

How about checking Utils.isTesting and throwing an exception in tests?

Member Author

@kiszk kiszk Dec 15, 2017

I think it is good to do this on the caller side, since this function is valid in both debug and production environments.
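
The suggestion in this thread, running the check only under test and doing so at the call site, might look like the following Java sketch. Everything here is an assumption made for illustration: the class `TestOnlyCheck`, the system property name `demo.testing`, and the `Set` of global names are hypothetical and are not Spark's `Utils.isTesting` or `CodegenContext`.

```java
import java.util.Set;

class TestOnlyCheck {
    // Hypothetical test-mode switch: true when the "demo.testing" system
    // property is set. This is only a stand-in for a real testing flag.
    static boolean isTesting() {
        return System.getProperty("demo.testing") != null;
    }

    // Caller-side guard: the lookup itself (globals.contains) is valid
    // everywhere, but a violation only raises an error in test mode.
    static void checkNotGlobal(String name, Set<String> globals) {
        if (isTesting() && globals.contains(name)) {
            throw new IllegalStateException(
                name + " in arguments should not be declared as a global variable");
        }
    }
}
```

With this shape, a bug in the check cannot break a production job, while test runs still fail loudly on a violation.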

* Return true if a given variable has been described as a global variable
*/
def isDeclaredMutableState(varName: String): Boolean = {
val j = varName.indexOf("[")
Contributor

We don't need to deal with [] here; using them in a parameter name will fail to compile.

Member Author

I see.

def isDeclaredMutableState(varName: String): Boolean = {
val j = varName.indexOf("[")
val qualifiedName = if (j < 0) varName else varName.substring(0, j)
mutableStates.find { s =>
Member

nit: find -> exists
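
The point of this nit is that a membership test only needs a boolean, not the matching element. A rough Java analogue is `Stream.anyMatch`, shown here in a sketch of the whole check. The class `MutableStateRegistry` and its methods are hypothetical stand-ins written for this illustration; they are not Spark's `CodegenContext` API.

```java
import java.util.ArrayList;
import java.util.List;

class MutableStateRegistry {
    // Hypothetical stand-in for the list of global variables: {javaType, name}.
    private final List<String[]> mutableStates = new ArrayList<>();

    void addMutableState(String javaType, String name) {
        mutableStates.add(new String[] {javaType, name});
    }

    // Returns true if `varName` was registered as a global variable.
    // anyMatch stops at the first hit and yields a boolean directly.
    boolean isDeclaredMutableState(String varName) {
        return mutableStates.stream().anyMatch(s -> s[1].equals(varName));
    }

    // Builds the parameter list of a split method from {javaType, name} pairs,
    // rejecting any argument that is also a registered global variable.
    String argString(List<String[]> arguments) {
        StringBuilder sb = new StringBuilder();
        for (String[] arg : arguments) {
            if (isDeclaredMutableState(arg[1])) {
                throw new IllegalStateException(
                    arg[1] + " in arguments should not be declared as a global variable");
            }
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(arg[0]).append(' ').append(arg[1]);
        }
        return sb.toString();
    }
}
```

For example, after registering `int smj_value16`, building the argument string for `InternalRow i` succeeds, while an argument list containing `smj_value16` is rejected.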
