-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-22668][SQL] Ensure no global variables in arguments of method split by CodegenContext.splitExpressions() #19865
Conversation
Test build #84386 has finished for PR 19865 at commit
|
@kiszk do you have any real example where this problem can happen now? If so, can we add an end-to-end test case? |
Thank you for your comment. This problem potentially exists regard less #19811. As we usually did, I added a test with end-to-end code generation regarding this. |
I see, my point is: I cannot understand how this can happen in the current codebase. Is there any place where this problem is present now? Because at the moment it looks to me more like a potential problem and not a real problem. Not sure if I was clear enough. |
Test build #84387 has finished for PR 19865 at commit
|
It would be good to hear opinions of others. |
cc @cloud-fan @viirya @maropu @gatorsmile for review |
If the parameters to |
We think that we normally would not pass a global variable. My question is: Did we guarantee it? And, is it possible to guarantee it in the future? |
I think we have to guarantee this. The case you are dealing with is a wrong situation that should be avoided. Indeed, in such a case, it is not even clear to me which is the right thing to do: I mean, which was the intention? Was it to use the global variable or was it to use the passed argument? Here you are assuming that the intention was to use the global variable, which actually is an interpretation I would question, since up to me the right thing to do is to use the passed variable as Java does on his own, because anybody creating this situation hopefully knows as Java works, thus he/she might also rely on that. |
Test build #84391 has finished for PR 19865 at commit
|
I would like to hear options from others, too. |
Test build #84401 has finished for PR 19865 at commit
|
Test build #84402 has finished for PR 19865 at commit
|
making a variable global need to be done manually(call |
@mgaido91 @viirya We see an assertion failure. Here is an evidence that we pass a global variable to arguments of split function. An value was declared as a global variable. Then, it is passed as As you said, how do we ensure that we do not pass a global variable?
|
I think for this case, shouldn't we fix it and not pass in a global variable into |
@cloud-fan I see. As I pointed out, there are several places to set a global variable In addition to that, I will add document to |
@kiszk I think that in the case you hit them, this might have also been done appositely and relying on the way Java behaves, ie. that it uses the local variable and the global one is not used there. It can also be something which has been designed like that. In this way instead you are forcing a behavior which is not the expected one. I totally agree with @cloud-fan, that we should fix the problem where they are created, if there is any. We can also decide that such situation should be avoided for clarity, and therefore we can change the point where you find this behavior to be present. I am neutral to that. But I disagree in creating a situation which is counterintuitive. |
I am neutral how to fix this problem in the current master. What I am saying from the beginning is that this problem does not only exist in #19811, but also potentially in the current master. I am happy to agree that we fix the invalid case in the current master. |
hopefully #19878 can fix the problem. |
Thank you. I think so for this case. |
…ables ## What changes were proposed in this pull request? It turns out that `HashExpression` can pass around some values via parameter when splitting codes into methods, to save some global variable slots. This can also prevent a weird case that global variable appears in parameter list, which is discovered by apache#19865 ## How was this patch tested? existing tests Author: Wenchen Fan <wenchen@databricks.com> Closes apache#19878 from cloud-fan/minor.
Since other PRs address to reduce usage of global variables in several operations, this PR will address |
## What changes were proposed in this pull request? This PR accomplishes the following two items. 1. Reduce # of global variables from two to one 2. Make lifetime of global variable local within an operation Item 1. reduces # of constant pool entries in a Java class. Item 2. ensures that an variable is not passed to arguments in a method split by `CodegenContext.splitExpressions()`, which is addressed by apache#19865. ## How was this patch tested? Added new test into `ArithmeticExpressionSuite` Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes apache#19899 from kiszk/SPARK-22704.
## What changes were proposed in this pull request? This PR accomplishes the following two items. 1. Reduce # of global variables from two to one for generated code of `Case` and `Coalesce` and remove global variables for generated code of `In`. 2. Make lifetime of global variable local within an operation Item 1. reduces # of constant pool entries in a Java class. Item 2. ensures that an variable is not passed to arguments in a method split by `CodegenContext.splitExpressions()`, which is addressed by apache#19865. ## How was this patch tested? Added new tests into `PredicateSuite`, `NullExpressionsSuite`, and `ConditionalExpressionSuite`. Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes apache#19901 from kiszk/SPARK-22705.
@cloud-fan unfortunately, #19878 did not fix this issue. #19937 will fix this issue. |
add test case
Test build #84687 has finished for PR 19865 at commit
|
Test build #84688 has finished for PR 19865 at commit
|
I confirmed that this failure does not occur after merging #19937 in my environment. |
…SortMergeJoin ## What changes were proposed in this pull request? This PR reduce the number of global mutable variables in generated code of `SortMergeJoin`. Before this PR, global mutable variables are used to extend lifetime of variables in the nested loop. This can be achieved by declaring variable at the outer most loop level where the variables are used. In the following example, `smj_value8`, `smj_value8`, and `smj_value9` are declared as local variable at lines 145-147 in `With this PR`. This PR fixes potential assertion error by #19865. Without this PR, a global mutable variable is potentially passed to arguments in generated code of split function. Without this PR ``` /* 010 */ int smj_value8; /* 011 */ boolean smj_value8; /* 012 */ int smj_value9; .. /* 143 */ protected void processNext() throws java.io.IOException { /* 144 */ while (findNextInnerJoinRows(smj_leftInput, smj_rightInput)) { /* 145 */ boolean smj_loaded = false; /* 146 */ smj_isNull6 = smj_leftRow.isNullAt(1); /* 147 */ smj_value9 = smj_isNull6 ? -1 : (smj_leftRow.getInt(1)); /* 148 */ scala.collection.Iterator<UnsafeRow> smj_iterator = smj_matches.generateIterator(); /* 149 */ while (smj_iterator.hasNext()) { /* 150 */ InternalRow smj_rightRow1 = (InternalRow) smj_iterator.next(); /* 151 */ boolean smj_isNull8 = smj_rightRow1.isNullAt(1); /* 152 */ int smj_value11 = smj_isNull8 ? -1 : (smj_rightRow1.getInt(1)); /* 153 */ /* 154 */ boolean smj_value12 = (smj_isNull6 && smj_isNull8) || /* 155 */ (!smj_isNull6 && !smj_isNull8 && smj_value9 == smj_value11); /* 156 */ if (false || !smj_value12) continue; /* 157 */ if (!smj_loaded) { /* 158 */ smj_loaded = true; /* 159 */ smj_value8 = smj_leftRow.getInt(0); /* 160 */ } /* 161 */ int smj_value10 = smj_rightRow1.getInt(0); /* 162 */ smj_numOutputRows.add(1); /* 163 */ /* 164 */ smj_rowWriter.zeroOutNullBytes(); /* 165 */ /* 166 */ smj_rowWriter.write(0, smj_value8); /* 167 */ /* 168 */ if (smj_isNull6) { /* 169 */ smj_rowWriter.setNullAt(1); /* 170 */ } else { /* 171 */ smj_rowWriter.write(1, smj_value9); /* 172 */ } /* 173 */ /* 174 */ smj_rowWriter.write(2, smj_value10); /* 175 */ /* 176 */ if (smj_isNull8) { /* 177 */ smj_rowWriter.setNullAt(3); /* 178 */ } else { /* 179 */ smj_rowWriter.write(3, smj_value11); /* 180 */ } /* 181 */ append(smj_result.copy()); /* 182 */ /* 183 */ } /* 184 */ if (shouldStop()) return; /* 185 */ } /* 186 */ } ``` With this PR ``` /* 143 */ protected void processNext() throws java.io.IOException { /* 144 */ while (findNextInnerJoinRows(smj_leftInput, smj_rightInput)) { /* 145 */ int smj_value8 = -1; /* 146 */ boolean smj_isNull6 = false; /* 147 */ int smj_value9 = -1; /* 148 */ boolean smj_loaded = false; /* 149 */ smj_isNull6 = smj_leftRow.isNullAt(1); /* 150 */ smj_value9 = smj_isNull6 ? -1 : (smj_leftRow.getInt(1)); /* 151 */ scala.collection.Iterator<UnsafeRow> smj_iterator = smj_matches.generateIterator(); /* 152 */ while (smj_iterator.hasNext()) { /* 153 */ InternalRow smj_rightRow1 = (InternalRow) smj_iterator.next(); /* 154 */ boolean smj_isNull8 = smj_rightRow1.isNullAt(1); /* 155 */ int smj_value11 = smj_isNull8 ? -1 : (smj_rightRow1.getInt(1)); /* 156 */ /* 157 */ boolean smj_value12 = (smj_isNull6 && smj_isNull8) || /* 158 */ (!smj_isNull6 && !smj_isNull8 && smj_value9 == smj_value11); /* 159 */ if (false || !smj_value12) continue; /* 160 */ if (!smj_loaded) { /* 161 */ smj_loaded = true; /* 162 */ smj_value8 = smj_leftRow.getInt(0); /* 163 */ } /* 164 */ int smj_value10 = smj_rightRow1.getInt(0); /* 165 */ smj_numOutputRows.add(1); /* 166 */ /* 167 */ smj_rowWriter.zeroOutNullBytes(); /* 168 */ /* 169 */ smj_rowWriter.write(0, smj_value8); /* 170 */ /* 171 */ if (smj_isNull6) { /* 172 */ smj_rowWriter.setNullAt(1); /* 173 */ } else { /* 174 */ smj_rowWriter.write(1, smj_value9); /* 175 */ } /* 176 */ /* 177 */ smj_rowWriter.write(2, smj_value10); /* 178 */ /* 179 */ if (smj_isNull8) { /* 180 */ smj_rowWriter.setNullAt(3); /* 181 */ } else { /* 182 */ smj_rowWriter.write(3, smj_value11); /* 183 */ } /* 184 */ append(smj_result.copy()); /* 185 */ /* 186 */ } /* 187 */ if (shouldStop()) return; /* 188 */ } /* 189 */ } ``` ## How was this patch tested? Existing test cases Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #19937 from kiszk/SPARK-22746.
@@ -842,7 +856,10 @@ class CodegenContext { | |||
blocks.head | |||
} else { | |||
val func = freshName(funcName) | |||
val argString = arguments.map { case (t, name) => s"$t $name" }.mkString(", ") | |||
val argString = arguments.map { case (t, name) => | |||
assert(!isDeclaredMutableState(name), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm neutral about this pr though, I feel this is not the best place for this assertion cuz this is not only the place where the suggested case happens, e.g., my pr splits aggregate functions in HashAggregate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I did not know #19082 has own split routine. While it is under review, there is no way to add assertion by this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO we'd better to check arguments in all the registered functions via addFunction
if you check the case described in the description?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this problem occurs not only in split method also in normal method. Let us check this in addNewFunction
/** | ||
* Return true if a given variable has been described as a global variable | ||
*/ | ||
def isDeclaredMutableState(varName: String): Boolean = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's only enable this check in test environment, in case it has bugs and break production jobs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, this PR uses the method only in assert
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, even only in assert
, it still can fail compilation, doesn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about checking Utils.isTesting and throwing an exception in tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it is a good way to do this at caller side since this function is valid at debug and production environments.
* Return true if a given variable has been described as a global variable | ||
*/ | ||
def isDeclaredMutableState(varName: String): Boolean = { | ||
val j = varName.indexOf("[") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need to deal with []
here, using them as parameter name will fail to compile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
def isDeclaredMutableState(varName: String): Boolean = { | ||
val j = varName.indexOf("[") | ||
val qualifiedName = if (j < 0) varName else varName.substring(0, j) | ||
mutableStates.find { s => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: find -> exists
What changes were proposed in this pull request?
This PR ensures that no global variables in arguments of method split by
CodegenContext.splitExpressions()
. This PR asserts this condition.If there are a global variables in split method by
CodegenContext.splitExpressions()
, the following problem may occur. When variables in arguments are declared as local variables, to assign a value to the variable, which has been originally declared as a global variable, updates a local variable.Each code generator has to ensure there is no such a case.
How was this patch tested?
Existing test suites