Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-22668][SQL] Ensure no global variables in arguments of method split by CodegenContext.splitExpressions() #19865

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -217,6 +217,18 @@ class CodegenContext {
splitExpressions(expressions = initCodes, funcName = "init", arguments = Nil)
}

/**
* Return true if a given variable has been described as a global variable
*/
def isDeclaredMutableState(varName: String): Boolean = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's only enable this check in test environment, in case it has bugs and break production jobs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, this PR uses the method only in assert.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, even only in assert, it still can fail compilation, doesn't it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about checking Utils.isTesting and throwing an exception in tests?

Copy link
Member Author

@kiszk kiszk Dec 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it is a good way to do this at caller side since this function is valid at debug and production environments.

val j = varName.indexOf("[")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to deal with [] here, using them as parameter name will fail to compile.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

val qualifiedName = if (j < 0) varName else varName.substring(0, j)
mutableStates.find { s =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: find -> exists

val i = s._2.indexOf("[")
qualifiedName == (if (i < 0) s._2 else s._2.substring(0, i))
}.isDefined
}

/**
* Code statements to initialize states that depend on the partition index.
* An integer `partitionIndex` will be made available within the scope.
Expand Down Expand Up @@ -789,7 +801,8 @@ class CodegenContext {
* @param expressions the codes to evaluate expressions.
* @param funcName the split function name base.
* @param extraArguments the list of (type, name) of the arguments of the split function,
* except for the current inputs like `ctx.INPUT_ROW`.
* except for the current inputs like `ctx.INPUT_ROW`. Name must not be
* mutable state.
* @param returnType the return type of the split function.
* @param makeSplitFunction makes split function body, e.g. add preparation or cleanup.
* @param foldFunctions folds the split function calls.
Expand Down Expand Up @@ -823,7 +836,8 @@ class CodegenContext {
*
* @param expressions the codes to evaluate expressions.
* @param funcName the split function name base.
* @param arguments the list of (type, name) of the arguments of the split function.
* @param arguments the list of (type, name) of the arguments of the split function. Name must
* not be mutable state
* @param returnType the return type of the split function.
* @param makeSplitFunction makes split function body, e.g. add preparation or cleanup.
* @param foldFunctions folds the split function calls.
Expand All @@ -842,7 +856,10 @@ class CodegenContext {
blocks.head
} else {
val func = freshName(funcName)
val argString = arguments.map { case (t, name) => s"$t $name" }.mkString(", ")
val argString = arguments.map { case (t, name) =>
assert(!isDeclaredMutableState(name),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm neutral about this pr though, I feel this is not the best place for this assertion cuz this is not only the place where the suggested case happens, e.g., my pr splits aggregate functions in HashAggregate

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I did not know #19082 has own split routine. While it is under review, there is no way to add assertion by this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO we'd better to check arguments in all the registered functions via addFunction if you check the case described in the description?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this problem occurs not only in split method also in normal method. Let us check this in addNewFunction

s"$name in arguments should not be declared as a global variable")
s"$t $name" }.mkString(", ")
val functions = blocks.zipWithIndex.map { case (body, i) =>
val name = s"${func}_$i"
val code = s"""
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -394,4 +394,23 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper {
Map("add" -> Literal(1))).genCode(ctx)
assert(ctx.mutableStates.isEmpty)
}

test("SPARK-22668: ensure no global variables in split method arguments") {
val ctx = new CodegenContext
ctx.addMutableState(ctx.JAVA_INT, "ij")
ctx.addMutableState("int[]", "array")
ctx.addMutableState("int[][]", "b")

assert(ctx.isDeclaredMutableState("ij"))
assert(ctx.isDeclaredMutableState("array"))
assert(ctx.isDeclaredMutableState("array[1]"))
assert(ctx.isDeclaredMutableState("b[]"))
assert(ctx.isDeclaredMutableState("b[1][]"))

assert(!ctx.isDeclaredMutableState("i"))
assert(!ctx.isDeclaredMutableState("j"))
assert(!ctx.isDeclaredMutableState("ij1"))
assert(!ctx.isDeclaredMutableState("arr"))
assert(!ctx.isDeclaredMutableState("bb[]"))
}
}