Skip to content

Commit

Permalink
[SPARK-29777][SPARKR] SparkR::cleanClosure aggressively removes a fun…
Browse files Browse the repository at this point in the history
…ction required by user function

### What changes were proposed in this pull request?
The implementation for walking through the user function AST and picking referenced variables and functions, had an optimization to skip a branch if it had already seen it. This runs into an interesting problem in the following example

```
df <- createDataFrame(data.frame(x=1))
f1 <- function(x) x + 1
f2 <- function(x) f1(x) + 2
dapplyCollect(df, function(x) { f1(x); f2(x) })
```
Results in error:
```
org.apache.spark.SparkException: R computation failed with
 Error in f1(x) : could not find function "f1"
Calls: compute -> computeFunc -> f2
```

### Why are the changes needed?
Bug fix

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Unit tests in `test_utils.R`

Closes #26429 from falaki/SPARK-29777.

Authored-by: Hossein <hossein@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
  • Loading branch information
falaki authored and HyukjinKwon committed Nov 19, 2019
1 parent ea010a2 commit 9514b82
Show file tree
Hide file tree
Showing 2 changed files with 15 additions and 2 deletions.
8 changes: 6 additions & 2 deletions R/pkg/R/utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -543,10 +543,14 @@ processClosure <- function(node, oldEnv, defVars, checkedFuncs, newEnv) {
funcList <- mget(nodeChar, envir = checkedFuncs, inherits = F,
ifnotfound = list(list(NULL)))[[1]]
found <- sapply(funcList, function(func) {
ifelse(identical(func, obj), TRUE, FALSE)
ifelse(
identical(func, obj) &&
# Also check if the parent environment is identical to current parent
identical(parent.env(environment(func)), func.env),
TRUE, FALSE)
})
if (sum(found) > 0) {
# If function has been examined, ignore.
# If function has been examined ignore
break
}
# Function has not been examined, record it and recursively clean its closure.
Expand Down
9 changes: 9 additions & 0 deletions R/pkg/tests/fulltests/test_utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -110,6 +110,15 @@ test_that("cleanClosure on R functions", {
actual <- get("y", envir = env, inherits = FALSE)
expect_equal(actual, y)

# Test for combination for nested and sequenctial functions in a closure
f1 <- function(x) x + 1
f2 <- function(x) f1(x) + 2
userFunc <- function(x) { f1(x); f2(x) }
cUserFuncEnv <- environment(cleanClosure(userFunc))
expect_equal(length(cUserFuncEnv), 2)
innerCUserFuncEnv <- environment(cUserFuncEnv$f2)
expect_equal(length(innerCUserFuncEnv), 1)

# Test for function (and variable) definitions.
f <- function(x) {
g <- function(y) { y * 2 }
Expand Down

0 comments on commit 9514b82

Please sign in to comment.