[SPARK-17658][SPARKR] read.df/write.df API taking path optionally in SparkR #15231
@@ -698,6 +698,58 @@ isSparkRShell <- function() {
  grepl(".*shell\\.R$", Sys.getenv("R_PROFILE_USER"), perl = TRUE)
}

# Works identically to `callJStatic(...)` but throws a pretty-formatted exception.
handledCallJStatic <- function(cls, method, ...) {
  result <- tryCatch(callJStatic(cls, method, ...),
                     error = function(e) {
                       captureJVMException(e, method)
                     })
  result
}

# Works identically to `callJMethod(...)` but throws a pretty-formatted exception.
handledCallJMethod <- function(obj, method, ...) {
  result <- tryCatch(callJMethod(obj, method, ...),
                     error = function(e) {
                       captureJVMException(e, method)
                     })
  result
}
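Both `handled*` wrappers follow one pattern: run the JVM call inside `tryCatch()` and pass any error to a formatter that re-raises it with the R-side method name attached. The sketch below shows that pattern without a JVM. It assumes a hypothetical `fakeCallJStatic()` that fails the way a JVM call might, and a simplified, hypothetical handler `rethrowWithMethod()` standing in for `captureJVMException()`; `handledCall()` is likewise an illustrative name, not a SparkR function.

```r
# Hypothetical stand-in for callJStatic(): always fails with a JVM-style message.
fakeCallJStatic <- function(cls, method, ...) {
  stop("java.lang.IllegalArgumentException: Invalid type unknown")
}

# Hypothetical, simplified handler: re-raise the error, prefixed with the
# R-side method name. call. = FALSE keeps R from prepending its own call.
rethrowWithMethod <- function(e, method) {
  stop(paste("Error in", method, ":", conditionMessage(e)), call. = FALSE)
}

# Same shape as handledCallJStatic(): return the result on success,
# hand any error to the handler on failure.
handledCall <- function(fn, target, method, ...) {
  tryCatch(fn(target, method, ...),
           error = function(e) rethrowWithMethod(e, method))
}

# Capture the re-raised message instead of letting it abort the script.
msg <- tryCatch(
  handledCall(fakeCallJStatic, "SomeClass", "getSQLDataType", "unknown"),
  error = conditionMessage
)
# msg is "Error in getSQLDataType : java.lang.IllegalArgumentException: Invalid type unknown"
```

On success the wrapper is transparent: the value from `fn` is returned unchanged, which is what "works identically to `callJStatic(...)`" means in the comments above.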

captureJVMException <- function(e, method) {
  rawmsg <- as.character(e)
  if (any(grep("^Error in .*?: ", rawmsg))) {
    # If the exception message starts with "Error in ...", it is probably
    # "Error in invokeJava(...)". Replace that prefix with
    # `paste("Error in", method, ":")` so the message identifies which
    # method was called on the JVM side.
    stacktrace <- strsplit(rawmsg, "Error in .*?: ")[[1]]
    rmsg <- paste("Error in", method, ":")
    stacktrace <- paste(rmsg[1], stacktrace[2])
  } else {
    # Otherwise, leave the error message unchanged, just in case.
    stacktrace <- rawmsg
  }

  if (any(grep("java.lang.IllegalArgumentException: ", stacktrace))) {
Review comment: are there cases where the IllegalArgument should be checked on the R side first, to avoid the exception in the first place?

Review comment: Thanks! @felixcheung I will address all the other comments above. For this one, though, I thought hard about it and it does not seem easy, because the R side cannot know whether a given data source is valid. I might be able to do this only for internal data sources or known Databricks data sources such as "redshift" or "xml", by creating a map of those data sources and then checking whether a path is given. However, I am not sure it is a good idea to maintain another list of data sources.

Review comment: I agree, I don't think we should couple the R code to the underlying data source implementations, and I was not suggesting that :) I'm saying there are still many (other) cases where the parameters are unchecked, and it would be good to see whether this conversion of the JVM IllegalArgumentException is sufficient or whether more checks should be added on the R side.

Review comment: Ah, I see. Yeap. This might be a best-effort thing. I think I tried (if I am right) all combinations of missing/wrong parameters in the APIs. One exceptional case for both APIs is, they throw an exception,

Review comment: great, thanks - generally I'd prefer having parameter checks in R; though in this case I think we need to balance the added code complexity against reduced usability (by checking more, it might fail where it didn't before). So I'm not 100% sure we should add parameter checks across the board.

Review comment: Yeap, I do understand and will investigate it with this in mind :)
    msg <- strsplit(stacktrace, "java.lang.IllegalArgumentException: ", fixed = TRUE)[[1]]
    # Extract the "Error in ..." prefix.
    rmsg <- msg[1]
    # Extract the first line of the JVM exception message.
    first <- strsplit(msg[2], "\r?\n\tat")[[1]][1]
    stop(paste0(rmsg, "illegal argument - ", first), call. = FALSE)
  } else if (any(grep("org.apache.spark.sql.AnalysisException: ", stacktrace))) {
    msg <- strsplit(stacktrace, "org.apache.spark.sql.AnalysisException: ", fixed = TRUE)[[1]]
    # Extract the "Error in ..." prefix.
    rmsg <- msg[1]
    # Extract the first line of the JVM exception message.
    first <- strsplit(msg[2], "\r?\n\tat")[[1]][1]
    stop(paste0(rmsg, "analysis error - ", first), call. = FALSE)
  } else {
    stop(stacktrace, call. = FALSE)
  }
}
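The string handling in `captureJVMException()` can be exercised on a plain string, with no JVM involved. The sketch below is a hypothetical `formatJVMMessage()` that mirrors the prefix rewrite and the `IllegalArgumentException` branch but returns the formatted text instead of calling `stop()` (the `AnalysisException` branch works the same way and is omitted). The raw message, including its stack frame, is made up for illustration, and `perl = TRUE` is an assumption used here to make the lazy `.*?` match explicit.

```r
# Hypothetical helper mirroring captureJVMException(), minus the stop() call.
formatJVMMessage <- function(rawmsg, method) {
  # Replace the leading "Error in <call>: " with "Error in <method> : ".
  if (grepl("^Error in .*?: ", rawmsg, perl = TRUE)) {
    rest <- strsplit(rawmsg, "Error in .*?: ", perl = TRUE)[[1]][2]
    stacktrace <- paste(paste("Error in", method, ":"), rest)
  } else {
    stacktrace <- rawmsg
  }
  # Keep only the first line of an IllegalArgumentException message.
  marker <- "java.lang.IllegalArgumentException: "
  if (grepl(marker, stacktrace, fixed = TRUE)) {
    msg <- strsplit(stacktrace, marker, fixed = TRUE)[[1]]
    first <- strsplit(msg[2], "\r?\n\tat")[[1]][1]
    return(paste0(msg[1], "illegal argument - ", first))
  }
  stacktrace
}

# Made-up raw message in the shape the JVM bridge produces.
rawmsg <- paste0("Error in invokeJava(isStatic = TRUE, className, methodName, ...): ",
                 "java.lang.IllegalArgumentException: Invalid type unknown\n",
                 "\tat example.Frame.method(Example.scala:1)\n")
formatJVMMessage(rawmsg, "getSQLDataType")
# returns "Error in getSQLDataType : illegal argument - Invalid type unknown"
```

The returned string is the same one the `captureJVMException` test expects, which makes it easy to see which part of the raw message each `strsplit()` call discards.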

# rbind a list of rows with raw (binary) columns
#
# @param inputData a list of rows, with each row a list
@@ -166,6 +166,16 @@ test_that("convertToJSaveMode", {
               'mode should be one of "append", "overwrite", "error", "ignore"') #nolint
})

test_that("captureJVMException", {
  method <- "getSQLDataType"
  expect_error(tryCatch(callJStatic("org.apache.spark.sql.api.r.SQLUtils", method,
Review comment: let's change this test to
                                    "unknown"),
                        error = function(e) {
                          captureJVMException(e, method)
                        }),
               "Error in getSQLDataType : illegal argument - Invalid type unknown")
})

test_that("hashCode", {
  expect_error(hashCode("bc53d3605e8a5b7de1e8e271c2317645"), NA)
})
Review comment: very minor nit: you could probably replace the double pass (grep first, then strsplit) with just the result from strsplit.
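The nit can be sketched as follows: `strsplit()` already reports whether the marker was present (more than one piece comes back), so the preceding `grep()` is redundant. This hypothetical `extractJVMMessage()` also folds the two exception branches into a single lookup table. It is an illustration of the suggestion under those assumptions, not code from this PR.

```r
# Hypothetical single-pass variant: one strsplit() per marker, no grep().
extractJVMMessage <- function(stacktrace) {
  # Map each JVM exception prefix to the label used in the R-side message.
  labelByMarker <- c("java.lang.IllegalArgumentException: " = "illegal argument - ",
                     "org.apache.spark.sql.AnalysisException: " = "analysis error - ")
  for (marker in names(labelByMarker)) {
    parts <- strsplit(stacktrace, marker, fixed = TRUE)[[1]]
    if (length(parts) > 1) {
      # parts[1] is the "Error in ... : " prefix; keep the first JVM line only.
      first <- strsplit(parts[2], "\r?\n\tat")[[1]][1]
      return(paste0(parts[1], labelByMarker[[marker]], first))
    }
  }
  # No recognised JVM exception: return the message unchanged.
  stacktrace
}

extractJVMMessage("Error in getSQLDataType : java.lang.IllegalArgumentException: Invalid type unknown\n\tat x.Y.z(Y.scala:1)\n")
# returns "Error in getSQLDataType : illegal argument - Invalid type unknown"
```

One behavioural difference from the `grep()` version: a match at the very end of a string produces no trailing empty piece in `strsplit()`, so a message that ends exactly at the marker is returned unchanged here, whereas the two-pass version would format it with an `NA` suffix.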