-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-21622][ML][SparkR] Support offset in SparkR GLM #18831
Conversation
Test build #80194 has finished for PR 18831 at commit
|
Jenkins, retest this please |
Test build #80213 has finished for PR 18831 at commit
|
R/pkg/R/mllib_regression.R
Outdated
@@ -125,7 +127,7 @@ setClass("IsotonicRegressionModel", representation(jobj = "jobj")) | |||
#' @seealso \link{glm}, \link{read.ml} | |||
setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"), | |||
function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL, | |||
regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power, | |||
offsetCol = NULL, regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd avoid adding a param in the middle - it breaks code passing param by order
R/pkg/R/mllib_regression.R
Outdated
offsetCol <- NULL | ||
} else if (!is.null(offsetCol)) { | ||
offsetCol <- as.character(offsetCol) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps
if (!is.null(offsetCol)) {
offsetCol <- as.character(offsetCol)
if (nchar(offsetCol) == 0) {
offsetCol <- NULL
}
}
not sure if you want to cover other cases when offsetCol cannot be coerced - eg. NA
Test build #80218 has finished for PR 18831 at commit
|
Test build #80219 has finished for PR 18831 at commit
|
Thanks for your comments, Felix. |
stats <- summary(spark.glm(training, Sepal_Width ~ Sepal_Length + Species, | ||
family = poisson(), offsetCol = "Petal_Length")) | ||
rStats <- suppressWarnings(summary(glm(Sepal.Width ~ Sepal.Length + Species, | ||
data = iris, family = poisson(), offset = iris$Petal.Length))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's interesting - perhaps we should take col
in addition to col name
too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then do you want to make the change for weight as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably across every in ml.
let's discuss this in a new JIRA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vote to keep the name as it is, because it's the column name of offset
rather than the offset
itself. weightCol
is the same. We would like to keep SparkR MLlib wrappers' argument name consistent with R only when it's applicable. I'm ok to create a new JIRA to discuss it. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
To be clear, I'm not suggesting to rename the parameter. I'm suggest we should support the type being passed in as column like df$myoffset in addition to it being a string. This will be more R like
|
@felixcheung Sorry for misunderstand, I agree we can support |
Thanks both of you for the comments. Yes, I think it's best to keep this PR on offset and we can address the other improvements later. |
merged to master |
What changes were proposed in this pull request?
Support offset in SparkR GLM #16699