[SPARK-21622][ML][SparkR] Support offset in SparkR GLM #18831

actuaryzhang · 2017-08-03T06:42:01Z

What changes were proposed in this pull request?

Support offset in SparkR GLM #16699
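For readers unfamiliar with offsets, here is a minimal base R sketch (not SparkR) of what an offset does in a GLM: it enters the linear predictor with its coefficient fixed at 1. The sketch reuses the iris columns from this PR's unit test and assumes only base R's `glm`; the names `fit_arg` and `fit_term` are illustrative. It checks that passing the offset as an argument and as a formula term gives the same coefficients.

```r
# Base R only; no Spark session is needed for this sketch.
# suppressWarnings() hides the non-integer-response warnings that
# poisson() emits for Sepal.Width, as the PR's unit test also does.
fit_arg <- suppressWarnings(
  glm(Sepal.Width ~ Sepal.Length + Species,
      data = iris, family = poisson(), offset = iris$Petal.Length)
)

# Same model with the offset written as a formula term.
fit_term <- suppressWarnings(
  glm(Sepal.Width ~ Sepal.Length + Species + offset(Petal.Length),
      data = iris, family = poisson())
)

# The offset has no estimated coefficient, so both fits report the
# same set of coefficients.
same_coefs <- isTRUE(all.equal(coef(fit_arg), coef(fit_term)))
print(same_coefs)
```

In SparkR the offset is given as a column name through `offsetCol` rather than as a vector, as the unit test in this PR shows.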

SparkQA · 2017-08-03T07:04:51Z

Test build #80194 has finished for PR 18831 at commit 6ec068e.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

actuaryzhang · 2017-08-03T15:55:05Z

Jenkins, retest this please

SparkQA · 2017-08-03T17:06:27Z

Test build #80213 has finished for PR 18831 at commit 6ec068e.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2017-08-03T17:32:22Z

R/pkg/R/mllib_regression.R

@@ -125,7 +127,7 @@ setClass("IsotonicRegressionModel", representation(jobj = "jobj"))
 #' @seealso \link{glm}, \link{read.ml}
 setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
          function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL,
-                   regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power,
+                   offsetCol = NULL, regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power,


I'd avoid adding a param in the middle; it breaks code that passes params by position.
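A small base R sketch of the concern, using hypothetical stand-in functions (`fit_old`, `fit_mid`, `fit_end`) rather than the real `spark.glm` signature: a caller that passes `regParam` by position silently binds the value to the new parameter when it is inserted in the middle, but keeps working when it is appended at the end.

```r
# Hypothetical signature before the change.
fit_old <- function(formula, weightCol = NULL, regParam = 0.0) {
  list(weightCol = weightCol, regParam = regParam)
}

# New parameter inserted in the middle of the argument list.
fit_mid <- function(formula, weightCol = NULL, offsetCol = NULL, regParam = 0.0) {
  list(weightCol = weightCol, offsetCol = offsetCol, regParam = regParam)
}

# New parameter appended at the end of the argument list.
fit_end <- function(formula, weightCol = NULL, regParam = 0.0, offsetCol = NULL) {
  list(weightCol = weightCol, regParam = regParam, offsetCol = offsetCol)
}

# Existing caller code that passes the third argument by position.
old <- fit_old(y ~ x, "w", 0.1)
mid <- fit_mid(y ~ x, "w", 0.1)  # 0.1 now lands in offsetCol
end <- fit_end(y ~ x, "w", 0.1)  # 0.1 still lands in regParam

str(mid)
str(end)
```

Callers that name their arguments (`regParam = 0.1`) are unaffected either way.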

felixcheung · 2017-08-03T17:40:54Z

R/pkg/R/mllib_regression.R

+              offsetCol <- NULL
+            } else if (!is.null(offsetCol)) {
+              offsetCol <- as.character(offsetCol)
+            }


Perhaps:

    if (!is.null(offsetCol)) {
      offsetCol <- as.character(offsetCol)
      if (nchar(offsetCol) == 0) {
        offsetCol <- NULL
      }
    }

Not sure if you also want to cover other cases where offsetCol cannot be coerced, e.g. NA.
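One way the NA case could be folded into the same check is sketched below. `normalize_col_name` is a hypothetical helper, not part of SparkR. Treating NA like an empty string (both become NULL) is an assumption; raising an error for NA would be an equally reasonable choice.

```r
# Hypothetical helper: returns a single non-empty column name, or NULL
# when no usable name was supplied.
normalize_col_name <- function(col) {
  if (is.null(col)) {
    return(NULL)
  }
  col <- as.character(col)
  # Assumption: anything that is not exactly one non-missing, non-empty
  # string is treated as "no column".
  if (length(col) != 1 || is.na(col) || nchar(col) == 0) {
    return(NULL)
  }
  col
}

normalize_col_name("Petal_Length")  # kept as-is
normalize_col_name("")              # NULL
normalize_col_name(NA)              # NULL
```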

SparkQA · 2017-08-03T21:49:53Z

Test build #80218 has finished for PR 18831 at commit dc8ccbc.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-08-03T23:57:59Z

Test build #80219 has finished for PR 18831 at commit 3c4ebf9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

actuaryzhang · 2017-08-04T00:06:07Z

Thanks for your comments, Felix.
Addressed all issues.
@yanboliang Could you take a quick look?

felixcheung · 2017-08-04T05:19:46Z

R/pkg/tests/fulltests/test_mllib_regression.R

+  stats <- summary(spark.glm(training, Sepal_Width ~ Sepal_Length + Species,
+                             family = poisson(), offsetCol = "Petal_Length"))
+  rStats <- suppressWarnings(summary(glm(Sepal.Width ~ Sepal.Length + Species,
+                        data = iris, family = poisson(), offset = iris$Petal.Length)))


That's interesting. Perhaps we should accept a column in addition to a column name.

actuaryzhang · 2017-08-04

Then do you want to make the change for weight as well?

felixcheung · 2017-08-05

Probably across everything in ML. Let's discuss this in a new JIRA.

yanboliang · 2017-08-05

I vote to keep the name as it is, because it's the column name of the offset rather than the offset itself. weightCol is the same. We would like to keep the SparkR MLlib wrappers' argument names consistent with R only when applicable. I'm OK with creating a new JIRA to discuss it. Thanks.

yanboliang

LGTM

felixcheung · 2017-08-05T16:20:01Z

To be clear, I'm not suggesting we rename the parameter. I'm suggesting we support passing a column, like df$myoffset, in addition to a string. This would be more R-like.

yanboliang · 2017-08-05T16:56:36Z

@felixcheung Sorry for the misunderstanding. I agree we can support df$myoffset as well; the requirement makes sense for R users. Let's create a separate JIRA to track it and make the same change for other similar arguments such as weightCol. Thanks.

actuaryzhang · 2017-08-06T04:00:51Z

Thanks both of you for the comments. Yes, I think it's best to keep this PR on offset and we can address the other improvements later.

felixcheung · 2017-08-06T22:14:48Z

merged to master

add offset to SparkR

6ec068e

felixcheung reviewed Aug 3, 2017

fix unit test

dc8ccbc

fix unit test

3c4ebf9

felixcheung reviewed Aug 4, 2017

yanboliang approved these changes Aug 5, 2017

felixcheung approved these changes Aug 6, 2017

asfgit closed this in 55aa4da Aug 6, 2017

actuaryzhang deleted the sparkROffset branch August 7, 2017 00:09
