[MINOR][SPARKR][ML] Joint coefficients with intercept for SparkR linear SVM summary. #18035
@felixcheung I'd propose renaming spark.svmLinear to spark.svm, since svm is widely used by R users through the e1071 package. If we ever support a non-linear model (although that is unlikely), we could reuse this SparkR API. It would be like spark.gbt, which can call two ML algorithms through a single SparkR API. What do you think?
@@ -38,9 +38,17 @@ private[r] class LinearSVCWrapper private (
  private val svcModel: LinearSVCModel =
    pipeline.stages(1).asInstanceOf[LinearSVCModel]

  lazy val coefficients: Array[Double] = svcModel.coefficients.toArray
  lazy val rFeatures: Array[String] = if (svcModel.getFitIntercept) {
    Array("(Intercept)") ++ features
In R we stack the intercept with the other feature names; see spark.glm, spark.logit, and spark.survreg for reference.
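The stacking described in this comment can be sketched roughly as follows. This is a minimal illustration, not the actual wrapper code: plain arrays stand in for the fitted model, and the object name InterceptStacking and the rCoefficients helper are hypothetical names introduced only for this example. The assumption is that the coefficient values are prepended with the intercept in the same way the diff prepends "(Intercept)" to the feature names.

```scala
// Sketch only: plain arrays stand in for the fitted LinearSVCModel.
// The object name and both helper signatures are hypothetical.
object InterceptStacking {
  // Feature names as an R summary lists them: "(Intercept)" comes first
  // when an intercept was fitted.
  def rFeatures(features: Array[String], fitIntercept: Boolean): Array[String] =
    if (fitIntercept) Array("(Intercept)") ++ features else features

  // Coefficient values in the same order, so names and values stay aligned.
  def rCoefficients(
      coefficients: Array[Double],
      intercept: Double,
      fitIntercept: Boolean): Array[Double] =
    if (fitIntercept) Array(intercept) ++ coefficients else coefficients
}

object InterceptStackingDemo {
  def main(args: Array[String]): Unit = {
    val names = InterceptStacking.rFeatures(Array("x1", "x2"), fitIntercept = true)
    val values = InterceptStacking.rCoefficients(
      Array(0.5, -1.25), intercept = 2.0, fitIntercept = true)
    // Pair each name with its estimate, one row per line.
    names.zip(values).foreach { case (n, v) => println(s"$n\t$v") }
  }
}
```

Keeping the two arrays in the same order lets the R side build a one-column matrix whose row names are the stacked feature names.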
numClasses <- callJMethod(jobj, "numClasses")
numFeatures <- callJMethod(jobj, "numFeatures")
if (nCol == 1) {
ML LinearSVC only supports binary classification and will not support multiclass classification in the near future, so we can simplify here.
Why not label and intercept? I think it is common in R to include what goes into the model (although in many cases it just includes the formula in the model summary).
@felixcheung The change here makes the coefficients matrix have only one column, named Estimate. I suspect the original code was modeled on spark.logit, which supports multiclass classification, so it has multiple columns, each named after the corresponding label. For binary classification, the coefficients are not bound to any label, so we use Estimate as the column name, as R does. LinearSVC will not support multiclass classification in the future, so I simplified it here.
The following are the summary outputs for binomial and multinomial logistic regression in SparkR:
Binomial logistic regression model:
Multinomial logistic regression model:
Test build #77094 has finished for PR 18035 at commit
Test build #77097 has finished for PR 18035 at commit
Are you targeting these changes for 2.2.0, since we are making API/return result changes here?
I see your point, but from various threads it also seems really, really unlikely that we will implement a non-linear form of SVM, as you said :)
R/pkg/R/mllib_classification.R
Outdated
@@ -111,10 +112,10 @@ setMethod("spark.svmLinear", signature(data = "SparkDataFrame", formula = "formu
            new("LinearSVCModel", jobj = jobj)
          })

# Predicted values based on an LinearSVCModel model
# Predicted values based on a linear SVM model.
I think these are intentional: we have # Predicted values based on an LogisticRegressionModel model. They are prefixed with # and are not in the generated doc; they are only for developers.
there are a couple of these starting with #
@felixcheung Thanks for your comments. I'm targeting this for 2.2, to avoid a breaking change later. With respect to the naming issue, I would still prefer to rename to
Test build #77179 has finished for PR 18035 at commit
Test build #77181 has finished for PR 18035 at commit
@yanboliang I appreciate you discussing this with me; it is important to sort this out now. Normally I wouldn't mind either way, but in this case I kinda feel strongly about not making this name change, for 2 main reasons:
Anyway, what do you think?
R/pkg/R/mllib_classification.R
Outdated
#'
#' Fits an linear SVM model against a SparkDataFrame. It is a binary classifier, similar to svm in glmnet package
#' Fits a linear SVM model against a SparkDataFrame, similar to svm in e1071 package.
#' Currently only supports binary classification model with linear kernal.
Do you mean kernel instead of kernal?
@felixcheung For the naming issue, I'm OK with keeping it as it is; thanks for your clarification. What about the other changes in this PR?
Test build #77216 has finished for PR 18035 at commit
Thanks! LGTM
Let's ignore the intermittent AppVeyor error, since it passed before the simple typo changes.
Merged into master and branch-2.2. Thanks for reviewing.
…ar SVM summary.

## What changes were proposed in this pull request?
Joint coefficients with intercept for SparkR linear SVM summary.

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #18035 from yanboliang/svm-r.

(cherry picked from commit ad09e4c)
Signed-off-by: Yanbo Liang <ybliang8@gmail.com>
What changes were proposed in this pull request?
Join the coefficients with the intercept in the SparkR linear SVM summary.
How was this patch tested?
Existing tests.