[SPARK-18318][ML] ML, Graph 2.1 QA: API: New Scala APIs, docs #16009

Closed
wants to merge 8 commits into from
Changes from 4 commits
@@ -307,7 +307,6 @@ class LogisticRegression @Since("1.2.0") (

private var optInitialModel: Option[LogisticRegressionModel] = None

/** @group setParam */
private[spark] def setInitialModel(model: LogisticRegressionModel): this.type = {
this.optInitialModel = Some(model)
this
@@ -318,8 +317,9 @@ class LogisticRegression @Since("1.2.0") (
train(dataset, handlePersistence)
}

- protected[spark] def train(dataset: Dataset[_], handlePersistence: Boolean):
- LogisticRegressionModel = {
+ protected[spark] def train(
+ dataset: Dataset[_],
+ handlePersistence: Boolean): LogisticRegressionModel = {
val w = if (!isDefined(weightCol) || $(weightCol).isEmpty) lit(1.0) else col($(weightCol))
val instances: RDD[Instance] =
dataset.select(col($(labelCol)), w, col($(featuresCol))).rdd.map {
@@ -33,7 +33,7 @@ import org.apache.spark.sql.types.DoubleType
/**
* Params for Naive Bayes Classifiers.
*/
- private[ml] trait NaiveBayesParams extends PredictorParams with HasWeightCol {
+ private[classification] trait NaiveBayesParams extends PredictorParams with HasWeightCol {

/**
* The smoothing parameter.
@@ -82,13 +82,14 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String
* invalid values), error (throw an error), or keep (keep invalid values in a special additional
* bucket).
* Default: "error"
* TODO: Reuse handleInvalid in HasHandleInvalid.
* @group param
*/
@Since("2.1.0")
val handleInvalid: Param[String] = new Param[String](this, "handleInvalid", "how to handle" +
"invalid entries. Options are skip (filter out rows with invalid values), " +
"error (throw an error), or keep (keep invalid values in a special additional bucket).",
- ParamValidators.inArray(Bucketizer.supportedHandleInvalid))
+ ParamValidators.inArray(Bucketizer.supportedHandleInvalids))

/** @group getParam */
@Since("2.1.0")
@@ -145,7 +146,7 @@ object Bucketizer extends DefaultParamsReadable[Bucketizer] {
private[feature] val SKIP_INVALID: String = "skip"
private[feature] val ERROR_INVALID: String = "error"
private[feature] val KEEP_INVALID: String = "keep"
- private[feature] val supportedHandleInvalid: Array[String] =
+ private[feature] val supportedHandleInvalids: Array[String] =
Array(SKIP_INVALID, ERROR_INVALID, KEEP_INVALID)

/**
@@ -49,15 +49,13 @@ private[feature] trait ChiSqSelectorParams extends Params
*
* @group param
*/
@Since("1.6.0")
final val numTopFeatures = new IntParam(this, "numTopFeatures",
Member

Why were the @Since annotations removed, btw?

Contributor Author

Usually we don't add Since tags to variables and functions in traits, since they may be inherited by new child classes later on, and the tag would then be incorrect for those classes.
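
To illustrate the concern above, here is a minimal plain-Scala sketch. It assumes a locally defined `Since` annotation as a stand-in for Spark's, and every class and trait name in it is hypothetical rather than taken from this PR. It shows how a member tagged in a trait carries the same version tag into a class that is added in a later release.

```scala
import scala.annotation.StaticAnnotation

object SinceInheritanceSketch {
  // Hypothetical stand-in for Spark's Since annotation, defined here so the sketch is self-contained.
  class Since(version: String) extends StaticAnnotation

  // A params trait whose getter is tagged with the release in which it first appeared.
  trait SelectorParamsSketch {
    @Since("1.6.0")
    def getNumTopFeatures: Int = 50
  }

  // Assumed to exist since 1.6.0: the inherited tag is accurate for this class.
  @Since("1.6.0")
  class ExistingSelector extends SelectorParamsSketch

  // Assumed to be added in 2.1.0: the inherited getter is still documented as "Since 1.6.0",
  // which is wrong for this class.
  @Since("2.1.0")
  class LaterSelector extends SelectorParamsSketch

  def main(args: Array[String]): Unit = {
    // The getter works the same way on both classes; only its version tag is misleading.
    println(new LaterSelector().getNumTopFeatures)
  }
}
```

The behavior is unaffected either way; the sketch only shows why a version tag on a trait member can misdescribe a subclass introduced later.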

Contributor

This happens in several other places, though. Are we going to remove them all?

Contributor Author

Yeah, theoretically we should do that, but I'm not very confident that this change is appropriate. If we reach an agreement on how to deal with this issue, we can address the other places in this PR or in follow-up work. cc @jkbradley

Member

In this case, I think it's OK to have Since annotations in the trait since it is private and should never be used beyond ChiSqSelector and ChiSqSelectorModel. That seems pretty safe to me.

Contributor Author

@yanboliang yanboliang Nov 30, 2016

Yeah, it's safe in this case. However, I found lots of other traits where it would also be safe to add the Since tag but where it was not added. I reverted this part of the change in this PR so that it can catch another RC of 2.1, and I think we should unify them in separate work. Thanks.

"Number of features that selector will select, ordered by ascending p-value. If the" +
" number of features is < numTopFeatures, then this will select all features.",
ParamValidators.gtEq(1))
setDefault(numTopFeatures -> 50)

/** @group getParam */
@Since("1.6.0")
def getNumTopFeatures: Int = $(numTopFeatures)

/**
@@ -66,14 +64,12 @@ private[feature] trait ChiSqSelectorParams extends Params
* Default value is 0.1.
* @group param
*/
@Since("2.1.0")
final val percentile = new DoubleParam(this, "percentile",
"Percentile of features that selector will select, ordered by ascending p-value.",
ParamValidators.inRange(0, 1))
setDefault(percentile -> 0.1)

/** @group getParam */
@Since("2.1.0")
def getPercentile: Double = $(percentile)

/**
@@ -94,15 +90,13 @@ private[feature] trait ChiSqSelectorParams extends Params
* Supported options: "numTopFeatures" (default), "percentile", "fpr".
* @group param
*/
@Since("2.1.0")
final val selectorType = new Param[String](this, "selectorType",
"The selector type of the ChisqSelector. " +
"Supported options: " + OldChiSqSelector.supportedSelectorTypes.mkString(", "),
ParamValidators.inArray[String](OldChiSqSelector.supportedSelectorTypes))
setDefault(selectorType -> OldChiSqSelector.NumTopFeatures)

/** @group getParam */
@Since("2.1.0")
def getSelectorType: String = $(selectorType)
}

@@ -70,17 +70,16 @@ private[feature] trait QuantileDiscretizerBase extends Params
* invalid values), error (throw an error), or keep (keep invalid values in a special additional
* bucket).
* Default: "error"
* TODO: Reuse handleInvalid in HasHandleInvalid.
Contributor

Is this just a note to add "HasHandleInvalid" as a shared param? I was a bit confused by it at first. Maybe we can reference a JIRA number for it?

Contributor Author

Yeah, updated.
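
The shared-param idea behind the TODO could look roughly like the following plain-Scala sketch. It does not use Spark's `Params` machinery, and all names in it (`HasHandleInvalidSketch`, `SimpleBucketizer`, `SimpleQuantileDiscretizer`) are hypothetical. It only shows the intent: one trait owns the "skip" / "error" / "keep" options, their validation, and the "error" default, and both feature transformers mix it in instead of each declaring its own copy.

```scala
object HandleInvalidSketch {
  // Hypothetical shared trait: a single definition of the option, its default, and its validation.
  trait HasHandleInvalidSketch {
    private var handleInvalidValue: String = HasHandleInvalidSketch.Error

    def getHandleInvalid: String = handleInvalidValue

    def setHandleInvalid(value: String): this.type = {
      // Reject anything outside the supported options, as ParamValidators.inArray does in the diff.
      require(
        HasHandleInvalidSketch.supported.contains(value),
        s"handleInvalid must be one of ${HasHandleInvalidSketch.supported.mkString(", ")}, got $value")
      handleInvalidValue = value
      this
    }
  }

  object HasHandleInvalidSketch {
    val Skip: String = "skip"
    val Error: String = "error"
    val Keep: String = "keep"
    val supported: Seq[String] = Seq(Skip, Error, Keep)
  }

  // Both stand-in transformers reuse the shared trait rather than redefining the option.
  class SimpleBucketizer extends HasHandleInvalidSketch
  class SimpleQuantileDiscretizer extends HasHandleInvalidSketch

  def main(args: Array[String]): Unit = {
    val bucketizer = new SimpleBucketizer().setHandleInvalid("keep")
    println(bucketizer.getHandleInvalid)
    println(new SimpleQuantileDiscretizer().getHandleInvalid)
  }
}
```

With this layout, the list of supported options lives in one place, so a rename such as `supportedHandleInvalid` to `supportedHandleInvalids` would touch a single definition.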

* @group param
*/
@Since("2.1.0")
val handleInvalid: Param[String] = new Param[String](this, "handleInvalid", "how to handle" +
"invalid entries. Options are skip (filter out rows with invalid values), " +
"error (throw an error), or keep (keep invalid values in a special additional bucket).",
- ParamValidators.inArray(Bucketizer.supportedHandleInvalid))
+ ParamValidators.inArray(Bucketizer.supportedHandleInvalids))
setDefault(handleInvalid, Bucketizer.ERROR_INVALID)

/** @group getParam */
@Since("2.1.0")
def getHandleInvalid: String = $(handleInvalid)

}
@@ -34,15 +34,15 @@ import org.apache.spark.mllib.linalg.CholeskyDecomposition
* @param objectiveHistory Option containing the objective history when an optimization program is
* used to solve the normal equations. None when an analytic solver is used.
*/
- private[ml] class NormalEquationSolution(
+ private[optim] class NormalEquationSolution(
val coefficients: Array[Double],
val aaInv: Option[Array[Double]],
val objectiveHistory: Option[Array[Double]])

/**
* Interface for classes that solve the normal equations locally.
*/
- private[ml] sealed trait NormalEquationSolver {
+ private[optim] sealed trait NormalEquationSolver {

/** Solve the normal equations from summary statistics. */
def solve(
@@ -56,7 +56,7 @@ private[ml] sealed trait NormalEquationSolver {
/**
* A class that solves the normal equations directly, using Cholesky decomposition.
*/
- private[ml] class CholeskySolver extends NormalEquationSolver {
+ private[optim] class CholeskySolver extends NormalEquationSolver {

override def solve(
bBar: Double,
@@ -75,7 +75,7 @@ private[ml] class CholeskySolver extends NormalEquationSolver {
/**
* A class for solving the normal equations using Quasi-Newton optimization methods.
*/
- private[ml] class QuasiNewtonSolver(
+ private[optim] class QuasiNewtonSolver(
fitIntercept: Boolean,
maxIter: Int,
tol: Double,
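
Several hunks in this diff narrow a qualified `private[...]` modifier (for example from `private[ml]` to `private[optim]`). As a rough illustration of what that narrowing means, the sketch below uses nested objects as stand-ins for the `ml`, `optim`, and `regression` packages. The names `SolverSketch`, `solveScalar`, and `fit` are hypothetical and the arithmetic is a placeholder, not the solver logic from the PR.

```scala
object ml {
  object optim {
    // Visible only inside `optim`, mirroring the narrowed private[optim] modifier.
    private[optim] class SolverSketch {
      // Placeholder computation: solves a * x = b for a scalar x.
      def solve(a: Double, b: Double): Double = b / a
    }

    // Code inside `optim` can still construct and use the solver.
    def solveScalar(a: Double, b: Double): Double = new SolverSketch().solve(a, b)
  }

  object regression {
    // new optim.SolverSketch() // would not compile here: SolverSketch is not accessible outside `optim`
    // With private[ml] instead, the line above would compile, because `regression` is inside `ml`.
    def fit(a: Double, b: Double): Double = optim.solveScalar(a, b)
  }
}
```

The narrower qualifier keeps helper classes out of reach of sibling scopes, so callers outside `optim` have to go through whatever entry point `optim` chooses to expose.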