[SPARK-11521] [ML] [DOC] Document that Logistic, Linear Regression summaries ignore weight col #9927

@@ -755,23 +755,35 @@ class BinaryLogisticRegressionSummary private[classification] (
* Returns the receiver operating characteristic (ROC) curve,
* which is a DataFrame having two fields (FPR, TPR)
* with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
*
* Note: This ignores instance weights (setting all to 1.0) from [[LogisticRegression.weightCol]].
* This will change in later Spark versions.
* @see http://en.wikipedia.org/wiki/Receiver_operating_characteristic
*/
@transient lazy val roc: DataFrame = binaryMetrics.roc().toDF("FPR", "TPR")

/**
* Computes the area under the receiver operating characteristic (ROC) curve.
*
* Note: This ignores instance weights (setting all to 1.0) from [[LogisticRegression.weightCol]].
* This will change in later Spark versions.
*/
lazy val areaUnderROC: Double = binaryMetrics.areaUnderROC()

/**
* Returns the precision-recall curve, which is a DataFrame containing
* two fields (recall, precision) with (0.0, 1.0) prepended to it.
*
* Note: This ignores instance weights (setting all to 1.0) from [[LogisticRegression.weightCol]].
* This will change in later Spark versions.
*/
@transient lazy val pr: DataFrame = binaryMetrics.pr().toDF("recall", "precision")

/**
* Returns a DataFrame with two fields (threshold, F-Measure) representing the curve with beta = 1.0.
*
* Note: This ignores instance weights (setting all to 1.0) from [[LogisticRegression.weightCol]].
* This will change in later Spark versions.
*/
@transient lazy val fMeasureByThreshold: DataFrame = {
binaryMetrics.fMeasureByThreshold().toDF("threshold", "F-Measure")
@@ -781,6 +793,9 @@ class BinaryLogisticRegressionSummary private[classification] (
* Returns a DataFrame with two fields (threshold, precision) representing the curve.
* Every possible probability obtained in transforming the dataset is used
* as a threshold when calculating the precision.
*
* Note: This ignores instance weights (setting all to 1.0) from [[LogisticRegression.weightCol]].
* This will change in later Spark versions.
*/
@transient lazy val precisionByThreshold: DataFrame = {
binaryMetrics.precisionByThreshold().toDF("threshold", "precision")
@@ -790,6 +805,9 @@ class BinaryLogisticRegressionSummary private[classification] (
* Returns a DataFrame with two fields (threshold, recall) representing the curve.
* Every possible probability obtained in transforming the dataset is used
* as a threshold when calculating the recall.
*
* Note: This ignores instance weights (setting all to 1.0) from [[LogisticRegression.weightCol]].
* This will change in later Spark versions.
*/
@transient lazy val recallByThreshold: DataFrame = {
binaryMetrics.recallByThreshold().toDF("threshold", "recall")
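
For context, a minimal usage sketch (not part of this patch) of the binary summary metrics documented above. It assumes a hypothetical DataFrame named `training` with "label", "features", and "weight" columns; per the notes added in this patch, the metrics below are computed as if every instance weight were 1.0, even though `weightCol` is set. The downcast follows the pattern in the Spark ML guide of this era, where `summary` returns the generic training summary.

```scala
import org.apache.spark.ml.classification.{BinaryLogisticRegressionSummary, LogisticRegression}

// Hypothetical input: a DataFrame `training` with "label", "features",
// and "weight" columns.
val lr = new LogisticRegression().setWeightCol("weight")
val model = lr.fit(training)

// The training summary carries the binary metrics documented above.
// They are computed as if every instance weight were 1.0, even though
// weightCol is set on the estimator.
val binarySummary = model.summary.asInstanceOf[BinaryLogisticRegressionSummary]

binarySummary.roc.show()                  // (FPR, TPR), with endpoints prepended/appended
println(binarySummary.areaUnderROC)       // scalar area under the ROC curve
binarySummary.pr.show()                   // (recall, precision), with (0.0, 1.0) prepended
binarySummary.fMeasureByThreshold.show()  // (threshold, F-Measure), beta = 1.0
binarySummary.precisionByThreshold.show() // (threshold, precision)
binarySummary.recallByThreshold.show()    // (threshold, recall)
```
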
@@ -540,34 +540,49 @@ class LinearRegressionSummary private[regression] (
* Returns the explained variance regression score.
* explainedVariance = 1 - variance(y - \hat{y}) / variance(y)
* Reference: [[http://en.wikipedia.org/wiki/Explained_variation]]
*
* Note: This ignores instance weights (setting all to 1.0) from [[LinearRegression.weightCol]].
* This will change in later Spark versions.
*/
@Since("1.5.0")
val explainedVariance: Double = metrics.explainedVariance

/**
* Returns the mean absolute error, which is a risk function corresponding to the
* expected value of the absolute error loss or l1-norm loss.
*
* Note: This ignores instance weights (setting all to 1.0) from [[LinearRegression.weightCol]].
* This will change in later Spark versions.
*/
@Since("1.5.0")
val meanAbsoluteError: Double = metrics.meanAbsoluteError

/**
* Returns the mean squared error, which is a risk function corresponding to the
* expected value of the squared error loss or quadratic loss.
*
* Note: This ignores instance weights (setting all to 1.0) from [[LinearRegression.weightCol]].
* This will change in later Spark versions.
*/
@Since("1.5.0")
val meanSquaredError: Double = metrics.meanSquaredError

/**
* Returns the root mean squared error, which is defined as the square root of
* the mean squared error.
*
* Note: This ignores instance weights (setting all to 1.0) from [[LinearRegression.weightCol]].
* This will change in later Spark versions.
*/
@Since("1.5.0")
val rootMeanSquaredError: Double = metrics.rootMeanSquaredError

/**
* Returns R^2^, the coefficient of determination.
* Reference: [[http://en.wikipedia.org/wiki/Coefficient_of_determination]]
*
* Note: This ignores instance weights (setting all to 1.0) from [[LinearRegression.weightCol]].
* This will change in later Spark versions.
*/
@Since("1.5.0")
val r2: Double = metrics.r2
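
Similarly, a minimal sketch (not part of this patch) of reading the LinearRegressionSummary metrics above, again assuming a hypothetical `training` DataFrame with "label", "features", and "weight" columns; these values are likewise computed with all instance weights treated as 1.0.

```scala
import org.apache.spark.ml.regression.LinearRegression

// Hypothetical input: a DataFrame `training` with "label", "features",
// and "weight" columns.
val lir = new LinearRegression().setWeightCol("weight")
val lirModel = lir.fit(training)

// Per the notes added in this patch, these summary metrics ignore the
// weight column and treat every instance weight as 1.0.
val summary = lirModel.summary
println(s"explainedVariance = ${summary.explainedVariance}")
println(s"MAE  = ${summary.meanAbsoluteError}")
println(s"MSE  = ${summary.meanSquaredError}")
println(s"RMSE = ${summary.rootMeanSquaredError}")
println(s"r2   = ${summary.r2}")
```
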