Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-18619][ML] Make QuantileDiscretizer/Bucketizer/StringIndexer/RFormula inherit from HasHandleInvalid #18582

Closed

Conversation

zhengruifeng
Copy link
Contributor

@zhengruifeng zhengruifeng commented Jul 10, 2017

What changes were proposed in this pull request?

1, HasHandleInvaild support override
2, Make QuantileDiscretizer/Bucketizer/StringIndexer/RFormula inherit from HasHandleInvalid

How was this patch tested?

existing tests

JIRA

@SparkQA
Copy link

SparkQA commented Jul 10, 2017

Test build #79447 has finished for PR 18582 at commit d336f14.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 10, 2017

Test build #79451 has finished for PR 18582 at commit 816c486.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng zhengruifeng force-pushed the heritate_HasHandleInvalid branch from 816c486 to 71c4250 Compare July 11, 2017 03:05
@SparkQA
Copy link

SparkQA commented Jul 11, 2017

Test build #79495 has finished for PR 18582 at commit 71c4250.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class Bucketizer(JavaTransformer, HasInputCol, HasOutputCol, HasHandleInvalid,
  • class QuantileDiscretizer(JavaEstimator, HasInputCol, HasOutputCol, HasHandleInvalid,
  • class RFormula(JavaEstimator, HasFeaturesCol, HasLabelCol, HasHandleInvalid,

@SparkQA
Copy link

SparkQA commented Jul 11, 2017

Test build #79497 has finished for PR 18582 at commit bd467b6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -36,7 +36,8 @@ import org.apache.spark.sql.types.{DoubleType, StructField, StructType}
*/
@Since("1.4.0")
final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String)
extends Model[Bucketizer] with HasInputCol with HasOutputCol with DefaultParamsWritable {
extends Model[Bucketizer] with HasHandleInvalid with HasInputCol with HasOutputCol
with DefaultParamsWritable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about aligning with with extends?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it was suggested by IDEA.

@@ -460,16 +460,16 @@ object LinearRegression extends DefaultParamsReadable[LinearRegression] {
val MAX_FEATURES_FOR_NORMAL_SOLVER: Int = WeightedLeastSquares.MAX_NUM_FEATURES

/** String name for "auto". */
private[regression] val AUTO = "auto"
private[regression] val Auto = "auto"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is solver related with handleInvalid? It seems a little confused.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@facaiy It's not related to this PR, just addressed other small issues by the way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@facaiy We had some offline discussion, and decided to fix some small issues in this PR.

@@ -3058,26 +3035,37 @@ class RFormula(JavaEstimator, HasFeaturesCol, HasLabelCol, JavaMLReadable, JavaM
"RFormula drops the same category as R when encoding strings.",
typeConverter=TypeConverters.toString)

handleInvalid = Param(Params._dummy(), "handleInvalid", "how to handle invalid entries. " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, handleInvalid has been declared in HasHandleInvalid interface, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@facaiy We have to override it to keep the pydoc in line with scala side.

@@ -36,7 +36,8 @@ import org.apache.spark.util.collection.OpenHashMap
/**
* Base trait for [[StringIndexer]] and [[StringIndexerModel]].
*/
private[feature] trait StringIndexerBase extends Params with HasInputCol with HasOutputCol {
private[feature] trait StringIndexerBase extends Params with HasHandleInvalid with HasInputCol
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it is better to break the line at extends.

@@ -36,7 +36,8 @@ import org.apache.spark.sql.types.{DoubleType, StructField, StructType}
*/
@Since("1.4.0")
final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String)
extends Model[Bucketizer] with HasInputCol with HasOutputCol with DefaultParamsWritable {
extends Model[Bucketizer] with HasHandleInvalid with HasInputCol with HasOutputCol
with DefaultParamsWritable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it was suggested by IDEA.

@@ -86,15 +87,11 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String
*/
// TODO: SPARK-18619 Make Bucketizer inherit from HasHandleInvalid.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this line.

@@ -74,16 +74,12 @@ private[feature] trait QuantileDiscretizerBase extends Params
*/
// TODO: SPARK-18619 Make QuantileDiscretizer inherit from HasHandleInvalid.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this line.

@@ -460,16 +460,16 @@ object LinearRegression extends DefaultParamsReadable[LinearRegression] {
val MAX_FEATURES_FOR_NORMAL_SOLVER: Int = WeightedLeastSquares.MAX_NUM_FEATURES

/** String name for "auto". */
private[regression] val AUTO = "auto"
private[regression] val Auto = "auto"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@facaiy It's not related to this PR, just addressed other small issues by the way.

@keyword_only
def __init__(self, formula=None, featuresCol="features", labelCol="label",
forceIndexLabel=False, stringIndexerOrderType="frequencyDesc"):
forceIndexLabel=False, handleInvalid="error",
stringIndexerOrderType="frequencyDesc"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should append new added argument in case to break users' code.

@SparkQA
Copy link

SparkQA commented Jul 12, 2017

Test build #79555 has finished for PR 18582 at commit 61505f0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor

LGTM, merged into master. Thanks.

@asfgit asfgit closed this in d2d2a5d Jul 12, 2017
@zhengruifeng zhengruifeng deleted the heritate_HasHandleInvalid branch August 21, 2019 06:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants