-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-18619][ML] Make QuantileDiscretizer/Bucketizer/StringIndexer/RFormula inherit from HasHandleInvalid #18582
[SPARK-18619][ML] Make QuantileDiscretizer/Bucketizer/StringIndexer/RFormula inherit from HasHandleInvalid #18582
Conversation
Test build #79447 has finished for PR 18582 at commit
|
Test build #79451 has finished for PR 18582 at commit
|
816c486
to
71c4250
Compare
Test build #79495 has finished for PR 18582 at commit
|
Test build #79497 has finished for PR 18582 at commit
|
@@ -36,7 +36,8 @@ import org.apache.spark.sql.types.{DoubleType, StructField, StructType} | |||
*/ | |||
@Since("1.4.0") | |||
final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String) | |||
extends Model[Bucketizer] with HasInputCol with HasOutputCol with DefaultParamsWritable { | |||
extends Model[Bucketizer] with HasHandleInvalid with HasInputCol with HasOutputCol | |||
with DefaultParamsWritable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about aligning with
with extends
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it was suggested by IDEA.
@@ -460,16 +460,16 @@ object LinearRegression extends DefaultParamsReadable[LinearRegression] { | |||
val MAX_FEATURES_FOR_NORMAL_SOLVER: Int = WeightedLeastSquares.MAX_NUM_FEATURES | |||
|
|||
/** String name for "auto". */ | |||
private[regression] val AUTO = "auto" | |||
private[regression] val Auto = "auto" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is solver
related with handleInvalid
? It seems a little confused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@facaiy It's not related to this PR, just addressed other small issues by the way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@facaiy We had some offline discussion, and decided to fix some small issues in this PR.
@@ -3058,26 +3035,37 @@ class RFormula(JavaEstimator, HasFeaturesCol, HasLabelCol, JavaMLReadable, JavaM | |||
"RFormula drops the same category as R when encoding strings.", | |||
typeConverter=TypeConverters.toString) | |||
|
|||
handleInvalid = Param(Params._dummy(), "handleInvalid", "how to handle invalid entries. " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, handleInvalid
has been declared in HasHandleInvalid
interface, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@facaiy We have to override it to keep the pydoc in line with scala side.
@@ -36,7 +36,8 @@ import org.apache.spark.util.collection.OpenHashMap | |||
/** | |||
* Base trait for [[StringIndexer]] and [[StringIndexerModel]]. | |||
*/ | |||
private[feature] trait StringIndexerBase extends Params with HasInputCol with HasOutputCol { | |||
private[feature] trait StringIndexerBase extends Params with HasHandleInvalid with HasInputCol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it is better to break the line at extends
.
@@ -36,7 +36,8 @@ import org.apache.spark.sql.types.{DoubleType, StructField, StructType} | |||
*/ | |||
@Since("1.4.0") | |||
final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String) | |||
extends Model[Bucketizer] with HasInputCol with HasOutputCol with DefaultParamsWritable { | |||
extends Model[Bucketizer] with HasHandleInvalid with HasInputCol with HasOutputCol | |||
with DefaultParamsWritable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it was suggested by IDEA.
@@ -86,15 +87,11 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String | |||
*/ | |||
// TODO: SPARK-18619 Make Bucketizer inherit from HasHandleInvalid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this line.
@@ -74,16 +74,12 @@ private[feature] trait QuantileDiscretizerBase extends Params | |||
*/ | |||
// TODO: SPARK-18619 Make QuantileDiscretizer inherit from HasHandleInvalid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this line.
@@ -460,16 +460,16 @@ object LinearRegression extends DefaultParamsReadable[LinearRegression] { | |||
val MAX_FEATURES_FOR_NORMAL_SOLVER: Int = WeightedLeastSquares.MAX_NUM_FEATURES | |||
|
|||
/** String name for "auto". */ | |||
private[regression] val AUTO = "auto" | |||
private[regression] val Auto = "auto" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@facaiy It's not related to this PR, just addressed other small issues by the way.
python/pyspark/ml/feature.py
Outdated
@keyword_only | ||
def __init__(self, formula=None, featuresCol="features", labelCol="label", | ||
forceIndexLabel=False, stringIndexerOrderType="frequencyDesc"): | ||
forceIndexLabel=False, handleInvalid="error", | ||
stringIndexerOrderType="frequencyDesc"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should append new added argument in case to break users' code.
Test build #79555 has finished for PR 18582 at commit
|
LGTM, merged into master. Thanks. |
What changes were proposed in this pull request?
1, HasHandleInvaild support override
2, Make QuantileDiscretizer/Bucketizer/StringIndexer/RFormula inherit from HasHandleInvalid
How was this patch tested?
existing tests
JIRA