[SPARK-21915][ML][PySpark] Model 1 and Model 2 ParamMaps Missing #19152

marktab · 2017-09-07T02:23:53Z

@dongjoon-hyun @HyukjinKwon

Error in PySpark example code:
/examples/src/main/python/ml/estimator_transformer_param_example.py

The original Scala code says
println("Model 2 was fit using parameters: " + model2.parent.extractParamMap)

The parent is lr

There is no method for accessing parent as is done in Scala.

This code has been tested in Python, and returns values consistent with Scala

What changes were proposed in this pull request?

Proposing to call the lr variable instead of model1 or model2

How was this patch tested?

This patch was tested with Spark 2.1.0 comparing the Scala and PySpark results. Pyspark returns nothing at present for those two print lines.

The output for model2 in PySpark should be

{Param(parent='LogisticRegression_4187be538f744d5a9090', name='tol', doc='the convergence tolerance for iterative algorithms (>= 0).'): 1e-06,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='elasticNetParam', doc='the ElasticNet mixing parameter, in range [0, 1]. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.'): 0.0,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='predictionCol', doc='prediction column name.'): 'prediction',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='featuresCol', doc='features column name.'): 'features',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='labelCol', doc='label column name.'): 'label',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='probabilityCol', doc='Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities.'): 'myProbability',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='rawPredictionCol', doc='raw prediction (a.k.a. confidence) column name.'): 'rawPrediction',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='family', doc='The name of family which is a description of the label distribution to be used in the model. Supported options: auto, binomial, multinomial'): 'auto',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='fitIntercept', doc='whether to fit an intercept term.'): True,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='threshold', doc='Threshold in binary classification prediction, in range [0, 1]. If threshold and thresholds are both set, they must match.e.g. if threshold is p, then thresholds must be equal to [1-p, p].'): 0.55,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='aggregationDepth', doc='suggested depth for treeAggregate (>= 2).'): 2,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='maxIter', doc='max number of iterations (>= 0).'): 30,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='regParam', doc='regularization parameter (>= 0).'): 0.1,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='standardization', doc='whether to standardize the training features before fitting the model.'): True}

Please review http://spark.apache.org/contributing.html before opening a pull request.

@dongjoon-hyun

@dongjoon-hyun @HyukjinKwon Error in PySpark example code: [https://github.com/apache/spark/blob/master/examples/src/main/python/ml/estimator_transformer_param_example.py] The original Scala code says println("Model 2 was fit using parameters: " + model2.parent.extractParamMap) The parent is lr There is no method for accessing parent as is done in Scala. This code has been tested in Python, and returns values consistent with Scala

SparkQA · 2017-09-08T07:01:55Z

Test build #3914 has finished for PR 19152 at commit a2ccb8a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun HyukjinKwon Error in PySpark example code: /examples/src/main/python/ml/estimator_transformer_param_example.py The original Scala code says println("Model 2 was fit using parameters: " + model2.parent.extractParamMap) The parent is lr There is no method for accessing parent as is done in Scala. This code has been tested in Python, and returns values consistent with Scala ## What changes were proposed in this pull request? Proposing to call the lr variable instead of model1 or model2 ## How was this patch tested? This patch was tested with Spark 2.1.0 comparing the Scala and PySpark results. Pyspark returns nothing at present for those two print lines. The output for model2 in PySpark should be {Param(parent='LogisticRegression_4187be538f744d5a9090', name='tol', doc='the convergence tolerance for iterative algorithms (>= 0).'): 1e-06, Param(parent='LogisticRegression_4187be538f744d5a9090', name='elasticNetParam', doc='the ElasticNet mixing parameter, in range [0, 1]. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.'): 0.0, Param(parent='LogisticRegression_4187be538f744d5a9090', name='predictionCol', doc='prediction column name.'): 'prediction', Param(parent='LogisticRegression_4187be538f744d5a9090', name='featuresCol', doc='features column name.'): 'features', Param(parent='LogisticRegression_4187be538f744d5a9090', name='labelCol', doc='label column name.'): 'label', Param(parent='LogisticRegression_4187be538f744d5a9090', name='probabilityCol', doc='Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities.'): 'myProbability', Param(parent='LogisticRegression_4187be538f744d5a9090', name='rawPredictionCol', doc='raw prediction (a.k.a. confidence) column name.'): 'rawPrediction', Param(parent='LogisticRegression_4187be538f744d5a9090', name='family', doc='The name of family which is a description of the label distribution to be used in the model. Supported options: auto, binomial, multinomial'): 'auto', Param(parent='LogisticRegression_4187be538f744d5a9090', name='fitIntercept', doc='whether to fit an intercept term.'): True, Param(parent='LogisticRegression_4187be538f744d5a9090', name='threshold', doc='Threshold in binary classification prediction, in range [0, 1]. If threshold and thresholds are both set, they must match.e.g. if threshold is p, then thresholds must be equal to [1-p, p].'): 0.55, Param(parent='LogisticRegression_4187be538f744d5a9090', name='aggregationDepth', doc='suggested depth for treeAggregate (>= 2).'): 2, Param(parent='LogisticRegression_4187be538f744d5a9090', name='maxIter', doc='max number of iterations (>= 0).'): 30, Param(parent='LogisticRegression_4187be538f744d5a9090', name='regParam', doc='regularization parameter (>= 0).'): 0.1, Param(parent='LogisticRegression_4187be538f744d5a9090', name='standardization', doc='whether to standardize the training features before fitting the model.'): True} Please review http://spark.apache.org/contributing.html before opening a pull request. Author: MarkTab marktab.net <marktab@users.noreply.github.com> Closes #19152 from marktab/branch-2.2.

srowen · 2017-09-08T07:08:35Z

Merged to 2.2

marktab · 2017-09-14T02:59:49Z

@srowen -- may I close this pull request?

WeichenXu123 · 2017-09-18T10:14:37Z

@marktab You should close merged PR. Thanks!

dongjoon-hyun HyukjinKwon Error in PySpark example code: /examples/src/main/python/ml/estimator_transformer_param_example.py The original Scala code says println("Model 2 was fit using parameters: " + model2.parent.extractParamMap) The parent is lr There is no method for accessing parent as is done in Scala. This code has been tested in Python, and returns values consistent with Scala ## What changes were proposed in this pull request? Proposing to call the lr variable instead of model1 or model2 ## How was this patch tested? This patch was tested with Spark 2.1.0 comparing the Scala and PySpark results. Pyspark returns nothing at present for those two print lines. The output for model2 in PySpark should be {Param(parent='LogisticRegression_4187be538f744d5a9090', name='tol', doc='the convergence tolerance for iterative algorithms (>= 0).'): 1e-06, Param(parent='LogisticRegression_4187be538f744d5a9090', name='elasticNetParam', doc='the ElasticNet mixing parameter, in range [0, 1]. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.'): 0.0, Param(parent='LogisticRegression_4187be538f744d5a9090', name='predictionCol', doc='prediction column name.'): 'prediction', Param(parent='LogisticRegression_4187be538f744d5a9090', name='featuresCol', doc='features column name.'): 'features', Param(parent='LogisticRegression_4187be538f744d5a9090', name='labelCol', doc='label column name.'): 'label', Param(parent='LogisticRegression_4187be538f744d5a9090', name='probabilityCol', doc='Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities.'): 'myProbability', Param(parent='LogisticRegression_4187be538f744d5a9090', name='rawPredictionCol', doc='raw prediction (a.k.a. confidence) column name.'): 'rawPrediction', Param(parent='LogisticRegression_4187be538f744d5a9090', name='family', doc='The name of family which is a description of the label distribution to be used in the model. Supported options: auto, binomial, multinomial'): 'auto', Param(parent='LogisticRegression_4187be538f744d5a9090', name='fitIntercept', doc='whether to fit an intercept term.'): True, Param(parent='LogisticRegression_4187be538f744d5a9090', name='threshold', doc='Threshold in binary classification prediction, in range [0, 1]. If threshold and thresholds are both set, they must match.e.g. if threshold is p, then thresholds must be equal to [1-p, p].'): 0.55, Param(parent='LogisticRegression_4187be538f744d5a9090', name='aggregationDepth', doc='suggested depth for treeAggregate (>= 2).'): 2, Param(parent='LogisticRegression_4187be538f744d5a9090', name='maxIter', doc='max number of iterations (>= 0).'): 30, Param(parent='LogisticRegression_4187be538f744d5a9090', name='regParam', doc='regularization parameter (>= 0).'): 0.1, Param(parent='LogisticRegression_4187be538f744d5a9090', name='standardization', doc='whether to standardize the training features before fitting the model.'): True} Please review http://spark.apache.org/contributing.html before opening a pull request. Author: MarkTab marktab.net <marktab@users.noreply.github.com> Closes apache#19152 from marktab/branch-2.2.

srowen approved these changes Sep 7, 2017

View reviewed changes

marktab closed this Sep 14, 2017

marktab reopened this Sep 14, 2017

HyukjinKwon mentioned this pull request Sep 26, 2017

[BUILD] Close stale PRs #19348

Closed

asfgit closed this in ceaec93 Sep 27, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-21915][ML][PySpark] Model 1 and Model 2 ParamMaps Missing #19152

[SPARK-21915][ML][PySpark] Model 1 and Model 2 ParamMaps Missing #19152

marktab commented Sep 7, 2017

SparkQA commented Sep 8, 2017

srowen commented Sep 8, 2017

marktab commented Sep 14, 2017

WeichenXu123 commented Sep 18, 2017

[SPARK-21915][ML][PySpark] Model 1 and Model 2 ParamMaps Missing #19152

[SPARK-21915][ML][PySpark] Model 1 and Model 2 ParamMaps Missing #19152

Conversation

marktab commented Sep 7, 2017

What changes were proposed in this pull request?

How was this patch tested?

SparkQA commented Sep 8, 2017

srowen commented Sep 8, 2017

marktab commented Sep 14, 2017

WeichenXu123 commented Sep 18, 2017