
Params from parent java estimators aren't copied to python mmlspark models #582

Open
kschelonka opened this issue Jun 7, 2019 · 4 comments
kschelonka commented Jun 7, 2019

The Java params for mmlspark estimators like LightGBMClassifier aren't copied over to the Python wrapper instances.

This is related to this Jira ticket: PySpark ML Models should contain Param values

A temporary fix was added so that the params can be accessed using the getOrDefault method. This makes it possible to pull the params from mmlspark models like LightGBMClassifier:
[Screenshot from 2019-06-07 showing params read via getOrDefault]
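The getOrDefault workaround can be illustrated with a minimal pure-Python sketch of the lookup semantics (the `Param`/`Model` classes below are illustrative stand-ins, not actual pyspark or mmlspark code):

```python
# Minimal sketch of the getOrDefault lookup pattern described above.
# In real pyspark, Params objects keep an explicitly-set param map and
# a default param map; getOrDefault checks the former, then the latter.

class Param:
    def __init__(self, name, doc=""):
        self.name = name
        self.doc = doc

class Model:
    def __init__(self):
        # In the workaround, values copied from the Java side land here.
        self._defaultParamMap = {}
        self._paramMap = {}

    def getOrDefault(self, param):
        # Explicitly set values win; otherwise fall back to the default.
        if param in self._paramMap:
            return self._paramMap[param]
        return self._defaultParamMap[param]

numLeaves = Param("numLeaves", "max number of leaves in one tree")
model = Model()
model._defaultParamMap[numLeaves] = 31   # default from the Java estimator
print(model.getOrDefault(numLeaves))     # -> 31
model._paramMap[numLeaves] = 64          # user-set value overrides the default
print(model.getOrDefault(numLeaves))     # -> 64
```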

Spark developers are planning to incrementally update the PySpark API to use the appropriate getter and setter methods, and to have the PySpark models define their params within themselves (see SPARK-21812). For example, CountVectorizer was updated in this fashion.

Since it's very useful to be able to access model parameters, I propose updating the mmlspark models in a similar fashion. I'm happy to contribute to this effort.
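The proposed CountVectorizer-style direction can be sketched as follows. This is a hypothetical pure-Python illustration: the class name, param name, and constructor are assumptions, not real mmlspark or pyspark APIs.

```python
# Sketch of the SPARK-21812 direction: the Python model declares its
# params itself, with explicit getters, instead of delegating every
# lookup to the wrapped Java object.

class Param:
    def __init__(self, name, doc):
        self.name = name
        self.doc = doc

class LightGBMModelSketch:
    # Declared on the Python class, so introspection and help() work
    # without a round trip to the JVM.
    numIterations = Param("numIterations", "number of boosting iterations")

    def __init__(self, java_params):
        # In practice, values would be copied once from the fitted
        # Java model into the Python-side param map.
        self._paramMap = dict(java_params)

    def getNumIterations(self):
        return self._paramMap["numIterations"]

model = LightGBMModelSketch({"numIterations": 100})
print(model.getNumIterations())  # -> 100
```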

imatiach-msft (Contributor) commented

@kschelonka thanks for bringing up this issue!
"having the pyspark models define the params within themselves"
This sounds reasonable, although I think we should only allow parameters to be settable on the model if they can actually change the output of the model (I recall seeing cases in Spark ML where this was not true, which seemed strange to me).
"I propose updating mmlspark models in a similar fashion. Happy to contribute to this effort."
That would be great, if you have the cycles to do so. Note that all of our Python wrappers are autogenerated, so this should hopefully require less coding and work on all Python models at once.
Also, I should mention that @mhamilton723 is doing a large refactor of the codebase right now. Currently we have a special ./runme script that sets up mmlspark on Linux, but it relies on bash and may do too much on behalf of the developer. @mhamilton723 is refactoring the codebase to rely only on sbt, which should make it possible to develop on Windows and macOS as well. Setting up the code for development is a bit difficult right now. He might suggest that you use the refactor branch; I'm not sure how far along he is in the refactor.


kschelonka commented Jun 7, 2019

Thanks for the response!
"we should only allow the parameters to be settable on the model if they can actually change the output of the model"
That makes sense to me as well, but I would suggest that the settable params include anything that has no effect on the model fit. For example, featuresCol should be settable even though it doesn't change the model's output.
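The distinction can be sketched like this: a transform-time param such as featuresCol gets a setter on the fitted model, while a fit-time param is exposed read-only. All names here are illustrative stand-ins, not real mmlspark APIs.

```python
# Sketch of the settable-params rule discussed above.

class FittedModelSketch:
    def __init__(self, numLeaves, featuresCol="features"):
        self._numLeaves = numLeaves      # fixed at fit time; getter only
        self._featuresCol = featuresCol  # only affects transform() input

    def getNumLeaves(self):
        return self._numLeaves

    def getFeaturesCol(self):
        return self._featuresCol

    def setFeaturesCol(self, value):
        # Safe to change after fitting: it controls which input column
        # transform() reads, not the learned model itself.
        self._featuresCol = value
        return self  # return self for chaining, as pyspark setters do

m = FittedModelSketch(numLeaves=31)
m.setFeaturesCol("scaled_features")
print(m.getFeaturesCol())  # -> scaled_features
print(m.getNumLeaves())    # -> 31
```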

The build refactor sounds like it would make development easier, so I'll wait for updates on that. Thanks!

mhamilton723 (Collaborator) commented

@kschelonka, @imatiach-msft I think this would be mitigated by making the LightGBM models proper SparkML estimators with getters and setters instead of being constructor-defined. This would also improve code reuse. Ilya, could you consider this when doing your LGBM cleanup PR?

imatiach-msft (Contributor) commented

FYI, this is now fixed for LightGBM (because of ComplexParamsWritable) but not for some of the other estimators, like TrainClassifier.
