[SPARK-30398][ML] PCA/RegressionMetrics/RowMatrix avoid unnecessary computation #27059
Test build #115988 has finished for PR 27059 at commit
@@ -21,6 +21,7 @@ import scala.annotation.varargs

import org.apache.spark.annotation.Since
import org.apache.spark.api.java.{JavaDoubleRDD, JavaRDD}
import org.apache.spark.ml.stat.SummaryBuilderImpl._
Design-wise, can we minimize the places where mllib calls ml? And is it possible to expose this without reaching into the "Impl" class? It looks a little funny.
That is only because the current aggregator is an inner class, org.apache.spark.ml.stat.SummaryBuilderImpl.SummarizerBuffer. I guess I need to move it outside of SummaryBuilderImpl.
minimize import of ml
e34c4fb to a675881
Test build #116005 has finished for PR 27059 at commit
Test build #116011 has finished for PR 27059 at commit
Most of the change here is breaking out the inner class, right?
Merged to master.
What changes were proposed in this pull request?
Use .ml.Summarizer instead of .mllib.MultivariateOnlineSummarizer to avoid computing unused metrics.
Why are the changes needed?
To avoid computing metrics that the callers (PCA, RegressionMetrics, RowMatrix) never use.
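The idea can be illustrated with a small standalone sketch. This is not Spark's code: the SelectiveSummary object, its Buffer class, and the Metric names are hypothetical, and the logic is deliberately simplified. It only shows why an aggregator that is told which metrics are wanted can skip the per-row work and memory for all the others.

```scala
// Hypothetical, simplified sketch; NOT Spark's implementation.
// An aggregator that only maintains the statistics the caller requested.
object SelectiveSummary {
  sealed trait Metric
  case object Mean extends Metric
  case object Max extends Metric
  case object NumNonZeros extends Metric

  final class Buffer(requested: Set[Metric], dim: Int) {
    private var count = 0L

    // Each accumulator array is allocated only if its metric was requested.
    private val sums: Option[Array[Double]] =
      if (requested(Mean)) Some(new Array[Double](dim)) else None
    private val maxs: Option[Array[Double]] =
      if (requested(Max)) Some(Array.fill(dim)(Double.NegativeInfinity)) else None
    private val nnz: Option[Array[Long]] =
      if (requested(NumNonZeros)) Some(new Array[Long](dim)) else None

    /** Folds one row into the buffer, updating only the requested metrics. */
    def add(v: Array[Double]): this.type = {
      require(v.length == dim, s"expected $dim values, got ${v.length}")
      count += 1
      sums.foreach { s => for (i <- 0 until dim) s(i) += v(i) }
      maxs.foreach { m => for (i <- 0 until dim) m(i) = math.max(m(i), v(i)) }
      nnz.foreach { n => for (i <- 0 until dim) if (v(i) != 0.0) n(i) += 1 }
      this
    }

    /** Per-column mean; fails if the mean was not requested up front. */
    def mean: Array[Double] = {
      val s = sums.getOrElse(
        throw new IllegalStateException("mean was not requested"))
      s.map(_ / count)
    }

    /** Per-column maximum; fails if the max was not requested up front. */
    def max: Array[Double] =
      maxs.getOrElse(
        throw new IllegalStateException("max was not requested")).clone()

    /** Number of accumulator arrays actually being maintained. */
    def trackedArrays: Int = Seq[Option[_]](sums, maxs, nnz).count(_.isDefined)
  }
}
```

For example, new SelectiveSummary.Buffer(Set(SelectiveSummary.Mean), dim = 2) followed by add(Array(1.0, 2.0)) and add(Array(3.0, 4.0)) maintains a single accumulator array and yields a mean of (2.0, 3.0), while asking the same buffer for max throws because that metric was never tracked.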
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing test suites.