[SPARK-5988][MLlib] add save/load for PowerIterationClusteringModel #5450
Test build #30022 has finished for PR 5450 at commit
("class" -> thisClassName) ~ ("version" -> thisFormatVersion) ~ ("k" -> model.k)))
sc.parallelize(Seq(metadata), 1).saveAsTextFile(Loader.metadataPath(path))

val dataRDD = model.assignments.map(x => (x.id, x.cluster)).toDF()
assignment.toDF() should be sufficient and correct. Otherwise, the output column names would be "_1" and "_2".
Assignment is not a case class, so we cannot call toDF() directly. Shall I change Assignment into a case class?
Then call toDF("id", "cluster").
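The three options discussed so far can be compared in a minimal sketch. It assumes a Spark 2.x or later `SparkSession`, which postdates this PR (the code under review would have used `SQLContext` implicits), and it uses a hypothetical stand-in `Assignment` case class rather than the actual MLlib class. The object and method names are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical stand-in for the model's assignment type (not the MLlib class).
case class Assignment(id: Long, cluster: Int)

object ToDFColumnNames {
  // Returns the column names produced by each of the three conversions.
  def columnNames(spark: SparkSession): Map[String, Seq[String]] = {
    import spark.implicits._

    val assignments = spark.sparkContext.parallelize(
      Seq(Assignment(0L, 1), Assignment(1L, 0)), 1)

    // Tuples carry no field names, so Spark falls back to "_1" and "_2".
    val tupleDF = assignments.map(a => (a.id, a.cluster)).toDF()
    // Naming the columns explicitly fixes the schema without a case class.
    val namedDF = assignments.map(a => (a.id, a.cluster)).toDF("id", "cluster")
    // A case class supplies the names from its fields.
    val caseClassDF = assignments.toDF()

    Map(
      "tuple" -> tupleDF.columns.toSeq,
      "named" -> namedDF.columns.toSeq,
      "caseClass" -> caseClassDF.columns.toSeq)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("toDF-column-names")
      .getOrCreate()
    try {
      columnNames(spark).foreach { case (variant, cols) =>
        println(s"$variant: ${cols.mkString(", ")}")
      }
    } finally {
      spark.stop()
    }
  }
}
```

Either of the last two variants yields the column names `id` and `cluster`, which is what a schema check on load would expect.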
Changing Assignment into a case class does not seem to cause any problems, so I have changed it. Otherwise I would have to write more code to check the schema, because Loader.checkSchema[]() does not work for an ordinary class.
That's okay. In general, the issue with case classes is that they are hard to extend. For example, changing case class Assignment(id: Long, cluster: Int) to case class Assignment(id: Long, cluster: Int, confidence: Double) breaks binary compatibility.
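To illustrate the compatibility concern, here is a small plain-Scala sketch. The objects `v1` and `v2` are hypothetical stand-ins for two releases of the same library; the real class would keep a single name across releases. Adding a field changes the JVM signature of the constructor (and likewise of the generated `apply` and `copy`), so client code compiled against the two-argument version would fail at link time with a `NoSuchMethodError`.

```scala
// Hypothetical "release 1" and "release 2" of the same case class.
object v1 { case class Assignment(id: Long, cluster: Int) }
object v2 { case class Assignment(id: Long, cluster: Int, confidence: Double) }

object BinaryCompat {
  // Renders the parameter types of the class's single public constructor.
  def ctorSignature(c: Class[_]): String =
    c.getConstructors.head.getParameterTypes
      .map(_.getSimpleName)
      .mkString("(", ", ", ")")

  def main(args: Array[String]): Unit = {
    println(ctorSignature(classOf[v1.Assignment])) // (long, int)
    println(ctorSignature(classOf[v2.Assignment])) // (long, int, double)
  }
}
```

An ordinary class avoids part of this problem because the old constructor can be kept as an overload when a field is added, whereas a case class regenerates `apply`, `copy`, and `unapply` from its parameter list.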
Test build #30156 has finished for PR 5450 at commit
LGTM. Merged into master. Thanks!
See JIRA issue SPARK-5988.