Skip to content

Commit

Permalink
[SPARK-3142][MLLIB] output shuffle data directly in Word2Vec
Browse files Browse the repository at this point in the history
Sorry I didn't realize this in apache#2043. Ishiihara

Author: Xiangrui Meng <meng@databricks.com>

Closes apache#2049 from mengxr/more-w2v and squashes the following commits:

050b1c5 [Xiangrui Meng] output shuffle data directly
  • Loading branch information
mengxr authored and conviva-zz committed Sep 4, 2014
1 parent 09b62c3 commit 8bdba3e
Showing 1 changed file with 12 additions and 11 deletions.
23 changes: 12 additions & 11 deletions mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
Original file line number Diff line number Diff line change
Expand Up @@ -347,19 +347,20 @@ class Word2Vec extends Serializable with Logging {
}
val syn0Local = model._1
val syn1Local = model._2
val synOut = mutable.ListBuffer.empty[(Int, Array[Float])]
var index = 0
while(index < vocabSize) {
if (syn0Modify(index) != 0) {
synOut += ((index, syn0Local.slice(index * vectorSize, (index + 1) * vectorSize)))
// Only output modified vectors.
Iterator.tabulate(vocabSize) { index =>
if (syn0Modify(index) > 0) {
Some((index, syn0Local.slice(index * vectorSize, (index + 1) * vectorSize)))
} else {
None
}
if (syn1Modify(index) != 0) {
synOut += ((index + vocabSize,
syn1Local.slice(index * vectorSize, (index + 1) * vectorSize)))
}.flatten ++ Iterator.tabulate(vocabSize) { index =>
if (syn1Modify(index) > 0) {
Some((index + vocabSize, syn1Local.slice(index * vectorSize, (index + 1) * vectorSize)))
} else {
None
}
index += 1
}
synOut.toIterator
}.flatten
}
val synAgg = partial.reduceByKey { case (v1, v2) =>
blas.saxpy(vectorSize, 1.0f, v2, 1, v1, 1)
Expand Down

0 comments on commit 8bdba3e

Please sign in to comment.