Skip to content

Commit

Permalink
[SPARK-11507][MLLIB] add compact in Matrices fromBreeze
Browse files Browse the repository at this point in the history
jira: https://issues.apache.org/jira/browse/SPARK-11507
"In certain situations when adding two block matrices, I get an error regarding colPtr and the operation fails. External issue URL includes full error and code for reproducing the problem."

root cause: colPtr.last does NOT always equal to values.length in breeze SCSMatrix, which fails the require in SparseMatrix.

easy step to repro:
```
val m1: BM[Double] = new CSCMatrix[Double] (Array (1.0, 1, 1), 3, 3, Array (0, 1, 2, 3), Array (0, 1, 2) )
val m2: BM[Double] = new CSCMatrix[Double] (Array (1.0, 2, 2, 4), 3, 3, Array (0, 0, 2, 4), Array (1, 2, 1, 2) )
val sum = m1 + m2
Matrices.fromBreeze(sum)
```

Solution: By checking the code in [CSCMatrix](https://github.com/scalanlp/breeze/blob/28000a7b901bc3cfbbbf5c0bce1d0a5dda8281b0/math/src/main/scala/breeze/linalg/CSCMatrix.scala), CSCMatrix in breeze can have extra zeros in the end of data array. Invoking compact will make sure it aligns with the require of SparseMatrix. This should add limited overhead as the actual compact operation is only performed when necessary.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #9520 from hhbyyh/matricesFromBreeze.

(cherry picked from commit ca45861)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>

Conflicts:
	mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala
  • Loading branch information
hhbyyh authored and jkbradley committed Mar 30, 2016
1 parent 84ad254 commit 3cc3d85
Show file tree
Hide file tree
Showing 2 changed files with 21 additions and 1 deletion.
Original file line number Diff line number Diff line change
Expand Up @@ -879,8 +879,16 @@ object Matrices {
case dm: BDM[Double] =>
new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
case sm: BSM[Double] =>
// Spark-11507. work around breeze issue 479.
val mat = if (sm.colPtrs.last != sm.data.length) {
val matCopy = sm.copy
matCopy.compact()
matCopy
} else {
sm
}
// There is no isTranspose flag for sparse matrices in Breeze
new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
new SparseMatrix(mat.rows, mat.cols, mat.colPtrs, mat.rowIndices, mat.data)
case _ =>
throw new UnsupportedOperationException(
s"Do not support conversion from type ${breeze.getClass.getName}.")
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ package org.apache.spark.mllib.linalg

import java.util.Random

import breeze.linalg.{CSCMatrix, Matrix => BM}
import org.mockito.Mockito.when
import org.scalatest.mock.MockitoSugar._
import scala.collection.mutable.{Map => MutableMap}
Expand Down Expand Up @@ -498,4 +499,15 @@ class MatricesSuite extends SparkFunSuite {
assert(sm1.numNonzeros === 1)
assert(sm1.numActives === 3)
}

test("fromBreeze with sparse matrix") {
// colPtr.last does NOT always equal to values.length in breeze SCSMatrix and
// invocation of compact() may be necessary. Refer to SPARK-11507
val bm1: BM[Double] = new CSCMatrix[Double](
Array(1.0, 1, 1), 3, 3, Array(0, 1, 2, 3), Array(0, 1, 2))
val bm2: BM[Double] = new CSCMatrix[Double](
Array(1.0, 2, 2, 4), 3, 3, Array(0, 0, 2, 4), Array(1, 2, 1, 2))
val sum = bm1 + bm2
Matrices.fromBreeze(sum)
}
}

0 comments on commit 3cc3d85

Please sign in to comment.