Hi,
I am trying to generate CUDA code for the DenseMatrix.inv statement in OptiML, and I noticed that in DenseMatrixOps the DenseMatrixInverse case class extends DeliteOpSingleWithManifest which, as far as I understand, means only sequential Scala code can be emitted, since it is not a parallel op. I may be wrong about this, as I am still getting to understand some of these things.
If I am right about this, do you think it is feasible to implement DenseMatrixInverse using DeliteOpIndexedLoop or DeliteOpForEach, based on the Gauss-Jordan elimination algorithm? I have tried this in pure CUDA and it seems to work fine, although I have not measured any speed-up over the sequential pure C version. My plan was to just try it, but my advisors suggested I find out here first before making the attempt. Please advise. (A sketch of what I have in mind follows below.)
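For concreteness, here is a rough plain-Scala sketch of the Gauss-Jordan approach I have in mind. This is illustrative only: GaussJordan.inverse is a made-up name, it is ordinary Scala rather than Delite IR, and it assumes a square input. The outer loop over pivots is inherently sequential, but the per-row elimination inside each pivot step is data-parallel, and that is the loop I would hope to express as a DeliteOpIndexedLoop or DeliteOpForEach over rows:

object GaussJordan {
  // Gauss-Jordan inverse of a row-major n x n matrix stored as Array[Double].
  def inverse(a: Array[Double], n: Int): Array[Double] = {
    // Build the augmented matrix [A | I] in one n x 2n row-major buffer.
    val w = 2 * n
    val aug = new Array[Double](n * w)
    for (i <- 0 until n) {
      System.arraycopy(a, i * n, aug, i * w, n)
      aug(i * w + n + i) = 1.0
    }
    for (p <- 0 until n) { // sequential over pivot columns
      // Partial pivoting: pick the row with the largest |entry| in column p.
      var piv = p
      for (r <- p + 1 until n)
        if (math.abs(aug(r * w + p)) > math.abs(aug(piv * w + p))) piv = r
      if (piv != p) for (c <- 0 until w) {
        val t = aug(p * w + c)
        aug(p * w + c) = aug(piv * w + c)
        aug(piv * w + c) = t
      }
      val d = aug(p * w + p)
      require(math.abs(d) > 1e-12, "matrix is singular (or nearly so)")
      for (c <- 0 until w) aug(p * w + c) /= d // normalize the pivot row
      // Eliminate column p from every other row. Each row r is updated
      // independently of the others, so this is the parallel region.
      for (r <- 0 until n if r != p) {
        val f = aug(r * w + p)
        if (f != 0.0) for (c <- 0 until w) aug(r * w + c) -= f * aug(p * w + c)
      }
    }
    // The right half of the augmented matrix is now A^-1.
    val inv = new Array[Double](n * n)
    for (i <- 0 until n) System.arraycopy(aug, i * w + n, inv, i * n, n)
    inv
  }
}

Since the pivot loop cannot be parallelized, a version built on Delite parallel ops would effectively run n parallel steps back to back, one per pivot, which I suspect matters when comparing against existing library implementations.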
Here is the code I tried in OptiML and in my new DSL (OptiSDR), which currently adopts/inherits most of its functionality from OptiLA:
val m1 = DenseMatrix.rand(10000,4250)
val invm1 = m1.inv
You're right that the current implementation of the matrix inverse is sequential, and therefore a CUDA kernel will not be generated.
I think it's worth trying to implement it using parallel ops as you suggested, if it's not too complicated.
One thing to note is that there are existing CUDA libraries (e.g., cuBLAS/cuSOLVER) that can compute the matrix inverse, and I'm not sure whether an implementation built on the Delite parallel ops would be competitive with them.