Matrix Inverse CUDA generation - no kernels loaded #39

Open
leratojeffrey opened this issue Jun 11, 2014 · 2 comments

Comments

@leratojeffrey

Hi,
I am trying to generate CUDA code for the DenseMatrix.inv statement in OptiML. I noticed that in DenseMatrixOps the DenseMatrixInverse case class extends DeliteOpSingleWithManifest, which in my experience allows emitting only sequential Scala code because it is not a parallel op. I may be wrong about this, as I am still learning how these things work.

If I am right about this, do you think it would be possible to implement DenseMatrixInverse using DeliteOpIndexedLoop or DeliteOpForEach, based on the Gauss-Jordan elimination algorithm? I have tried this in pure CUDA and it seems to work fine, although I have not yet compared its speed against the sequential pure C version. My plan was to just try it, but my advisors suggested I ask first before making any attempt. Please advise.

Here is the code I tried in OptiML and in my new DSL (OptiSDR), which currently adopts/inherits most of its functionality from OptiLA.

        val m1 = DenseMatrix.rand(10000,4250)
        val invm1 = m1.inv
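
For reference, the Gauss-Jordan approach mentioned above can be sketched in plain Scala roughly as follows. This is only an illustration of the algorithm's structure, not Delite or OptiML code: the object name `GaussJordanSketch`, the `inverse` signature, and the `eps` singularity threshold are all hypothetical, and the sketch assumes a square input matrix. The comments mark the loop whose iterations are independent, which is the part that would most plausibly map onto a parallel op.

```scala
object GaussJordanSketch {
  /** Inverts a square matrix by Gauss-Jordan elimination with partial pivoting.
    * Returns None when a pivot is smaller than eps (matrix treated as singular).
    * Hypothetical helper for illustration; not part of OptiML or Delite. */
  def inverse(a: Array[Array[Double]], eps: Double = 1e-12): Option[Array[Array[Double]]] = {
    val n = a.length
    require(a.forall(_.length == n), "matrix must be square")

    // Build the augmented matrix [A | I] with n rows and 2n columns.
    val m = Array.tabulate(n, 2 * n) { (i, j) =>
      if (j < n) a(i)(j) else if (j - n == i) 1.0 else 0.0
    }

    var col = 0
    while (col < n) {
      // Partial pivoting: pick the row with the largest absolute value in this column.
      var pivot = col
      for (r <- col + 1 until n)
        if (math.abs(m(r)(col)) > math.abs(m(pivot)(col))) pivot = r
      if (math.abs(m(pivot)(col)) < eps) return None

      // Swap the pivot row into place.
      val tmp = m(col); m(col) = m(pivot); m(pivot) = tmp

      // Scale the pivot row so the pivot entry becomes 1.
      val p = m(col)(col)
      for (j <- 0 until 2 * n) m(col)(j) /= p

      // Eliminate this column from every other row.
      // Each row update reads only the pivot row and writes only its own row,
      // so these iterations are independent of one another.
      for (r <- 0 until n if r != col) {
        val f = m(r)(col)
        if (f != 0.0) for (j <- 0 until 2 * n) m(r)(j) -= f * m(col)(j)
      }
      col += 1
    }

    // The right half of the augmented matrix now holds the inverse.
    Some(m.map(_.drop(n)))
  }
}
```

The outer loop over pivot columns is inherently sequential, so a parallel version would presumably launch one parallel row-elimination step per pivot column rather than a single parallel loop over the whole matrix.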
@hyouklee
Member

Hi Lerato,

You're right that the current implementation of the matrix inverse is sequential, so no CUDA kernel will be generated for it.
I think it's worth trying to implement it using parallel ops as you mentioned, if it's not too complicated.
One thing to note is that there are existing CUDA libraries you can use to calculate the matrix inverse, and I'm not sure whether a version built on the Delite parallel ops would perform poorly compared to those implementations.

@leratojeffrey
Author

Thanks Lee, I will try it using Delite parallel ops and let you guys know soon what I come up with.
