-
-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support groups in DenseConvDims #289
Conversation
Our cpu conv kernels don't seem to support groups, do they? |
@DhairyaLGandhi can I get any answers here? Which conv implementation supports groups? Can you point to some specific line of code? |
ah you are implementing groups support right now, don't have to be that mysterious :) |
So 115784b is a "working" version which is very memory wasteful, but I have it here right now in case someone was eager to see it working in NNlib. I will be moving to the non-allocating version of this soon. The issue is Julia doesn't like to look into the |
For the record, not using tasks is as fast as the original version for the regular convolutions (ie with |
Oh this is satisfying. Starting Julia with julia> @btime y = conv($(rand(Float32, 28, 28, 100, 2)), $(rand(Float32, 3, 3, 20, 15)), groups = 5);
947.722 μs (159 allocations: 4.81 MiB) Either way we now have the system in place to pass it to CUDNN. Should I also add a stub to |
I don't see any safety issue in going multithreaded here, computations are independent. We should check that using threads doesn't have a performance impact on the standard convolution case (G=1). |
Its a question of computational complexity. The existing cases aren't affected much at all. |
I think the existing cases are affected because spawning threads doesn't come for free, but hopefully could be negligible compared to the convolution cost. If that's not the case, we can still dispatch on the G=1 case. In any case, we should check we don't have performance regressions in standard convolution before merging |
Way ahead of you :) |
🎉 |
Addressed all the comments. I've also added more tests and docs. @pxl-th could you rebase your NNlibCUDA PR? Any other comments? |
not sure this respecting the fact that |
Done, also made |
Ah, almost forgot - we need |
I've addressed the reviews - thanks @staticfloat! I had a couple of debugging steps that I've cleaned up now. Also comes with more tests, and docs! Re the added check - its there so we can give a clearer message about groups config. Re threading - it allows using the newer scheduler which reasons about in terms of tasks, so its pretty smart about how to handle the resources. It is composable too https://julialang.org/blog/2019/07/multithreading/ |
@pxl-th do you think you can test your NNlibCUDA PR off of this one? The API is mostly the same, but good to get a check that we are passing into CUDNN properly. |
Let's merge this. I'll wait till tomorrow, so if anyone has final thoughts we can address those. |
@DhairyaLGandhi I've tested it. Everything seems to work, all the tests are passing. |
Nice, let's get these PRs merged then. |
Also propagate it to the kernel calls.