Fix sddmm2 when nnz=0 #300

fmassa · 2022-05-10T16:45:04Z

One of our internal implementations of sampled dense-dense matrix multiplication (sddmm) had two issues:

It didn't guard kernel launches when nnz=0 (which would yield a grid size of 0 in one of its dimensions).
It didn't call cudaGetLastError at the end of the functions, so errors in this kernel would only be reported at the next function invocation, which was misleading about the true location of the issue.
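
To illustrate the second issue, here is a toy model in plain C++ of how an unchecked launch error can surface in a later, unrelated call. It is only a sketch and not the CUDA runtime: `ToyError`, `toy_launch`, `toy_get_last_error`, `op_unchecked` and `op_checked` are hypothetical names made up for this example. The one property it borrows from CUDA is that cudaGetLastError returns the last recorded error and resets it.

```cpp
#include <cassert>
#include <string>

// Toy model only (NOT the CUDA runtime): a single "last error" slot.
enum class ToyError { Success, InvalidConfiguration };

ToyError g_last_error = ToyError::Success;

// Models a kernel launch: a zero-sized grid records an error
// instead of failing loudly at the call site.
void toy_launch(int grid_x) {
  if (grid_x == 0) {
    g_last_error = ToyError::InvalidConfiguration;
  }
}

// Models cudaGetLastError: returns the recorded error and resets the slot.
ToyError toy_get_last_error() {
  ToyError e = g_last_error;
  g_last_error = ToyError::Success;
  return e;
}

// Hypothetical op that launches without checking afterwards
// (the situation described above).
void op_unchecked(int grid_x) { toy_launch(grid_x); }

// Hypothetical unrelated op that launches and then checks.
// Returns the name of the op that gets blamed, or "" if no error was seen.
std::string op_checked(int grid_x) {
  toy_launch(grid_x);
  return toy_get_last_error() == ToyError::Success ? "" : "op_checked";
}
```

In this model, calling `op_unchecked(0)` followed by `op_checked(4)` blames `op_checked`, even though its own launch was valid.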

This PR fixes both by returning early when nnz=0 (which is fine as there is no data in the tensor anyway) and by adding the missing cudaGetLastError call.
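
For readers who want the shape of the guard, here is a minimal host-side sketch in plain C++. It is an illustration under assumptions, not the code in sddmm2_cuda.cu: `GridDim`, `ceil_div`, `plan_grid`, `sddmm_should_launch` and the `items_per_block` parameter are hypothetical, and the real grid layout may differ. The actual launch and error check appear only as a comment so the snippet builds without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CUDA's dim3, reduced to the fields used here.
struct GridDim {
  std::int64_t x;
  std::int64_t y;
};

// Ceiling division, a common way to size a launch grid.
std::int64_t ceil_div(std::int64_t a, std::int64_t b) {
  return (a + b - 1) / b;
}

// Assumed layout: one grid dimension is derived from nnz,
// so nnz == 0 produces a grid with zero blocks in that dimension.
GridDim plan_grid(std::int64_t nnz, std::int64_t rows,
                  std::int64_t items_per_block) {
  return GridDim{ceil_div(nnz, items_per_block), rows};
}

// Returns false when the op should return early without launching.
bool sddmm_should_launch(std::int64_t nnz) {
  if (nnz == 0) {
    return false;  // no non-zero entries, so there is nothing to compute
  }
  // In the CUDA source this is where the kernel would be launched,
  // followed by: AT_CUDA_CHECK(cudaGetLastError());
  return true;
}
```

With these assumptions, `plan_grid(0, 8, 32).x` is 0, which is the invalid launch configuration the early return avoids.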

Should fix the errors present in #263

Also add cudaGetLastError which was missing

blefaudeux · 2022-05-10T16:51:33Z

tests/test_custom_ops.py

@@ -158,14 +158,14 @@ def test_sddmm_sputnik(device):


 @cuda_only
+@pytest.mark.parametrize("prob", [0.5, 1])


thanks for adding this, nice catch

blefaudeux · 2022-05-10T16:51:54Z

xformers/components/attention/csrc/cuda/sddmm2_cuda.cu

@@ -539,6 +546,7 @@ torch::Tensor sddmm_cuda_csr(
        D2.data_ptr<float>(),
        out.data_ptr<float>());
  }
+  AT_CUDA_CHECK(cudaGetLastError());


ok, makes sense :)

blefaudeux

LGTM, thanks a lot @fmassa for diving in and the very quick fix

blefaudeux · 2022-05-10T16:52:37Z

the mypy error is fixed on main, will be fine on landing

Fix sddmm2 when nnz=0

c289ec6

Also add cudaGetLastError which was missing

fmassa requested a review from blefaudeux May 10, 2022 16:45

facebook-github-bot added the CLA Signed label May 10, 2022

fmassa mentioned this pull request May 10, 2022

Added SmeLU #263

Merged

10 tasks

blefaudeux reviewed May 10, 2022

blefaudeux approved these changes May 10, 2022

fmassa merged commit bcedfaf into main May 10, 2022

fmassa deleted the sddmm2-zero-nnz-fix branch May 10, 2022 17:10
