
Fix incorrect sparse add behavior when the sparse tensor has non-contiguous values #18179

Closed · wants to merge 6 commits

Conversation

@yf225 (Contributor) commented Mar 19, 2019

Currently, this code gives an incorrect result:

```python
import torch
indices = torch.tensor([[7, 1, 3]])
values = torch.tensor([[1., 1., 1.],
                       [1., 1., 1.],
                       [1., 1., 1.]])
x = torch.sparse_coo_tensor(indices, values, size=(10, 3))
values = torch.tensor(1.).expand(3, 3)  # non-contiguous values
y = torch.sparse_coo_tensor(indices, values, size=(10, 3))
z = x + y
```

`z` should have been all 2's in `values`, but instead we get:

```
tensor(indices=tensor([[7, 1, 3]]),
       values=tensor([[2., 1., 1.],
                      [1., 1., 1.],
                      [1., 1., 1.]]),
       size=(10, 3), nnz=3, layout=torch.sparse_coo)
```

This PR fixes the bug by adding special handling for sparse tensors with non-contiguous values in the addition function (specifically, by cat'ing the indices and values together).

This PR closes #17950 and #17919.
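For intuition, the cat-then-coalesce idea can be sketched in plain Python (a toy model of 1-D COO addition, not the actual ATen code; all names here are made up for illustration): concatenating the two tensors' indices and values is always correct regardless of memory layout, because no element-wise pointer walk over the values is needed; summing duplicates afterwards recovers the coalesced result.

```python
# Toy model (not the ATen implementation): a 1-D COO tensor is a pair
# of Python lists (indices, values).

def sparse_add_by_cat(t_indices, t_values, s_indices, s_values):
    # "cat" the indices and values; the result is valid but uncoalesced.
    return t_indices + s_indices, t_values + s_values

def coalesce(indices, values):
    # Sum values that share an index, then sort by index.
    acc = {}
    for i, v in zip(indices, values):
        acc[i] = acc.get(i, 0) + v
    keys = sorted(acc)
    return keys, [acc[k] for k in keys]

# x and y both hold 1.0 at indices 7, 1, 3 (cf. the repro above).
r_idx, r_val = sparse_add_by_cat([7, 1, 3], [1., 1., 1.],
                                 [7, 1, 3], [1., 1., 1.])
print(coalesce(r_idx, r_val))  # ([1, 3, 7], [2.0, 2.0, 2.0])
```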

@yf225 yf225 requested review from gchanan and zou3519 March 19, 2019 16:02
@zou3519 (Contributor) commented Mar 19, 2019

```
======================================================================
ERROR: test_sparse_ctor_getter_backward (__main__.TestAutograd)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/test/common_utils.py", line 120, in wrapper
    fn(*args, **kwargs)
  File "test_autograd.py", line 671, in test_sparse_ctor_getter_backward
    test(sparse_size + dense_size, len(sparse_size), nnz, device)
  File "test_autograd.py", line 653, in test
    gradcheck(fn, (inp,))
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/gradcheck.py", line 247, in gradcheck
    'numerical:%s\nanalytical:%s\n' % (i, j, n, a))
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/gradcheck.py", line 202, in fail_test
    raise RuntimeError(msg)
RuntimeError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[0.0145, 0.0000],
        [0.0000, 0.0000],
        [0.0145, 0.0108],
        [0.0000, 0.0000],
        [0.0145, 0.0108],
        [0.0000, 0.0000],
        [0.0145, 0.0108],
        [0.0000, 0.0000],
        [0.0145, 0.0108],
        [0.0000, 0.0000]])
analytical:tensor([[0.0145, 0.0000],
        [0.0000, 0.0108],
        [0.0145, 0.0000],
        [0.0000, 0.0108],
        [0.0145, 0.0000],
        [0.0000, 0.0108],
        [0.0145, 0.0000],
        [0.0000, 0.0108],
        [0.0145, 0.0000],
        [0.0000, 0.0108]])
```

I don't know if this test failure is legitimate.

@yf225 yf225 force-pushed the fix_sparse_add_noncontiguous branch from 01dd5ba to 6bff499 Compare March 19, 2019 20:43
@yf225 yf225 force-pushed the fix_sparse_add_noncontiguous branch 2 times, most recently from 37577e3 to a445698 Compare March 20, 2019 01:11
@yf225 yf225 force-pushed the fix_sparse_add_noncontiguous branch from a445698 to 17ec4cb Compare March 20, 2019 01:12
@ezyang (Contributor) commented Mar 21, 2019

It would be really helpful for review if the PR message explained exactly how the problem was solved.

```cpp
return r._coalesced_(t_coalesced && s_coalesced);
LongTensor r_indices = at::cat({t_indices, s_indices}, 1);
Tensor r_values = at::cat({t_values, s_values}, 0);
alias_into_sparse(r, r_indices, r_values);
```
Contributor:
If you cat'ed, don't you have to specify that the output is not coalesced?

Contributor:
IMO we should make this a parameter on alias_into_sparse so people have to consider it.

Author:
`alias_into_sparse(...)` calls `set_indices_and_values_unsafe(...)` internally, which always sets `coalesced_ = false`, and we expect callers to call `sparse_tensor._coalesced_(...)` afterwards if they want to change the coalesced-ness of the sparse tensor. For example:

```cpp
alias_into_sparse(r, mask_indices.clone(), r_values);
r._coalesced_(mask.is_coalesced());
```

To simplify this API, we can add an `is_coalesced` parameter to `alias_into_sparse`, possibly in a separate PR.
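The flag semantics described above can be modeled with a toy Python class (illustrative only, not the ATen code; names mirror the C++ but everything here is a stand-in): the unsafe setter always resets the coalesced flag, and the caller restores it explicitly.

```python
# Toy model of the coalesced-flag contract (not the ATen code).

class SparseTensor:
    def __init__(self):
        self.indices, self.values = [], []
        self.coalesced = False

    def set_indices_and_values_unsafe(self, indices, values):
        self.indices, self.values = indices, values
        self.coalesced = False  # always reset, as described above

    def _coalesced_(self, flag):
        self.coalesced = flag
        return self

def alias_into_sparse(r, indices, values):
    r.set_indices_and_values_unsafe(indices, values)

r = SparseTensor()
alias_into_sparse(r, [1, 3, 7], [2.0, 2.0, 2.0])
assert r.coalesced is False  # reset by the setter
r._coalesced_(True)          # caller restores coalesced-ness
assert r.coalesced is True
```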

@ezyang (Contributor) commented Mar 21, 2019

Can we get some benchmark numbers? I'm not sure whether any of our embedding examples exercise sparse-sparse addition, but if one does, that would be the most representative benchmark.

I don't think it's necessarily wrong to switch to cat'ing the indices and values together, but you could also have fixed the problem by simply switching the values access to use an accessor (which respects strides) rather than raw pointer arithmetic (which doesn't). So the algorithm change should be justified.
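The layout issue behind this point can be sketched in plain Python (an illustrative model, not the ATen code): a tensor produced by `expand()` stores a single element with zero strides, so flat pointer arithmetic that assumes a dense row-major layout reads the wrong memory, while stride-aware (accessor-style) indexing stays correct.

```python
# torch.tensor(1.).expand(3, 3) stores ONE element and reports zero
# strides; this models that layout with plain lists (illustration only).
storage = [1.0]
shape, strides = (3, 3), (0, 0)

def strided_read(i, j):
    # Accessor-style access: respects strides, correct for any layout.
    return storage[i * strides[0] + j * strides[1]]

def contiguous_read(i, j):
    # Pointer-arithmetic-style access: assumes dense row-major storage.
    return storage[i * shape[1] + j]  # walks past the 1-element storage

assert strided_read(2, 2) == 1.0
try:
    contiguous_read(2, 2)
except IndexError:
    print("flat pointer walk ran past the 1-element storage")
```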

@yf225 yf225 changed the title Fix incorrect sparse add behavior when the sparse tensor has non-contiguous values [WIP] Fix incorrect sparse add behavior when the sparse tensor has non-contiguous values Mar 21, 2019
@gchanan (Contributor) commented Mar 21, 2019

How about only cat'ing if the tensors aren't contiguous? That way we only (potentially) slow down paths that were broken anyway.

@yf225 yf225 changed the title [WIP] Fix incorrect sparse add behavior when the sparse tensor has non-contiguous values Fix incorrect sparse add behavior when the sparse tensor has non-contiguous values Mar 21, 2019
@yf225 (Contributor, Author) commented Mar 21, 2019

@ezyang @gchanan I haven't figured out a way to make `THBlas_axpy` work with non-contiguous values, so I opted for cat'ing only when the tensors aren't contiguous. This shouldn't hurt performance, because the path with non-contiguous values was broken anyway.
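The dispatch described above can be sketched in plain Python (hypothetical names, not the ATen code): take the fast element-wise path only when both value tensors are contiguous, and fall back to the layout-safe cat path otherwise.

```python
# Toy model of the contiguity dispatch (illustration only): values are
# a list plus a contiguity flag standing in for Tensor.is_contiguous().

class Values:
    def __init__(self, data, contiguous=True):
        self.data, self.contiguous = list(data), contiguous

    def is_contiguous(self):
        return self.contiguous

def add_values(t, s):
    if t.is_contiguous() and s.is_contiguous():
        # Fast path: element-wise add (stands in for THBlas_axpy).
        return Values([a + b for a, b in zip(t.data, s.data)])
    # Slow but always-correct path: concatenate; the result is
    # uncoalesced and the caller coalesces (sums duplicates) later.
    return Values(t.data + s.data, contiguous=False)

fast = add_values(Values([1., 1.]), Values([1., 1.]))
slow = add_values(Values([1., 1.]), Values([1., 1.], contiguous=False))
print(fast.data, slow.data)  # [2.0, 2.0] [1.0, 1.0, 1.0, 1.0]
```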

```cpp
int64_t blockSize = r_values.stride(0);
int64_t cmp, d;
int64_t r_i = 0, t_i = 0, s_i = 0;
if (s_values.is_contiguous() && t_values.is_contiguous()) {
```
Author:
There is no change in this if-branch compared to the original code - I only indented it.

```cpp
  // index goes backwards) which may be more precise than using the
  // coalesced flag here. But this is easy.
  return r._coalesced_(t_coalesced && s_coalesced);
} else {
```
Author:
This if-branch is the actual addition.

@facebook-github-bot left a comment:
@yf225 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot left a comment:
@yf225 is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Mar 23, 2019
…iguous values (#18179)

Pull Request resolved: pytorch/pytorch#18179

Reviewed By: ezyang

Differential Revision: D14569591

Pulled By: yf225

fbshipit-source-id: f5a14c4a31337fc95eab64596212066b4fb18b1a
Successfully merging this pull request may close these issues.

Incorrect sparse add when tensor has non-contiguous values
6 participants