Fix dense Embedding to work with double backward #9078

Closed
wants to merge 9 commits

Conversation

kshitij12345
Collaborator

Fixes: #6469

  1. `ATen/native/native_functions.yml` had [dispatch](https://github.com/pytorch/pytorch/blob/03e7953a98875c0164cb8e2c19b45800e85f4347/aten/src/ATen/native/native_functions.yaml#L451-L455) variants for `embedding_dense_backward`; however, `embedding_backward` made an explicit [call](https://github.com/pytorch/pytorch/blob/03e7953a98875c0164cb8e2c19b45800e85f4347/aten/src/ATen/native/Embedding.cpp#L35-L45) to it, thus leading to an error.

  2. In the case of a CUDA tensor, the function used to crash when dereferencing the indices' data [pointer](https://github.com/pytorch/pytorch/blob/03e7953a98875c0164cb8e2c19b45800e85f4347/aten/src/ATen/native/Embedding.cpp#L93).

Both have been fixed and verified (on both CUDA and CPU) against the following:

  1. As mentioned in the issue
import torch

class Test(torch.nn.Module):
    
    def __init__(self):
        super(Test,self).__init__()
        self.embd = torch.nn.Embedding(1000, 100)
        self.dense = torch.nn.Linear(100, 1)
    
    def forward(self, inp):
        inp = self.embd(inp)
        return self.dense(inp)

test = Test()
#test.cuda()
inp = torch.tensor([0,1,2,1,1])
out = test(inp)
raw_loss = out.mean(dim=0)

loss_grad = torch.autograd.grad(outputs=raw_loss,
                         inputs=list(test.parameters()),
                         retain_graph=True, create_graph=True, only_inputs=True)
norm = sum([param.norm()**2 for param in loss_grad])
loss = raw_loss + norm

loss.backward(retain_graph=True)

print(test.embd.weight.grad)

  2. Test Script
import torch
import time
start = time.time()
l = [1,1]*100 
input = torch.tensor([[1,0],[1,0]],device='cpu')
embedding_matrix = torch.tensor([[1.0,3.0],[2.0,4]],requires_grad=True,device='cpu')

sq = embedding_matrix * embedding_matrix
emb = torch.nn.functional.embedding(input, sq,scale_grad_by_freq=False)

print('Embedding Matrix')
print(embedding_matrix)
print('-----------------')

#prod = torch.cumprod(emb,1)
sum_ = emb.sum()#prod.sum()

loss_grad, = torch.autograd.grad(outputs=sum_,inputs=embedding_matrix,create_graph=True)

print('Gradient')
print(loss_grad)
print('-----------------')

sum2_ = sum_ + loss_grad.sum()
print(sum2_)
sum2_.backward()

print(embedding_matrix.grad)
print(time.time() - start)
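
The same double-backward path can also be exercised more compactly with torch.autograd.gradgradcheck, which compares the analytical second derivative against finite differences. A minimal sketch (shapes and values are arbitrary, dense path only):

import torch
from torch.autograd import gradgradcheck

# gradgradcheck needs double precision for the finite-difference comparison.
weight = torch.randn(10, 3, dtype=torch.double, requires_grad=True)
indices = torch.tensor([0, 2, 2, 5, 9])

# Passes only if embedding's dense backward is itself differentiable.
gradgradcheck(lambda w: torch.nn.functional.embedding(indices, w), (weight,))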

@soumith soumith changed the title fix embedding_dense_backward (#6469) Fix dense Embedding to work with double backward Jul 1, 2018
@ssnl
Collaborator

ssnl commented Jul 1, 2018

shouldn't the solution be adding a custom double backward, rather than slowing down backward with autograd ops?

Member

@colesbury colesbury left a comment


What @ssnl said. This looks like it will slow down embedding significantly. We should just define a derivative for embedding_backward.
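
For reference, here is a Python-level sketch of the custom double backward being suggested: the dense backward is wrapped in its own autograd.Function with a hand-written derivative (a gather), so the second backward never replays a chain of autograd ops. This is only illustrative; the real change would register a formula for embedding_dense_backward in derivatives.yaml, the class names below are hypothetical, and padding_idx / scale_grad_by_freq are omitted.

import torch

class DenseEmbedding(torch.autograd.Function):
    @staticmethod
    def forward(ctx, weight, indices):
        ctx.save_for_backward(indices)
        ctx.num_weights = weight.size(0)
        return weight[indices]

    @staticmethod
    def backward(ctx, grad_output):
        indices, = ctx.saved_tensors
        # The backward is itself a Function with its own hand-written derivative.
        return DenseEmbeddingBackward.apply(grad_output, indices, ctx.num_weights), None

class DenseEmbeddingBackward(torch.autograd.Function):
    @staticmethod
    def forward(ctx, grad_output, indices, num_weights):
        ctx.save_for_backward(indices)
        flat = grad_output.reshape(-1, grad_output.size(-1))
        grad_weight = flat.new_zeros(num_weights, flat.size(-1))
        grad_weight.index_add_(0, indices.reshape(-1), flat)
        return grad_weight

    @staticmethod
    def backward(ctx, grad_grad_weight):
        indices, = ctx.saved_tensors
        # The derivative of the scatter-add w.r.t. grad_output is a gather,
        # i.e. another embedding lookup.
        return grad_grad_weight[indices], None, None

Calling torch.autograd.grad on DenseEmbedding.apply(weight, indices) with create_graph=True records DenseEmbeddingBackward in the graph, so the second backward goes through the hand-written gather above.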

@kshitij12345
Collaborator Author

Sure, I'll try that and update once done.

@weiyangfb
Contributor

weiyangfb commented Jul 10, 2018

@kshitij12345 this needs a rebase now

@kshitij12345
Collaborator Author

@weiyangfb Sure, will do that.

@t-vi
Collaborator

t-vi commented Jul 12, 2018

Wouldn't it still be better to have a fused index_add_ + mul for the backward than implementing a specific double backward? I'd think that it is probably a bit less code and more similar to what we do for other ops.
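
Concretely, the dense backward boils down to a scatter-add (index_add_) plus an optional frequency scaling, so if that composition were exposed as one op with its own derivative, the double backward comes along naturally. A minimal sketch of the composition in Python terms (the function name is made up, padding_idx is ignored, and a real fused kernel would live in ATen):

import torch

def dense_embedding_backward(grad_output, indices, num_weights, scale_grad_by_freq=False):
    # Scatter-add the incoming gradients into a weight-shaped buffer.
    flat_grad = grad_output.reshape(-1, grad_output.size(-1))
    flat_idx = indices.reshape(-1)
    grad_weight = flat_grad.new_zeros(num_weights, flat_grad.size(-1))
    grad_weight.index_add_(0, flat_idx, flat_grad)
    if scale_grad_by_freq:
        # The "mul" part: scale each row by 1 / (number of occurrences of its index).
        counts = torch.bincount(flat_idx, minlength=num_weights).clamp(min=1)
        grad_weight = grad_weight / counts.to(grad_weight.dtype).unsqueeze(1)
    return grad_weight

Every step above is an ordinary differentiable op, so running it under create_graph=True already gives a working double backward; the fusion would mainly recover the speed of the single hand-written kernel.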

@kshitij12345
Collaborator Author

@ssnl I have tried that, and here is my opinion from what I understand.

In the derivatives.yaml,

- name: embedding(Tensor weight, Tensor indices, int64_t padding_idx, bool scale_grad_by_freq, bool sparse)
  weight: embedding_backward(grad, indices, weight.size(0), padding_idx, scale_grad_by_freq, sparse)

- name: _embedding_bag(Tensor weight, Tensor indices, Tensor offsets, bool scale_grad_by_freq, int64_t mode, bool sparse)
  weight: _embedding_bag_backward(grad, indices, offsets, result1, result2, result3, weight.size(0), scale_grad_by_freq, mode, sparse)

We'll need to pass weight in as an argument to embedding_backward so that embedding_double_backward can compute the gradient for weight, but that will also require minor changes in embedding_bag_sparse_backward, since it calls embedding_backward.

Also, embedding_backward dispatches to both embedding_dense_backward and embedding_sparse_backward, so each would need its own double backward.

So I believe that, as @t-vi suggested, we should opt for the fused index_add_ + mul.
Please let me know what you think.
Thank you.

@ssnl
Collaborator

ssnl commented Jul 15, 2018

@t-vi @kshitij12345 I'm fine if you want to implement fused index_add_ + mul and also write a backward for that. But that would be considerably more work than just writing a custom double backward for this.

@kshitij12345
Collaborator Author

Oh, will take a look and update you on it.

@kshitij12345
Collaborator Author

@colesbury, please review.

@soumith
Member

soumith commented Mar 29, 2019

@pytorchbot rebase this please

@pytorchbot
Collaborator

There's nothing to do! This branch is already up to date with master (1240327).


Member

@soumith soumith left a comment


Thanks a lot for your contribution, Kshitij. It looks like this is finally good to go :)

Contributor

@facebook-github-bot facebook-github-bot left a comment


@soumith has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Apr 3, 2019
Pull Request resolved: pytorch/pytorch#9078

Reviewed By: ezyang

Differential Revision: D14691901

Pulled By: soumith

fbshipit-source-id: 78e2612ba39080be564c876311671eb5a0119a0f

Successfully merging this pull request may close these issues.

[PyTorch] Dense embedding doesn't work with double backward