Fix dense Embedding to work with double backward #9078

Closed
wants to merge 9 commits

Conversation

kshitij12345
Collaborator

Fixes: #6469

  1. `ATen/native/native_functions.yml` had [dispatch](https://github.com/pytorch/pytorch/blob/03e7953a98875c0164cb8e2c19b45800e85f4347/aten/src/ATen/native/native_functions.yaml#L451-L455) variants for `embedding_dense_backward`; however, `embedding_backward` made an explicit [call](https://github.com/pytorch/pytorch/blob/03e7953a98875c0164cb8e2c19b45800e85f4347/aten/src/ATen/native/Embedding.cpp#L35-L45) to it, thus leading to an error.

  2. In the case of a CUDA tensor, the function used to crash when dereferencing the indices' data [pointer](https://github.com/pytorch/pytorch/blob/03e7953a98875c0164cb8e2c19b45800e85f4347/aten/src/ATen/native/Embedding.cpp#L93).

Both have been fixed and verified (on both CUDA and CPU) against the following:

  1. As mentioned in the issue
import torch

class Test(torch.nn.Module):
    
    def __init__(self):
        super(Test,self).__init__()
        self.embd = torch.nn.Embedding(1000, 100)
        self.dense = torch.nn.Linear(100, 1)
    
    def forward(self, inp):
        inp = self.embd(inp)
        return self.dense(inp)

test = Test()
#test.cuda()
inp = torch.tensor([0,1,2,1,1])
out = test(inp)
raw_loss = out.mean(dim=0)

loss_grad = torch.autograd.grad(outputs=raw_loss,
                         inputs=list(test.parameters()),
                         retain_graph=True, create_graph=True, only_inputs=True)
norm = sum([param.norm()**2 for param in loss_grad])
loss = raw_loss + norm

loss.backward(retain_graph=True)

print(test.embd.weight.grad)

  2. Test Script
import torch
import time
start = time.time()
l = [1,1]*100 
input = torch.tensor([[1,0],[1,0]],device='cpu')
embedding_matrix = torch.tensor([[1.0,3.0],[2.0,4]],requires_grad=True,device='cpu')

sq = embedding_matrix * embedding_matrix
emb = torch.nn.functional.embedding(input, sq,scale_grad_by_freq=False)

print('Embedding Matrix')
print(embedding_matrix)
print('-----------------')

#prod = torch.cumprod(emb,1)
sum_ = emb.sum()#prod.sum()

loss_grad, = torch.autograd.grad(outputs=sum_,inputs=embedding_matrix,create_graph=True)

print('Gradient')
print(loss_grad)
print('-----------------')

sum2_ = sum_ + loss_grad.sum()
print(sum2_)
sum2_.backward()

print(embedding_matrix.grad)
print(time.time() - start)
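
The same double-backward path can also be exercised more compactly with torch.autograd.gradgradcheck, which compares the analytical second derivative against finite differences. A minimal sketch (shapes and values are arbitrary, dense path only):

import torch
from torch.autograd import gradgradcheck

# gradgradcheck needs double precision for the finite-difference comparison.
weight = torch.randn(10, 3, dtype=torch.double, requires_grad=True)
indices = torch.tensor([0, 2, 2, 5, 9])

# Passes only if embedding's dense backward is itself differentiable.
gradgradcheck(lambda w: torch.nn.functional.embedding(indices, w), (weight,))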

@soumith soumith changed the title fix embedding_dense_backward (#6469) Fix dense Embedding to work with double backward Jul 1, 2018
@ssnl
Collaborator

ssnl commented Jul 1, 2018

shouldn't the solution be adding a custom double backward, rather than slowing down backward with autograd ops?

Member

@colesbury colesbury left a comment


What @ssnl said. This looks like it will slow down embedding significantly. We should just define a derivative for embedding_backward.
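
For reference, here is a Python-level sketch of the custom double backward being suggested: the dense backward is wrapped in its own autograd.Function with a hand-written derivative (a gather), so the second backward never replays a chain of autograd ops. This is only illustrative; the real change would register a formula for embedding_dense_backward in derivatives.yaml, the class names below are hypothetical, and padding_idx / scale_grad_by_freq are omitted.

import torch

class DenseEmbedding(torch.autograd.Function):
    @staticmethod
    def forward(ctx, weight, indices):
        ctx.save_for_backward(indices)
        ctx.num_weights = weight.size(0)
        return weight[indices]

    @staticmethod
    def backward(ctx, grad_output):
        indices, = ctx.saved_tensors
        # The backward is itself a Function with its own hand-written derivative.
        return DenseEmbeddingBackward.apply(grad_output, indices, ctx.num_weights), None

class DenseEmbeddingBackward(torch.autograd.Function):
    @staticmethod
    def forward(ctx, grad_output, indices, num_weights):
        ctx.save_for_backward(indices)
        flat = grad_output.reshape(-1, grad_output.size(-1))
        grad_weight = flat.new_zeros(num_weights, flat.size(-1))
        grad_weight.index_add_(0, indices.reshape(-1), flat)
        return grad_weight

    @staticmethod
    def backward(ctx, grad_grad_weight):
        indices, = ctx.saved_tensors
        # The derivative of the scatter-add w.r.t. grad_output is a gather,
        # i.e. another embedding lookup.
        return grad_grad_weight[indices], None, None

Calling torch.autograd.grad on DenseEmbedding.apply(weight, indices) with create_graph=True records DenseEmbeddingBackward in the graph, so the second backward goes through the hand-written gather above.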

@kshitij12345
Collaborator Author

Sure, I'll try that and update once done.

@weiyangfb
Contributor

weiyangfb commented Jul 10, 2018

@kshitij12345 this needs a rebase now

@kshitij12345
Collaborator Author

@weiyangfb Sure, will do that.

@t-vi
Collaborator

t-vi commented Jul 12, 2018

Wouldn't it still be better to have a fused index_add_ + mul for the backward than implementing a specific double backward? I'd think that it is probably a bit less code and more similar to what we do for other ops.
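
Concretely, the dense backward boils down to a scatter-add (index_add_) plus an optional frequency scaling, so if that composition were exposed as one op with its own derivative, the double backward comes along naturally. A minimal sketch of the composition in Python terms (the function name is made up, padding_idx is ignored, and a real fused kernel would live in ATen):

import torch

def dense_embedding_backward(grad_output, indices, num_weights, scale_grad_by_freq=False):
    # Scatter-add the incoming gradients into a weight-shaped buffer.
    flat_grad = grad_output.reshape(-1, grad_output.size(-1))
    flat_idx = indices.reshape(-1)
    grad_weight = flat_grad.new_zeros(num_weights, flat_grad.size(-1))
    grad_weight.index_add_(0, flat_idx, flat_grad)
    if scale_grad_by_freq:
        # The "mul" part: scale each row by 1 / (number of occurrences of its index).
        counts = torch.bincount(flat_idx, minlength=num_weights).clamp(min=1)
        grad_weight = grad_weight / counts.to(grad_weight.dtype).unsqueeze(1)
    return grad_weight

Every step above is an ordinary differentiable op, so running it under create_graph=True already gives a working double backward; the fusion would mainly recover the speed of the single hand-written kernel.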

@kshitij12345
Collaborator Author

@ssnl I have tried that, and here is my opinion from what I understand.

In the derivatives.yaml,

- name: embedding(Tensor weight, Tensor indices, int64_t padding_idx, bool scale_grad_by_freq, bool sparse)
  weight: embedding_backward(grad, indices, weight.size(0), padding_idx, scale_grad_by_freq, sparse)

- name: _embedding_bag(Tensor weight, Tensor indices, Tensor offsets, bool scale_grad_by_freq, int64_t mode, bool sparse)
  weight: _embedding_bag_backward(grad, indices, offsets, result1, result2, result3, weight.size(0), scale_grad_by_freq, mode, sparse)

We'll need to pass weight in as an argument to embedding_backward so that embedding_double_backward can compute the gradient for weight, but that will also require minor changes in embedding_bag_sparse_backward, since it calls embedding_backward.

Also, embedding_backward dispatches to both embedding_dense_backward and embedding_sparse_backward, so each would need its own double backward.

So I believe that, as @t-vi suggested, we should opt for the fused index_add_ + mul.
Please let me know what you think.
Thank you.

@ssnl
Collaborator

ssnl commented Jul 15, 2018

@t-vi @kshitij12345 I'm fine if you want to implement fused index_add_ + mul and also write a backward for that. But that would be considerably more work than just writing a custom double backward for this.

@kshitij12345
Collaborator Author

Oh, will take a look and update you on it.

@kshitij12345
Collaborator Author

@colesbury, please review.

@soumith
Member

soumith commented Mar 29, 2019

@pytorchbot rebase this please

@pytorchbot
Collaborator

There's nothing to do! This branch is already up to date with master (1240327).


Member

@soumith soumith left a comment


Thanks a lot for your contribution, Kshitij. It looks like this is finally good to go :)

Contributor

@facebook-github-bot facebook-github-bot left a comment


@soumith has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Apr 3, 2019
Pull Request resolved: pytorch/pytorch#9078

Reviewed By: ezyang

Differential Revision: D14691901

Pulled By: soumith

fbshipit-source-id: 78e2612ba39080be564c876311671eb5a0119a0f

Successfully merging this pull request may close these issues.

[PyTorch] Dense embedding doesn't work with double backward