Issue in Reformer: Reformer doesn't depend on its key feature -- LSHSelfAttention
#16972
Comments
Hey @leo-liuzy, sorry, what exactly is the issue here with Reformer? Is the training not working?
Hi @patrickvonplaten, I am evaluating the released model trained on Crime and Punishment (with examples randomly sampled from Crime and Punishment). I found that if I remove the LSHSelfAttention output when computing perplexity, the perplexity doesn't change much. But if I remove LocalSelfAttention, the PPL goes up by a lot. So I wonder whether this is caused by a bug in the codebase (one that might even affect training), or whether it's intrinsic to this specific Reformer model's structure.
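For reference, here is a minimal sketch of the kind of perplexity check described above, using the released google/reformer-crime-and-punishment checkpoint; the sample sentence is a placeholder and this is not the exact script from the linked PR:

```python
import torch
from transformers import ReformerModelWithLMHead, ReformerTokenizer

model_id = "google/reformer-crime-and-punishment"
tokenizer = ReformerTokenizer.from_pretrained(model_id)
model = ReformerModelWithLMHead.from_pretrained(model_id).eval()

# The checkpoint interleaves the two attention types (config.attn_layers is a
# list such as ["local", "lsh", "local", ...]), so ablating one type still
# leaves the other layers intact.
print(model.config.attn_layers)

text = "He had successfully avoided meeting his landlady on the staircase."
input_ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    # Passing labels makes the LM head return the mean next-token cross-entropy.
    loss = model(input_ids, labels=input_ids).loss
print("perplexity:", torch.exp(loss).item())
```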
I'm not really sure, @leo-liuzy, sadly - I've never removed the local layers when training the model. Maybe you can also try asking on https://discuss.huggingface.co/ :-)
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
Who can help?
@patrickvonplaten
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Make the file changes (very minimal changes) as in my PR here: leo-liuzy#2
Changes are located here
I made my fork from huggingface main two days ago.
I also played with removing LocalSelfAttention, and the perplexity goes up by a lot, especially with long_inputs_lst (in the file). When just using LSHSelfAttention, increasing num_hashes doesn't help.
My question is: could this be caused by a subtle bug introduced when porting from Reformer's official code? Or is this intrinsic to the Reformer?
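As a sanity check on the num_hashes point, here is a minimal sketch that varies the number of hashing rounds at inference time (num_hashes passed to forward is documented to override config.num_hashes for the LSH layers). It uses the unmodified released checkpoint and a placeholder sentence; in the issue's setting the same loop would be run on the LSH-only variant from the PR:

```python
import torch
from transformers import ReformerModelWithLMHead, ReformerTokenizer

model_id = "google/reformer-crime-and-punishment"
tokenizer = ReformerTokenizer.from_pretrained(model_id)
model = ReformerModelWithLMHead.from_pretrained(model_id).eval()

input_ids = tokenizer(
    "It was a very hot evening early in July.", return_tensors="pt"
).input_ids

# More hashing rounds should bring LSH bucketing closer to full attention; the
# issue reports that with only the LSH layers kept, raising num_hashes does not help.
for n_hashes in (1, 2, 4, 8):
    with torch.no_grad():
        loss = model(input_ids, labels=input_ids, num_hashes=n_hashes).loss
    print(f"num_hashes={n_hashes}  ppl={torch.exp(loss).item():.2f}")
```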
I know that in Reformer they had a 20-layer transformer trained with LSHSelfAttention in all 20 layers, and it showed good performance; that's what further confuses me.
Expected behavior