Hi,
As far as I know, to train a transformer MT model the optimizer should be set to sparseadam instead of adam, per #637.
However, when I tried to train a transformer MT model with the -share_decoder_embeddings option, training failed with the error "SparseAdam does not support dense gradients, please consider Adam instead".
Is there any problem with switching the optimizer to adam, as the error message suggests?
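For context, here is a minimal PyTorch sketch of what I understand is going on (this is not OpenNMT-py's actual code, and the sizes are made up): with -optim sparseadam the embeddings produce sparse gradients, but -share_decoder_embeddings ties the generator's projection weight to the decoder embedding, so the shared weight also receives a dense gradient from the projection and SparseAdam refuses to step, while plain Adam handles it.

```python
import torch
import torch.nn as nn

# Toy stand-ins (arbitrary sizes): a sparse-gradient embedding table and an
# output projection whose weight is tied to it, mimicking -share_decoder_embeddings.
emb = nn.Embedding(100, 16, sparse=True)
proj = nn.Linear(16, 100, bias=False)
proj.weight = emb.weight  # weight tying

tokens = torch.randint(0, 100, (8,))
proj(emb(tokens)).sum().backward()
# The lookup path yields a sparse gradient, the projection path a dense one;
# their sum is dense, which SparseAdam rejects at step time.
try:
    torch.optim.SparseAdam(emb.parameters(), lr=1e-3).step()
except RuntimeError as err:
    print(err)  # "SparseAdam does not support dense gradients, please consider Adam instead"

# Plain Adam handles the tied (dense-gradient) weight without complaint.
emb.weight.grad = None
proj(emb(tokens)).sum().backward()
torch.optim.Adam(emb.parameters(), lr=1e-3).step()
```

So, as far as I can tell, switching to -optim adam as the message suggests (while keeping the rest of the transformer settings) should work; it only changes how the embedding gradients are applied, not the model itself.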
Thanks!
Can you please post your command line and the error so that I can have a look? (I think we may drop SparseAdam in the future unless PyTorch supports sparse tensors in the distributed functions.)
vince62s changed the title from "[error] Transformer with share_decoder_embeddings option" to "SparseAdam does not support with share_decoder_embeddings option" on Nov 3, 2018.