
Bidirectional transformer #7

Open
hhhey-lw opened this issue Jul 13, 2024 · 0 comments
Comments

@hhhey-lw

I would like to ask why a bidirectional self-attention module is necessary here. As far as I know, in the Transformer the encoder's MHSA is already bidirectional: every position attends to every other position (as in BERT). The decoder's masked MHSA, on the other hand, masks out future positions so that each position attends only to the past (as in GPT). Also, in your code the input to the Transformer is not transposed, but the standard practice is to reshape [batch, channel, length] to [batch, length, channel] before applying attention.
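
For concreteness, here is a minimal sketch of the two points above in plain PyTorch; the shapes, module choices, and variable names are illustrative assumptions, not code from this repository:

```python
# Minimal sketch: transposing to [batch, length, channel] and the difference
# between bidirectional (unmasked) and causal (masked) self-attention.
# All names and shapes here are illustrative assumptions.
import torch
import torch.nn as nn

batch, channel, length = 8, 64, 128
x = torch.randn(batch, channel, length)          # typical Conv1d-style layout

# 1) Transpose to [batch, length, channel] before the Transformer layer,
#    since attention mixes information along the sequence (length) axis.
x_seq = x.transpose(1, 2)                        # -> [batch, length, channel]

encoder_layer = nn.TransformerEncoderLayer(
    d_model=channel, nhead=8, batch_first=True   # batch_first expects [B, L, C]
)

# 2a) Bidirectional (encoder-style, BERT-like): no mask, every position
#     attends to every other position.
out_bidirectional = encoder_layer(x_seq)

# 2b) Unidirectional (decoder-style, GPT-like): a causal mask hides future
#     positions, so position t attends only to positions <= t.
causal_mask = torch.triu(torch.ones(length, length, dtype=torch.bool), diagonal=1)
out_causal = encoder_layer(x_seq, src_mask=causal_mask)

print(out_bidirectional.shape, out_causal.shape)  # both [batch, length, channel]
```

In this sketch, True entries in `src_mask` mark positions that may not be attended to, which is PyTorch's convention for boolean attention masks.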

hhhey-lw reopened this Jul 14, 2024