I would like to ask why bidirectional (two-way) self-attention is necessary here. As far as I know, in the Transformer the encoder's MHSA attends in both directions (as in BERT), while the masked MHSA in the decoder masks out future positions so that each position only attends to past information (as in GPT). Also, in your code the input to the Transformer is not transposed, whereas the standard practice is to permute [batch, channel, length] to [batch, length, channel].
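To make the two points concrete, here is a minimal PyTorch sketch (not the repository's actual code; shapes and hyperparameters are illustrative assumptions). It shows the [batch, channel, length] → [batch, length, channel] permute, and the difference between bidirectional (encoder/BERT-style) and causal (decoder/GPT-style) self-attention via an attention mask:

```python
import torch
import torch.nn as nn

# Illustrative shapes (assumptions, not taken from the repo)
batch, channel, length = 8, 64, 100

x = torch.randn(batch, channel, length)   # conv-style layout: [B, C, L]
x = x.permute(0, 2, 1)                    # Transformer layout: [B, L, C]

layer = nn.TransformerEncoderLayer(d_model=channel, nhead=4, batch_first=True)

# Bidirectional (BERT-style) self-attention: every position attends to all
# positions, so no attention mask is supplied.
out_bidirectional = layer(x)

# Causal (GPT-style) self-attention: an upper-triangular mask hides future
# positions, so each position attends only to itself and the past.
causal_mask = nn.Transformer.generate_square_subsequent_mask(length)
out_causal = layer(x, src_mask=causal_mask)

print(out_bidirectional.shape, out_causal.shape)  # both [B, L, C]
```

With `batch_first=True`, the layer expects [batch, length, channel], which is why the permute is needed when the upstream features come out of a 1-D convolution in [batch, channel, length] order.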