[modelling] remove un-necessary transpose for fa2 attention #31749

sanchit-gandhi · 2024-07-02T14:18:29Z

What does this PR do?

Fixes #31166 (comment). Previously, we were performing three shape operations on the q_proj tensor:

1. Reshape to (bsz, tgt_len, num_heads, head_dim)
2. Transpose to (bsz, num_heads, tgt_len, head_dim)
3. Transpose to (bsz, tgt_len, num_heads, head_dim)

Clearly 2 and 3 are not required! This PR fixes the modelling code accordingly.
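As a minimal sketch of why steps 2 and 3 cancel out, the snippet below compares the old three-step pattern with a plain reshape. The tensor sizes and the variable name `query_states` are illustrative assumptions, not taken from the modelling code; a random tensor stands in for the output of q_proj.

```python
import torch

# Illustrative sizes (assumptions, not values from the PR)
bsz, tgt_len, num_heads, head_dim = 2, 5, 4, 8
embed_dim = num_heads * head_dim

# Stand-in for the output of q_proj: (bsz, tgt_len, embed_dim)
query_states = torch.randn(bsz, tgt_len, embed_dim)

# Old pattern: reshape, transpose, then transpose straight back
old = (
    query_states.view(bsz, tgt_len, num_heads, head_dim)
    .transpose(1, 2)  # (bsz, num_heads, tgt_len, head_dim)
    .transpose(1, 2)  # back to (bsz, tgt_len, num_heads, head_dim)
)

# New pattern: the reshape alone already gives the target layout
new = query_states.view(bsz, tgt_len, num_heads, head_dim)

shapes_match = old.shape == new.shape
values_match = torch.equal(old, new)
```

Both checks should hold, since transposing the same pair of dimensions twice is the identity.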

HuggingFaceDocBuilderDev · 2024-07-02T14:40:01Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

amyeroberts

Thanks!

Could you check for similar patterns in the library's FA2 code? It's possible we're doing extra transposes all over the place.

sanchit-gandhi · 2024-07-03T13:07:31Z

The pattern appears in most LLaMA-style attention modules, but here the back-to-back transpose operations are required, since there's a step in between the transpose ops:

1. Reshape to (bsz, tgt_len, num_heads, head_dim)
2. Transpose to (bsz, num_heads, tgt_len, head_dim)
3. Apply rotary pos embeddings (note that this requires the shape produced from 2)
4. Transpose to (bsz, tgt_len, num_heads, head_dim)

This leaves only two additional models where there are redundant transpose ops: Idefics2 and Jamba, changes for which I've pushed to the PR.
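The LLaMA-style sequence described above can be sketched roughly as follows. This is a toy illustration under stated assumptions: `toy_rotary` and `rotate_half` are simplified helpers written for this example, not the library's implementation, and the sizes are arbitrary. It shows that the rotary step broadcasts `cos`/`sin` of shape (tgt_len, head_dim) against the last two dimensions, so the input needs the (bsz, num_heads, tgt_len, head_dim) layout before being transposed back.

```python
import torch

# Illustrative sizes (assumptions, not values from the PR)
bsz, tgt_len, num_heads, head_dim = 2, 5, 4, 8


def rotate_half(x):
    # Swap the two halves of the last dimension, negating the second half
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def toy_rotary(x, cos, sin):
    # Hypothetical helper: expects x as (bsz, num_heads, tgt_len, head_dim)
    # and cos/sin as (tgt_len, head_dim), broadcast over the leading dims
    return x * cos + rotate_half(x) * sin


# Per-position rotation angles, shape (tgt_len, head_dim)
positions = torch.arange(tgt_len).float()
inv_freq = 1.0 / (10000 ** (torch.arange(0, head_dim, 2).float() / head_dim))
freqs = torch.outer(positions, inv_freq)
emb = torch.cat((freqs, freqs), dim=-1)
cos, sin = emb.cos(), emb.sin()

hidden = torch.randn(bsz, tgt_len, num_heads * head_dim)

q = hidden.view(bsz, tgt_len, num_heads, head_dim)  # step 1
q = q.transpose(1, 2)                               # step 2
q_rot = toy_rotary(q, cos, sin)                     # step 3
q_out = q_rot.transpose(1, 2)                       # step 4
```

Here the two transposes are not back to back in effect, because step 3 consumes the intermediate layout; dropping them would misalign the position axis with `cos` and `sin`.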

ArthurZucker

good find!

sanchit-gandhi requested a review from amyeroberts July 2, 2024 14:24

amyeroberts approved these changes Jul 2, 2024


sanchit-gandhi force-pushed the whisper-transpose branch from 6528edb to 79b5f8b Compare July 3, 2024 12:50

sanchit-gandhi changed the title ~~[whisper] remove un-necessary transpose for fa2 attention~~ [modelling] remove un-necessary transpose for fa2 attention Jul 3, 2024

ArthurZucker approved these changes Jul 9, 2024


sanchit-gandhi added 2 commits July 22, 2024 17:34

[whisper] remove un-necessary transpose for fa2 attention

a84474b

propagate

bc88467

sanchit-gandhi force-pushed the whisper-transpose branch from aca7ef6 to bc88467 Compare July 22, 2024 09:34

sanchit-gandhi merged commit 2782aad into huggingface:main Jul 23, 2024
21 checks passed

sanchit-gandhi deleted the whisper-transpose branch July 23, 2024 06:55

tctrautman mentioned this pull request Jul 26, 2024

Idefics2 generation erroring with flash_attention_2 #32237

Closed

4 tasks
