[Feature]: FlashAttention 3 support #6348
Comments
Yes, actively looking into it. Update: it seems Dao-AILab/flash-attention#1268 has been merged. The integration is now ready for testing.
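For anyone who wants to help test once this lands, here is a minimal smoke-test sketch. It assumes the new FA3 kernels are reached through the existing `FLASH_ATTN` backend selector on Hopper (H100) hardware; the model name is just an example, not something specified in this thread.

```python
# Minimal sketch: run a short generation with the flash-attn backend forced on.
# Assumption: the merged FA3 integration is exercised through the existing
# FLASH_ATTN backend on Hopper GPUs; adjust if the final PR adds a dedicated flag.
import os

os.environ["VLLM_ATTENTION_BACKEND"] = "FLASH_ATTN"  # force the flash-attn backend

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", dtype="bfloat16")
params = SamplingParams(temperature=0.0, max_tokens=64)
out = llm.generate(["FlashAttention 3 is"], params)
print(out[0].outputs[0].text)
```

Running the same script with `VLLM_ATTENTION_BACKEND=XFORMERS` gives a quick baseline and a sanity check that the backend switch is actually taking effect.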
Any updates?
This would be amazing. FA3 is apparently part of the reason the Together Inference Engine has leapfrogged vLLM by a lot!
Very interested in this.
+1
+1
+1
A small progress update here: @felixzhu555 did some testing. FA3 is mostly a training kernel at the moment and only benefits extremely large batch sizes with long contexts.
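To make that regime concrete, below is a rough standalone sketch of the kind of kernel-level comparison behind it: FA2 vs FA3 forward time at a large batch with a long context. The FA3 import path (`flash_attn_interface`, from the Hopper build) and the shapes are my assumptions, not the harness @felixzhu555 used.

```python
# Rough kernel-level comparison: FA2 vs FA3 forward pass at large batch + long context.
# Requires an H100; flash_attn_interface is assumed to be the import name of the
# FA3 "hopper" build -- adjust to however your build exposes it.
import torch
from flash_attn import flash_attn_func as fa2_attn            # FlashAttention-2
from flash_attn_interface import flash_attn_func as fa3_attn  # FlashAttention-3 (assumed import)

def bench(fn, q, k, v, iters=20):
    fn(q, k, v, causal=True)  # warmup
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(q, k, v, causal=True)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # ms per forward

# Large batch and long context: the regime where FA3 reportedly pulls ahead.
batch, seqlen, nheads, headdim = 32, 8192, 32, 128
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.bfloat16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

print(f"FA2: {bench(fa2_attn, q, k, v):.2f} ms")
print(f"FA3: {bench(fa3_attn, q, k, v):.2f} ms")
```

Shrinking the batch or sequence length in the same script is an easy way to see where (or whether) the gap closes for inference-shaped workloads.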
Any further update on this?
+1
+1
+1
+1
+1
+1
+1
🚀 The feature, motivation and pitch
As you know, FA3 promises ~1.5x improvements:
Dao-AILab/flash-attention@7ef2484
Alternatives
No response
Additional context
No response