[Feature]: FlashAttention 3 support #6348
Comments
Yes, actively looking into it. Update: it seems Dao-AILab/flash-attention#1268 has been merged. The integration is now ready for testing.
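For anyone who wants to help test once this lands, here is a minimal smoke-test sketch. It assumes the new FA3 kernels are reached through the existing `FLASH_ATTN` backend selector on Hopper (H100) hardware; the model name is just an example, not something specified in this thread.

```python
# Minimal sketch: run a short generation with the flash-attn backend forced on.
# Assumption: the merged FA3 integration is exercised through the existing
# FLASH_ATTN backend on Hopper GPUs; adjust if the final PR adds a dedicated flag.
import os

os.environ["VLLM_ATTENTION_BACKEND"] = "FLASH_ATTN"  # force the flash-attn backend

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", dtype="bfloat16")
params = SamplingParams(temperature=0.0, max_tokens=64)
out = llm.generate(["FlashAttention 3 is"], params)
print(out[0].outputs[0].text)
```

Running the same script with `VLLM_ATTENTION_BACKEND=XFORMERS` gives a quick baseline and a sanity check that the backend switch is actually taking effect.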
Any updates?
This would be amazing. FA3 is apparently part of the reason the Together Inference Engine has leapfrogged vLLM by a lot!
Very interested in this.
+1
+1
+1
A small progress update here: @felixzhu555 did some testing. FA3 is mostly a training kernel at the moment and only benefits extremely large batch sizes with long contexts.
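To make that regime concrete, below is a rough standalone sketch of the kind of kernel-level comparison behind it: FA2 vs FA3 forward time at a large batch with a long context. The FA3 import path (`flash_attn_interface`, from the Hopper build) and the shapes are my assumptions, not the harness @felixzhu555 used.

```python
# Rough kernel-level comparison: FA2 vs FA3 forward pass at large batch + long context.
# Requires an H100; flash_attn_interface is assumed to be the import name of the
# FA3 "hopper" build -- adjust to however your build exposes it.
import torch
from flash_attn import flash_attn_func as fa2_attn            # FlashAttention-2
from flash_attn_interface import flash_attn_func as fa3_attn  # FlashAttention-3 (assumed import)

def bench(fn, q, k, v, iters=20):
    fn(q, k, v, causal=True)  # warmup
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(q, k, v, causal=True)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # ms per forward

# Large batch and long context: the regime where FA3 reportedly pulls ahead.
batch, seqlen, nheads, headdim = 32, 8192, 32, 128
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.bfloat16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

print(f"FA2: {bench(fa2_attn, q, k, v):.2f} ms")
print(f"FA3: {bench(fa3_attn, q, k, v):.2f} ms")
```

Shrinking the batch or sequence length in the same script is an easy way to see where (or whether) the gap closes for inference-shaped workloads.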
Any further update on this?
+1
+1
+1
+1
+1
+1
+1
🚀 The feature, motivation and pitch
As you know, FA3 promises ~1.5x improvements:
Dao-AILab/flash-attention@7ef2484
Alternatives
No response
Additional context
No response