
Can this be used together with Flash Attention? #6

Open
XiaoduoAILab opened this issue Jan 1, 2025 · 1 comment

Comments

@XiaoduoAILab

Hello Mr. Zhang, can the move_elision implementation be used together with FlashAttention-2 (FA2)?
Also, I ran a simple speed test and found that inference did not get any faster:
[benchmark screenshot]
In fact, it was somewhat slower than both the baseline without MLA and the variant without move_elision.
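For reference, a "simple speed test" like the one described is sensitive to warmup and measurement noise. A minimal timing harness along these lines (a hedged sketch with a hypothetical `dummy_step` workload; in a real run `fn` would be the model's generate/forward call, with `torch.cuda.synchronize()` before each clock read so GPU kernels are not timed asynchronously) might look like:

```python
import time

def benchmark(fn, warmup=3, iters=10):
    """Time an inference callable: discard warmup runs, then average."""
    for _ in range(warmup):
        fn()  # warm caches, trigger any lazy compilation
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters  # seconds per iteration

# Hypothetical stand-in workload; replace with the actual inference call.
def dummy_step():
    sum(i * i for i in range(10_000))

latency = benchmark(dummy_step)
print(f"avg latency: {latency * 1e3:.3f} ms/iter")
```

Comparing the same harness across the MLA, no-MLA, and no-move_elision configurations (same prompt lengths, same batch size) makes the slowdown claim easier to pin down.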

@XiaoduoAILab
Author

GPU memory usage, however, is indeed noticeably lower compared with experiment 4.
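To make the memory comparison reproducible as well, a rough sketch of peak-memory measurement follows (using `tracemalloc` as a CPU-side stand-in with a hypothetical workload; for GPU memory one would instead call `torch.cuda.reset_peak_memory_stats()` before the run and read `torch.cuda.max_memory_allocated()` after):

```python
import tracemalloc

def peak_memory_bytes(fn):
    """Run fn and return the peak heap allocation observed during the call.
    CPU-side stand-in; on GPU, bracket the call with
    torch.cuda.reset_peak_memory_stats() / torch.cuda.max_memory_allocated()."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return peak

# Hypothetical workload standing in for a model forward pass.
def allocate_buffers():
    bufs = [bytearray(1_000_000) for _ in range(8)]
    return len(bufs)

print(f"peak: {peak_memory_bytes(allocate_buffers) / 1e6:.1f} MB")
```

Reporting peak memory for each configuration alongside the latency numbers would show the trade-off (lower memory, higher latency) explicitly.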
