[Bug]: Qwen2 Moe FP8 not supported on L40 #6264
Comments
FP8 is not yet supported for Qwen. WIP PR: #6088
@robertgshaw2-neuralmagic Hello, the error still exists in version 0.5.3.
FP8 is now supported for Qwen, but MoE FP8 requires compute_capability == 9.0 (i.e. Hopper GPUs). Our MoE kernels are currently implemented in Triton, which requires triton==3.0 for FP8 on Ada Lovelace, and we are limited by the Triton version that ships with PyTorch. We look forward to supporting FP8 MoE on Ada Lovelace once these dependencies are updated.
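
For anyone hitting this on an L40, here is a minimal sketch, assuming only PyTorch, of how the compute-capability requirement above can be checked locally. The `supports_fp8_moe` helper name is illustrative, not a vLLM API:

```python
import torch

def supports_fp8_moe() -> bool:
    """Illustrative check: the FP8 MoE kernels described above need
    compute capability >= 9.0 (Hopper, e.g. H100). The L40 is
    Ada Lovelace and reports 8.9, so this returns False there."""
    if not torch.cuda.is_available():
        return False
    return torch.cuda.get_device_capability() >= (9, 0)

if __name__ == "__main__":
    cap = torch.cuda.get_device_capability() if torch.cuda.is_available() else None
    print(f"compute capability: {cap}, FP8 MoE supported: {supports_fp8_moe()}")
```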
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!
Your current environment
🐛 Describe the bug
After loading an FP8 Qwen2 MoE model:
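
For context, a minimal load sketch using vLLM's offline `LLM` API; the model name below is a placeholder assumption, not the reporter's actual checkpoint:

```python
from vllm import LLM

# Placeholder model name; substitute the actual FP8 Qwen2 MoE checkpoint.
# On an L40 (compute capability 8.9), this load is where the FP8 MoE
# limitation described in the comments above surfaces.
llm = LLM(model="Qwen/Qwen2-57B-A14B-Instruct", quantization="fp8")
print(llm.generate(["Hello"])[0].outputs[0].text)
```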
The config.json is