I am trying to use SGLang to serve Qwen1.5-32B-Chat, but it complains:
```
...
  File "/home/jack/.conda/envs/sglang/lib/python3.11/site-packages/sglang/srt/layers/radix_attention.py", line 92, in prefill_forward_flashinfer
    o = input_metadata.prefill_wrapper.forward(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jack/.conda/envs/sglang/lib/python3.11/site-packages/flashinfer/prefill.py", line 498, in forward
    return self._wrapper.forward(
           ^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: BatchPrefillWithPagedKVCachePyTorchWrapper::Forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, bool, unsigned int, bool, float, float, float, bool)::<lambda()>::<lambda()>::<lambda()>::<lambda()>::<lambda()> failed to dispatch group_size 5
```
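For context (my own derivation, not part of the original report): the `group_size` in that error is the GQA ratio of query heads to KV heads from the model config. A minimal sketch of the arithmetic, assuming the standard Hugging Face config fields:

```python
# Minimal sketch: derive the GQA group size that flashinfer dispatches on.
# Assumes the standard Hugging Face config fields (num_attention_heads,
# num_key_value_heads) for Qwen/Qwen1.5-32B-Chat.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen1.5-32B-Chat")
group_size = cfg.num_attention_heads // cfg.num_key_value_heads
print(group_size)  # 40 query heads / 8 KV heads -> 5, which flashinfer 0.0.4 fails to dispatch
```

If I read the configs right, that would also explain why 14B is unaffected: Qwen1.5-14B uses plain multi-head attention, so its group size is 1.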
Env: RTX 3090, CUDA 12.3, Python 3.11, torch 2.3.0, SGLang 0.1.16, flashinfer-0.0.4+cu121torch2.3-cp311-cp311-linux_x86_64.whl
Btw, Qwen1.5-14B-Chat works like a charm ;)
Any chance that we can get Qwen1.5-32B supported?
Thanks in advance!
@QwertyJack Thank you for the feedback.
With #301 merged, flashinfer's prefill kernels now support any group size, and its decode kernels support group sizes 1 to 8.
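(A quick way to confirm the fix landed in your install, assuming the package exposes a version string as recent wheels do:)

```python
# Quick sanity check after upgrading flashinfer; assumes the package exposes
# __version__ (recent wheels do). A build that includes #301 should dispatch
# group_size 5 in the prefill path.
import flashinfer
print(flashinfer.__version__)
```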