Error traceback
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl return forward_call(*args, **kwargs) File "/root/sglang/python/sglang/srt/layers/radix_attention.py", line 128, in forward return self.extend_forward(q, k, v, input_metadata) File "/root/sglang/python/sglang/srt/layers/radix_attention.py", line 104, in prefill_forward_flashinfer o = input_metadata.prefill_wrapper.forward( File "/usr/local/lib/python3.10/dist-packages/flashinfer/prefill.py", line 498, in forward return self._wrapper.forward( RuntimeError: BatchPrefillWithPagedKVCachePyTorchWrapper::Forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, bool, unsigned int, bool, float, float, float, bool)::<lambda()>::<lambda()>::<lambda()>::<lambda()>::<lambda()> failed to dispatch group_size 3
Shape information
num_heads: 24
num_kv_heads: 8
head_dim: 128
q.shape: torch.Size([6, 3072])
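The failing group size follows directly from these shapes. A minimal sketch of the arithmetic in plain PyTorch (for illustration only, not sglang's actual code):

import torch

num_heads, num_kv_heads, head_dim = 24, 8, 128
q = torch.randn(6, 3072)  # 6 tokens x (24 heads * 128 dims), flattened

# GQA group size: how many query heads share one KV head.
group_size = num_heads // num_kv_heads  # 24 // 8 = 3

# q's last dim is num_heads * head_dim, so it unflattens cleanly:
q = q.view(-1, num_heads, head_dim)  # torch.Size([6, 24, 128])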
Similar to this one: #254
Yes, I'm coming to fix this series of issues :)
With #301 merged, flashinfer's prefill kernels now support any group size, and its decode kernels support group sizes 1-8.
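For anyone unfamiliar with what "group size" means here: in grouped-query attention, each KV head serves group_size query heads, so a fused kernel has to dispatch on that ratio explicitly. A minimal pure-PyTorch sketch of the mapping (an illustration of the technique, not flashinfer's kernel; masking omitted for brevity):

import torch

num_heads, num_kv_heads, head_dim, seq_len = 24, 8, 128, 6
group_size = num_heads // num_kv_heads  # 3

q = torch.randn(seq_len, num_heads, head_dim)
k = torch.randn(seq_len, num_kv_heads, head_dim)
v = torch.randn(seq_len, num_kv_heads, head_dim)

# Broadcast each KV head to its group of query heads, then run
# ordinary multi-head attention. Fused kernels avoid this copy,
# which is why they must support the group size natively.
k = k.repeat_interleave(group_size, dim=1)  # [6, 24, 128]
v = v.repeat_interleave(group_size, dim=1)

attn = torch.softmax(
    torch.einsum("qhd,khd->hqk", q, k) / head_dim**0.5, dim=-1
)
o = torch.einsum("hqk,khd->qhd", attn, v)  # [6, 24, 128]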