
[Bug report] BatchPrefillWithPagedKVCachePyTorchWrapper failed to dispatch group_size 3 #258

Closed
merrymercy opened this issue May 24, 2024 · 3 comments

Comments

@merrymercy

Error traceback

  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/sglang/python/sglang/srt/layers/radix_attention.py", line 128, in forward
    return self.extend_forward(q, k, v, input_metadata)
  File "/root/sglang/python/sglang/srt/layers/radix_attention.py", line 104, in prefill_forward_flashinfer
    o = input_metadata.prefill_wrapper.forward(
  File "/usr/local/lib/python3.10/dist-packages/flashinfer/prefill.py", line 498, in forward
    return self._wrapper.forward(
RuntimeError: BatchPrefillWithPagedKVCachePyTorchWrapper::Forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, bool, unsigned int, bool, float, float, float, bool)::<lambda()>::<lambda()>::<lambda()>::<lambda()>::<lambda()> failed to dispatch group_size 3

Shape information

num_heads 24
num_kv_heads 8
head_dim 128
q.shape torch.Size([6, 3072])
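
For context, the failing group_size is just the ratio of the head counts above. A minimal sketch of that arithmetic (plain PyTorch, not flashinfer's API; variable names are illustrative):

```python
import torch

# Head configuration reported above (grouped-query attention).
num_heads = 24       # query/output heads
num_kv_heads = 8     # key/value heads
head_dim = 128

# group_size is the ratio of query heads to KV heads; 24 / 8 = 3,
# which is the value the wrapper failed to dispatch.
group_size = num_heads // num_kv_heads

# The reported q.shape of torch.Size([6, 3072]) is 6 tokens with the
# per-head dimensions flattened: 3072 = 24 * 128.
q = torch.randn(6, num_heads * head_dim)
q = q.view(-1, num_heads, head_dim)   # torch.Size([6, 24, 128])

print(group_size)  # 3
```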
@merrymercy (Author)

Similar to #254.

@yzh119 (Collaborator) commented May 24, 2024

Yes, I'm working on fixing this series of issues :)

@yzh119 (Collaborator) commented Jun 15, 2024

With #301 merged, flashinfer's prefill kernels now support any group size, and the decode kernels support group sizes 1-8.
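
To make those ranges concrete, here is a small illustrative helper (not part of flashinfer's API) showing the check a caller could do before relying on the decode path:

```python
def decode_group_size_supported(num_qo_heads: int, num_kv_heads: int) -> bool:
    """Check whether the decode kernels cover this head configuration.

    Per the comment above, prefill kernels accept any group size, while
    decode kernels cover group sizes 1-8. This helper is illustrative only.
    """
    assert num_qo_heads % num_kv_heads == 0, "query heads must be a multiple of KV heads"
    group_size = num_qo_heads // num_kv_heads
    return 1 <= group_size <= 8

print(decode_group_size_supported(24, 8))  # True: group size 3 is within 1-8
```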

yzh119 closed this as completed on Jun 15, 2024