
Qwen1.5-32B failed: BatchPrefillWithPagedKVCachePyTorchWrapper failed to dispatch group_size 5 #254

Closed
QwertyJack opened this issue May 23, 2024 · 1 comment

@QwertyJack

I am trying to serve Qwen1.5-32B-Chat with SGLang, but it complains:

  ...
  File "/home/jack/.conda/envs/sglang/lib/python3.11/site-packages/sglang/srt/layers/radix_attention.py", line 92, in prefill_forward_flashinfer
    o = input_metadata.prefill_wrapper.forward(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jack/.conda/envs/sglang/lib/python3.11/site-packages/flashinfer/prefill.py", line 498, in forward
    return self._wrapper.forward(
           ^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: BatchPrefillWithPagedKVCachePyTorchWrapper::Forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, bool, unsigned int, bool, float, float, float, bool)::<lambda()>::<lambda()>::<lambda()>::<lambda()>::<lambda()> failed to dispatch group_size 5

Env: RTX 3090, CUDA 12.3, Python 3.11, torch 2.3.0, SGLang 0.1.16, flashinfer-0.0.4+cu121torch2.3-cp311-cp311-linux_x86_64.whl

Btw, Qwen1.5-14B-Chat works like a charm ;)
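
For reference, I believe flashinfer dispatches on group_size = num_qo_heads / num_kv_heads, so the 5 would come from the 32B model's GQA layout. A minimal sketch; the head counts are my assumption from the published Qwen1.5 configs, not something the error prints:

    # Assumed head counts (from the Qwen1.5 HF configs, not from the traceback):
    qwen_32b = (40, 8)   # 40 query heads, 8 KV heads (GQA) -> group_size 5
    qwen_14b = (40, 40)  # no GQA -> group_size 1, which flashinfer 0.0.4 handles

    for name, (num_qo_heads, num_kv_heads) in [("32B", qwen_32b), ("14B", qwen_14b)]:
        print(name, num_qo_heads // num_kv_heads)  # 32B -> 5, 14B -> 1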

Any chance that we can get Qwen1.5-32B supported?

Thanks in advance!

@yzh119 (Collaborator) commented Jun 15, 2024

@QwertyJack Thank you for the feedback.

With #301 merged, flashinfer's prefill kernels now support any group size, and decode kernels support group sizes 1-8.
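
As a rough sanity check of the new limits (the helper below is just an illustration, not a flashinfer API):

    # Post-#301 dispatch limits: prefill takes any group size, decode takes 1-8.
    def decode_group_size_supported(num_qo_heads: int, num_kv_heads: int) -> bool:
        return num_qo_heads % num_kv_heads == 0 and 1 <= num_qo_heads // num_kv_heads <= 8

    print(decode_group_size_supported(40, 8))  # Qwen1.5-32B-style heads -> True (group size 5)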

yzh119 closed this as completed Jun 15, 2024