
Qwen1.5-32B failed: BatchPrefillWithPagedKVCachePyTorchWrapper failed to dispatch group_size 5 #254

Closed
QwertyJack opened this issue May 23, 2024 · 1 comment

@QwertyJack

I am trying to serve Qwen1.5-32B-Chat with SGLang, but it complains:

  ...
  File "/home/jack/.conda/envs/sglang/lib/python3.11/site-packages/sglang/srt/layers/radix_attention.py", line 92, in prefill_forward_flashinfer
    o = input_metadata.prefill_wrapper.forward(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jack/.conda/envs/sglang/lib/python3.11/site-packages/flashinfer/prefill.py", line 498, in forward
    return self._wrapper.forward(
           ^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: BatchPrefillWithPagedKVCachePyTorchWrapper::Forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, bool, unsigned int, bool, float, float, float, bool)::<lambda()>::<lambda()>::<lambda()>::<lambda()>::<lambda()> failed to dispatch group_size 5

Env: RTX 3090, CUDA 12.3, Python 3.11, torch 2.3.0, SGLang 0.1.16, flashinfer-0.0.4+cu121torch2.3-cp311-cp311-linux_x86_64.whl

Btw, Qwen1.5-14B-Chat works like a charm ;)
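
For reference, I believe flashinfer dispatches on group_size = num_qo_heads / num_kv_heads, so the 5 would come from the 32B model's GQA layout. A minimal sketch; the head counts are my assumption from the published Qwen1.5 configs, not something the error prints:

    # Assumed head counts (from the Qwen1.5 HF configs, not from the traceback):
    qwen_32b = (40, 8)   # 40 query heads, 8 KV heads (GQA) -> group_size 5
    qwen_14b = (40, 40)  # no GQA -> group_size 1, which flashinfer 0.0.4 handles

    for name, (num_qo_heads, num_kv_heads) in [("32B", qwen_32b), ("14B", qwen_14b)]:
        print(name, num_qo_heads // num_kv_heads)  # 32B -> 5, 14B -> 1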

Any chance that we can get Qwen1.5-32B supported?

Thanks in advance!

@yzh119 (Collaborator) commented Jun 15, 2024

@QwertyJack Thank you for the feedback.

With #301 merged, flashinfer's prefill kernels now support any group size, and decode kernels support group sizes 1-8.
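
As a rough sanity check of the new limits (the helper below is just an illustration, not a flashinfer API):

    # Post-#301 dispatch limits: prefill takes any group size, decode takes 1-8.
    def decode_group_size_supported(num_qo_heads: int, num_kv_heads: int) -> bool:
        return num_qo_heads % num_kv_heads == 0 and 1 <= num_qo_heads // num_kv_heads <= 8

    print(decode_group_size_supported(40, 8))  # Qwen1.5-32B-style heads -> True (group size 5)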

yzh119 closed this as completed Jun 15, 2024