support versatile gqa size for batch prefill #223

xuzhenqi · 2024-04-28T11:31:33Z

This pull request adds support for versatile GQA group sizes in the batch prefill kernels. Group sizes 5, 6, and 7 are padded to group size 8 when loading q from global memory to shared memory, and the padded groups are discarded when writing o to global memory.
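For readers who want to see the padding idea in isolation, here is a minimal CPU-side sketch in Python. It is not the kernel code from this PR: the function names (`load_q_padded`, `compute_o`, `store_o`) are hypothetical, zero-filling the padded rows is an assumption, and the "attention" step is replaced by a plain dot product so the example stays short.

```python
PADDED_GROUP_SIZE = 8  # from the description above: group sizes 5, 6, 7 are padded to 8


def load_q_padded(q_rows, group_size, padded=PADDED_GROUP_SIZE):
    """Mimic the global-to-shared load: copy the real rows, then add padding rows.

    Assumption: padding rows are zero-filled; the real kernel may handle them differently.
    """
    if not 1 <= group_size <= padded:
        raise ValueError("group_size must be between 1 and the padded size")
    if len(q_rows) != group_size:
        raise ValueError("expected one q row per head in the group")
    head_dim = len(q_rows[0])
    tile = [list(row) for row in q_rows]
    tile += [[0.0] * head_dim for _ in range(padded - group_size)]
    return tile


def compute_o(tile, v):
    """Stand-in for the attention computation: one dot product per row of the tile."""
    return [sum(a * b for a, b in zip(row, v)) for row in tile]


def store_o(o_tile, group_size):
    """Mimic the write-back: only the first group_size rows reach global memory."""
    return o_tile[:group_size]


# Example with group size 6 and head_dim 2 (illustrative values only).
q = [[float(i), 1.0] for i in range(6)]
tile = load_q_padded(q, group_size=6)
o = store_o(compute_o(tile, [1.0, 2.0]), group_size=6)
```

The compute step always works on a full tile of 8 rows, so one compiled tile shape can serve group sizes 5 through 8; the cost is some wasted work on the padded rows.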

yzh119 · 2024-05-09T21:21:42Z

Hi @xuzhenqi, thanks so much for doing this. I'm refactoring the code to make group size a regular function argument instead of a template parameter so that we can reduce the binary size. I'll notify you when the PR is ready and list you as a co-author of that PR :)

Qubitium · 2024-05-15T02:32:57Z

@yzh119 Do you have an ETA on the dynamic group-size support? It may still be good to merge this PR if there are no performance regressions, and then revert to the new solution when it is ready. Yi-1.5 34B has hit the pipelines, and I believe many more users will want to use this model than Yi 1.0 34B. This PR covers that model and many others that do not fall into the statically compiled group_size slots.

yzh119

Thank you @xuzhenqi @Qubitium, I'll merge this first and then create another PR for my proposed change.

This reverts commit 3b3ce05. wip rebase wip wip upd upd remove macros that are no longer used upd upd upd bugfix "Sun, please grant me the strength to finish it" fix fix remove redundant code remove conflict

yzh119 · 2024-06-15T06:46:28Z

@xuzhenqi @Qubitium, following up on this.

We have merged #301, and we now support any GQA group size for prefill kernels.

support versatile gqa size for batch prefill

e27c5b7

xuzhenqi mentioned this pull request May 15, 2024

Add group_size 7 and fix compat with Yi 1.5 34b #246

Closed

2 tasks

yzh119 approved these changes May 15, 2024

yzh119 merged commit 3b3ce05 into flashinfer-ai:main May 15, 2024

yzh119 added a commit that referenced this pull request May 19, 2024

Revert "support versatile gqa size for batch prefill (#223)"

3ce3550

This reverts commit 3b3ce05.

yzh119 added a commit that referenced this pull request May 27, 2024

Revert "support versatile gqa size for batch prefill (#223)"

f9fa4ef

This reverts commit 3b3ce05.

yzh119 mentioned this pull request May 27, 2024

[WIP] rafactor: make gqa_group_size a function argument instead of template parameter #262

Closed

yzh119 added a commit that referenced this pull request May 27, 2024

Revert "support versatile gqa size for batch prefill (#223)"

7c1dc72

This reverts commit 3b3ce05.

yzh119 added a commit that referenced this pull request May 30, 2024

Revert "support versatile gqa size for batch prefill (#223)"

c092f0d

This reverts commit 3b3ce05.

yzh119 added a commit that referenced this pull request Jun 3, 2024

Revert "support versatile gqa size for batch prefill (#223)"

ef9bd80

This reverts commit 3b3ce05.
