
support versatile gqa size for batch prefill #223

Conversation

xuzhenqi
Contributor

This pull request adds support for arbitrary GQA group sizes in the batch prefill kernels. Group sizes 5, 6, and 7 are padded to group size 8 when loading q from global memory into shared memory; the padded groups are discarded when writing o back to global memory.

@yzh119
Collaborator

yzh119 commented May 9, 2024

Hi @xuzhenqi, thanks so much for doing this. I'm refactoring the code to make group size a regular function argument instead of a template parameter, so that we can reduce the binary size. I'll notify you when the PR is ready and list you as a co-author of that PR :)

@Qubitium
Contributor

Qubitium commented May 15, 2024

@yzh119 Do you have an ETA on the dynamic group-size support? It may still be worthwhile to merge this PR if there are no performance regressions, and then switch to the new solution when it is ready. Yi-1.5 34B has hit the pipelines, and I believe many users will want this model, even more so than Yi 1.0 34B. This PR covers that model and many others that do not fall into the statically compiled group_size slots.

Collaborator

@yzh119 yzh119 left a comment


Thank you @xuzhenqi @Qubitium, I'll merge this first and then create another PR for my proposed change.

@yzh119 yzh119 merged commit 3b3ce05 into flashinfer-ai:main May 15, 2024
yzh119 added a commit that referenced this pull request May 19, 2024
yzh119 added a commit that referenced this pull request May 27, 2024
yzh119 added a commit that referenced this pull request May 27, 2024
yzh119 added a commit that referenced this pull request May 30, 2024
yzh119 added a commit that referenced this pull request Jun 3, 2024
yzh119 added a commit that referenced this pull request Jun 10, 2024
This reverts commit 3b3ce05.

wip

rebase

wip

wip

upd

upd

remove macros that are no longer used

upd

upd

upd

bugfix

Sun, please grant me the strength to finish this

fix

fix
yzh119 added a commit that referenced this pull request Jun 10, 2024
yzh119 added a commit that referenced this pull request Jun 10, 2024
yzh119 added a commit that referenced this pull request Jun 10, 2024
@yzh119
Collaborator

yzh119 commented Jun 15, 2024

@xuzhenqi @Qubitium, following up on this.

We have merged #301, and we now support any GQA group size for the prefill kernels.
