support versatile gqa size for batch prefill #223
Hi @xuzhenqi, thanks so much for doing this. I'm refactoring the code to make group size a regular function argument instead of a template parameter so that we can reduce the binary size. I'll notify you when the PR is ready and list you as a co-author of that PR :)
@yzh119 Do you have an ETA on the dynamic group-size support? It may still be good to merge this PR if there are no performance regressions, and then switch to the new solution when it is ready. Yi-1.5 34B has hit the pipelines, and I believe many more users will want to use this model, even more so than Yi 1.0 34B. This PR covers that model and many others that do not fall into the statically compiled group_size slots.
This reverts commit 3b3ce05. wip rebase wip wip upd upd remove macros that are no longer used upd upd upd bugfix Sun, please grant me the strength to finish this fix fix
This reverts commit 3b3ce05. wip rebase wip wip upd upd remove macros that are no longer used upd upd upd bugfix Sun, please grant me the strength to finish this fix fix remove redundant code
This reverts commit 3b3ce05. wip rebase wip wip upd upd remove macros that are no longer used upd upd upd bugfix Sun, please grant me the strength to finish this fix fix remove redundant code remove conflict
This merge request adds support for versatile GQA group sizes in the batch prefill kernels. Group sizes 5, 6, and 7 are padded to group size 8 when q is loaded from global memory into shared memory, and the padded groups are discarded when o is written back to global memory.
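To make the padding idea concrete, here is a minimal host-side sketch in plain Python. It is not the kernel code from this PR. The function names (`padded_group_size`, `load_q_padded`, `store_o_unpadded`), the zero fill for padded slots, and leaving sizes outside 5 to 8 unchanged are all assumptions made for illustration. Only the "5, 6, 7 are padded to 8, and padded groups are dropped on write" behaviour comes from the description above.

```python
def padded_group_size(group_size):
    """Return the group size the kernel would run with.

    Sizes 5, 6 and 7 are rounded up to 8, as described in the PR.
    Leaving every other size unchanged is an assumption of this sketch.
    """
    if 5 <= group_size <= 8:
        return 8
    return group_size


def load_q_padded(q_heads, group_size):
    """Mimic the global -> shared load of q for one KV head.

    q_heads holds one row of length head_dim per real query head in the group.
    Padded slots are zero-filled here (an assumption, not taken from the PR).
    """
    padded = padded_group_size(group_size)
    head_dim = len(q_heads[0])
    return [
        list(q_heads[g]) if g < group_size else [0.0] * head_dim
        for g in range(padded)
    ]


def store_o_unpadded(o_tile, group_size):
    """Mimic the shared -> global write of o: keep only the real heads."""
    return [list(row) for row in o_tile[:group_size]]


# Example: a group of 7 query heads sharing one KV head, head_dim = 2.
q = [[float(g), float(g) + 0.5] for g in range(7)]
tile = load_q_padded(q, 7)        # 8 rows, the last one is padding
o = store_o_unpadded(tile, 7)     # padding row dropped again
```

In this sketch the padded tile is passed straight through in place of the attention computation, so the round trip only shows that the extra row never reaches the output.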