Feature: dynamic shared mem moe_align_block_size_kernel #3376
Conversation
@pcmoritz Can you take a look? The PR looks good to me.
Thanks, the PR looks good to me, feel free to merge @WoosukKwon!
@akhoroshev Thanks for submitting the PR! Left minor comments.
Also, just curious: which MoE model are you using? Is there any public model with more than 128 experts?
This is not a public model.
@WoosukKwon Comments taken into account.
@akhoroshev LGTM! Thanks for the PR!
…size_kernel (vllm-project#3376)" This reverts commit 78b6c48.
…n_block_size_kernel (vllm-project#3376)"" This reverts commit fe983cc.
I encountered compilation errors related to insufficient shared memory size when I tried to increase NUM_MAX_EXPERTS to 128.
@zwd003
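For context, the PR title and the comment above suggest that the kernel's fixed-size shared-memory arrays (bounded by a compile-time NUM_MAX_EXPERTS constant) were replaced with dynamically sized shared memory whose byte count is supplied at launch time, so the expert count no longer needs a compile-time cap. Below is a minimal, hypothetical CUDA sketch of that general pattern; it is not the actual vLLM moe_align_block_size_kernel, and names such as count_tokens_per_expert and num_experts are illustrative assumptions.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Sketch: a histogram-style kernel that counts tokens per expert.
// Instead of a fixed `__shared__ int32_t counts[NUM_MAX_EXPERTS]` array,
// it declares an unsized extern shared buffer; the host passes the
// required byte count as the third launch parameter.
__global__ void count_tokens_per_expert(const int32_t* expert_ids,
                                        int32_t* counts_out,
                                        int num_tokens,
                                        int num_experts) {
  // Dynamically sized shared memory; the actual size is set at launch time.
  extern __shared__ int32_t shared_counts[];

  // Zero the per-block counters.
  for (int e = threadIdx.x; e < num_experts; e += blockDim.x) {
    shared_counts[e] = 0;
  }
  __syncthreads();

  // Each thread accumulates counts for a strided slice of the tokens.
  for (int i = threadIdx.x; i < num_tokens; i += blockDim.x) {
    atomicAdd(&shared_counts[expert_ids[i]], 1);
  }
  __syncthreads();

  // Write the block-local counts to global memory.
  for (int e = threadIdx.x; e < num_experts; e += blockDim.x) {
    counts_out[e] = shared_counts[e];
  }
}

int main() {
  const int num_tokens = 1024;
  const int num_experts = 256;  // no longer capped by a compile-time constant

  int32_t *expert_ids, *counts;
  cudaMallocManaged(&expert_ids, num_tokens * sizeof(int32_t));
  cudaMallocManaged(&counts, num_experts * sizeof(int32_t));
  for (int i = 0; i < num_tokens; ++i) expert_ids[i] = i % num_experts;

  // The dynamic shared-memory size is computed from the runtime expert count.
  size_t shared_bytes = num_experts * sizeof(int32_t);
  count_tokens_per_expert<<<1, 256, shared_bytes>>>(expert_ids, counts,
                                                    num_tokens, num_experts);
  cudaDeviceSynchronize();

  printf("tokens routed to expert 0: %d\n", counts[0]);
  cudaFree(expert_ids);
  cudaFree(counts);
  return 0;
}
```

Note that very large per-block requests (beyond the default 48 KB on most GPUs) would additionally require raising the kernel's dynamic shared-memory limit via cudaFuncSetAttribute; the sketch above stays well under that limit.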