Revert "[Kernel] changing fused moe kernel chunk size default to 32k (#7995)" #207

gshtras · 2024-09-25T15:04:51Z

This reverts the change that causes a regression on Mixtral 8x22 with fp8 quantization (both dynamic and pre-quantized) on MI300 and H100.

There was also a mismatch in the default value between the 2 sections, which this revert fixes as well.
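For readers unfamiliar with how a default can drift between two places, here is a minimal sketch of one way to avoid it: a single constant feeds both the lookup and the chunking loop. Everything in it is an assumption for illustration. The names `EXAMPLE_MOE_CHUNK_SIZE`, `DEFAULT_CHUNK_SIZE`, `get_chunk_size`, and `chunk_ranges` are hypothetical, the numeric value is a placeholder rather than the value this PR restores, and reading the setting from an environment variable is assumed, not stated above.

```python
import os

# Hypothetical names and values, for illustration only.
ENV_NAME = "EXAMPLE_MOE_CHUNK_SIZE"   # not a real variable name from this PR
DEFAULT_CHUNK_SIZE = 32 * 1024        # placeholder; not the restored default


def get_chunk_size(environ=None):
    """Return the chunk size, falling back to the single shared default."""
    env = os.environ if environ is None else environ
    return int(env.get(ENV_NAME, DEFAULT_CHUNK_SIZE))


def chunk_ranges(num_tokens, chunk_size):
    """Split [0, num_tokens) into consecutive half-open ranges of at most chunk_size."""
    return [
        (start, min(start + chunk_size, num_tokens))
        for start in range(0, num_tokens, chunk_size)
    ]
```

Because the default lives in one constant, any other place that needs to document or annotate it can refer to `DEFAULT_CHUNK_SIZE` instead of repeating the number.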

…llm-project#7995)" This reverts commit 34a0e96.

gshtras · 2024-09-25T15:05:25Z

PS: this is specifically to address SWDEV-486909.

shajrawi

ship it!

divakar-amd

Looks good! Bigger chunk size is even better for us for 2 reasons:

Upstream PR reference: vllm-project#7995

Revert "[Kernel] changing fused moe kernel chunk size default to 32k (v…

c03df88

…llm-project#7995)" This reverts commit 34a0e96.

gshtras requested a review from divakar-amd September 25, 2024 15:13

shajrawi approved these changes Sep 25, 2024

divakar-amd approved these changes Sep 25, 2024

gshtras merged commit cc2039c into main Sep 25, 2024
16 of 17 checks passed

gshtras deleted the moe_size_revert branch September 25, 2024 15:34
