Skip mm_mul kernel functions additions if on Intel #1294
It's also related to ggerganov/llama.cpp#2965; there may be more related issues in llama.cpp. |
It looks good, but this is making encoding really slow. |
Otherwise, it doesn't seem to work with Intel at all. I wonder if it's possible to write patches for the kernels. |
@ggerganov @bobqianic I don't feel confident in this merge; it feels like the fix should be more elegant / actually fix the issue. What do you think? |
Agree. |
the fix should be more elegant / actually fix the issue
Going to close given the above conversation. |
I'm not sure what the best approach is here. @nchudleigh Does using Metal provide a significant speed-up compared to running on the CPU on Intel Macs? |
@ggerganov I missed this entirely! Hard to say whether it does. Because the functions are missing, I can't tell whether the speed is improved. It probably depends on the GPU that is available. |
I've reopened this PR after noticing that @ggerganov addressed a similar issue in #3524. Instead of introducing a new kernel, he chose to disable the buggy kernels on older devices as a fix. We have two choices moving forward:
if ([ctx->device supportsFamily:MTLGPUFamilyApple7]) {
    GGML_METAL_ADD_KERNEL(mul_mm_f32_f32);
    GGML_METAL_ADD_KERNEL(mul_mm_f16_f32);
    GGML_METAL_ADD_KERNEL(mul_mm_q4_0_f32);
    GGML_METAL_ADD_KERNEL(mul_mm_q8_0_f32);
    GGML_METAL_ADD_KERNEL(mul_mm_q4_1_f32);
    GGML_METAL_ADD_KERNEL(mul_mm_q2_K_f32);
    GGML_METAL_ADD_KERNEL(mul_mm_q3_K_f32);
    GGML_METAL_ADD_KERNEL(mul_mm_q4_K_f32);
    GGML_METAL_ADD_KERNEL(mul_mm_q5_K_f32);
    GGML_METAL_ADD_KERNEL(mul_mm_q6_K_f32);
} |
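For readers following the snippet above: gating registration alone is only half of the pattern, because whatever code later selects a matrix-multiplication kernel has to check the same capability, otherwise it may pick a kernel that was never registered. The plain-C sketch below illustrates that idea only. It is not ggml's code: `sketch_ctx`, `sketch_init`, `sketch_pick_matmul`, and the kernel name `fallback_matvec_f16_f32` are hypothetical, and the boolean `supports_simdgroup_mm` merely stands in for the `supportsFamily:MTLGPUFamilyApple7` check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_KERNELS 16

/* Hypothetical stand-in for a Metal context; NOT ggml's real struct. */
typedef struct {
    bool        supports_simdgroup_mm;   /* stand-in for the GPU-family check */
    const char *kernels[SKETCH_MAX_KERNELS];
    size_t      n_kernels;
} sketch_ctx;

/* Record a kernel name as registered. */
static void sketch_add_kernel(sketch_ctx *ctx, const char *name) {
    assert(ctx->n_kernels < SKETCH_MAX_KERNELS);
    ctx->kernels[ctx->n_kernels++] = name;
}

/* Return true if a kernel with this name was registered. */
static bool sketch_has_kernel(const sketch_ctx *ctx, const char *name) {
    for (size_t i = 0; i < ctx->n_kernels; i++) {
        if (strcmp(ctx->kernels[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* Register the fallback unconditionally and the mul_mm kernel only
 * when the device reports support for it. */
static void sketch_init(sketch_ctx *ctx, bool supports_simdgroup_mm) {
    ctx->supports_simdgroup_mm = supports_simdgroup_mm;
    ctx->n_kernels = 0;
    sketch_add_kernel(ctx, "fallback_matvec_f16_f32"); /* hypothetical name */
    if (supports_simdgroup_mm) {
        sketch_add_kernel(ctx, "mul_mm_f16_f32");
    }
}

/* Dispatch uses the same capability flag as registration, so it can
 * never select a kernel that was skipped at init time. */
static const char *sketch_pick_matmul(const sketch_ctx *ctx) {
    const char *name = ctx->supports_simdgroup_mm
        ? "mul_mm_f16_f32"
        : "fallback_matvec_f16_f32";
    assert(sketch_has_kernel(ctx, name));
    return name;
}
```

Calling `sketch_init(&ctx, false)` followed by `sketch_pick_matmul(&ctx)` selects the fallback name, which mirrors how an unsupported device would need some other code path once the `mul_mm` kernels are skipped. Whether that path is faster than the CPU on a given Intel Mac is the open question raised earlier in the thread.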
@ggerganov any idea when the next GGML sync will be? thank you :) |
Hopefully today or tomorrow |
Yup, let's try to fix this as part of #1422 |
Temporary fix for the issue detailed in #1292