ggml : optimize llamafile's cpu matrix multiplication for ppc64le #10156

amritahs-ibm · 2024-11-04T05:05:03Z

This change upstreams llamafile's CPU matrix
multiplication kernels for ppc64le, using MMA
builtins for the FP32 data type.
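
For readers unfamiliar with MMA, here is a minimal, portable sketch of the arithmetic pattern such kernels are built around. It is not the code from this patch. As far as I understand it, the FP32 MMA builtin (`__builtin_mma_xvf32gerpp`) adds the outer product of two 4-element vectors into a 4x4 accumulator. The scalar code below only emulates that rank-1 update so the idea can be run on any machine. The names `Tile4x4`, `rank1_update`, and `gemm_tile_4x4` are hypothetical and chosen for illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4x4 FP32 tile, standing in for one MMA accumulator.
using Tile4x4 = std::array<std::array<float, 4>, 4>;

// Scalar emulation of a rank-1 (outer product) update:
// acc[i][j] += a[i] * b[j] for a 4-element column a and 4-element row b.
inline void rank1_update(Tile4x4 &acc, const float a[4], const float b[4]) {
    for (std::size_t i = 0; i < 4; ++i) {
        for (std::size_t j = 0; j < 4; ++j) {
            acc[i][j] += a[i] * b[j];
        }
    }
}

// Computes one 4x4 output tile C = A * B, where
// A is 4 x k (row-major) and B is k x 4 (row-major).
// Each step of the k loop contributes one rank-1 update.
inline Tile4x4 gemm_tile_4x4(const float *A, const float *B, std::size_t k) {
    Tile4x4 acc{};  // value-initialised, so all entries start at zero
    for (std::size_t l = 0; l < k; ++l) {
        float a_col[4];
        for (std::size_t i = 0; i < 4; ++i) {
            a_col[i] = A[i * k + l];  // gather column l of A
        }
        rank1_update(acc, a_col, B + l * 4);  // row l of B is contiguous
    }
    return acc;
}
```

A real kernel would keep several accumulators live at once and pack the operands into vector registers. The sketch only shows why a 4x4 tile maps naturally onto a sequence of outer-product accumulations.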

This change results in a consistent 90%
improvement in input processing time, and 20%
to 80% improvement in output processing time,
across various batch sizes.

The patch is tested with the Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

…ng MMA This change upstreams llamafile's cpu matrix multiplication kernels for ppc64le using MMA builtins for FP32 datatype. This change results in a consistent 90% improvement in input processing time, and 20% to 80% improvement in output processing time, across various batch sizes. The patch is tested with Meta-Llama-3-8B, Mistral-7B, Llama-2-7B-chat-hf models on an IBM POWER10 machine. Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>

anjiltech · 2024-11-08T18:20:34Z

Hi @ggerganov, could you please help review this PR and point out any actions still required from the committer before it can be reviewed?


ALutz273 · 2025-01-23T22:07:58Z

+1

ggerganov approved these changes Nov 9, 2024


ggerganov merged commit e892134 into ggerganov:master Nov 9, 2024
52 of 53 checks passed
