
llama : only use default buffer types for the KV cache #10358

Merged: 1 commit into master on Nov 17, 2024

Conversation

slaren (Collaborator) commented on Nov 17, 2024

It's not easy to test a single op to determine whether a buffer type is compatible with the KV cache, so instead we use only the device's default buffer type for the KV cache.

Fixes #10351
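
For illustration, a minimal C++ sketch of the selection logic described above. The `ggml_backend_*` calls are from the ggml-backend device API; the `kv_cache_buft` helper and its `offload` parameter are hypothetical, not the actual llama.cpp code from this patch:

```cpp
// Minimal sketch, NOT the actual patch: instead of probing candidate buffer
// types with a single-op support test (which is unreliable), KV cache tensors
// are always allocated from the device's default buffer type.
#include "ggml-backend.h"

// Hypothetical helper: pick the buffer type for a KV cache tensor that lives
// on `dev` (or on the CPU when the layer is not offloaded).
static ggml_backend_buffer_type_t kv_cache_buft(ggml_backend_dev_t dev, bool offload) {
    if (!offload || dev == NULL) {
        // layer stays on the host: use the CPU default buffer type
        return ggml_backend_cpu_buffer_type();
    }
    // offloaded layer: use the device's default buffer type unconditionally,
    // rather than testing per-op compatibility of other buffer types
    return ggml_backend_dev_buffer_type(dev);
}
```

The trade-off implied by the description above: alternative buffer types are no longer considered for the KV cache, but in exchange the chosen buffer type is one the backend supports without needing a per-op compatibility test.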

The github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Nov 17, 2024
slaren merged commit be5cacc into master on Nov 17, 2024 (54 checks passed)
slaren deleted the sl/fix-kv-buft branch on Nov 17, 2024 at 11:25
arthw pushed a commit to arthw/llama.cpp that referenced this pull request on Nov 18, 2024
Labels: ggml (changes relating to the ggml tensor library for machine learning)
Projects: None yet
Development: successfully merging this pull request may close the following issue:

Bug: rwkv and mamba models cannot be used with -ngl 0 after CPU backend refactor (#10351)
2 participants