
llama.cpp allocates way more RAM than ollama #9414

Closed · Answered by 0cc4m
commonuserlol asked this question in Q&A

You might want to use a K-quant (Q3_K or Q4_K) for now. IQ quants are not yet supported in Vulkan and probably fall back to CPU. IQ2 and IQ3 support is being worked on (#11360), but it will take a little time until they are optimized to a level similar to the other quants.
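
For anyone landing here from search, a minimal sketch of what that looks like in practice, assuming a recent llama.cpp build with the Vulkan backend. The filenames are placeholders; the `llama-cli` and `llama-quantize` invocations are standard llama.cpp usage:

```sh
# Run with a K-quant GGUF instead of an IQ quant, so the Vulkan backend
# can keep the weights on the GPU rather than falling back to CPU.
# "model-Q4_K_M.gguf" is a placeholder; substitute your own model file.
./llama-cli -m model-Q4_K_M.gguf -ngl 99 -p "Hello"

# If you only have a higher-precision GGUF, llama-quantize can produce a
# K-quant from it (Q4_K_M here; Q3_K_M trades more quality for less RAM):
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```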

Replies: 3 comments · 1 reply

Answer selected by commonuserlol