Remove split metadata when quantize model shards #6591
Conversation
ggml.c (Outdated)

```c
for (int i = idx; i < n_kv; ++i)
    ctx->kv[i] = ctx->kv[i+1];
```
Shouldn't this loop be up to `n_kv - 1`? The body of the loop should also be in brackets.
Fixed
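For reference, a minimal sketch of the corrected shift-removal pattern discussed above; the entry struct below is a simplified stand-in, not the actual ggml internal layout:

```c
#include <stdint.h>

// Simplified stand-in for a gguf key/value entry; the real ggml struct
// carries more fields (key string, type, value union, ...).
struct kv_entry {
    const char * key;
    int64_t      value;
};

// Remove the entry at `idx` by shifting the tail left by one slot.
// The last valid source element is kv[n_kv - 1], so the loop condition
// is i < n_kv - 1 (not i < n_kv), and the body is braced.
static int64_t kv_remove_at(struct kv_entry * kv, int64_t n_kv, int64_t idx) {
    // In the actual PR, memory owned by kv[idx] is freed before the shift.
    for (int64_t i = idx; i < n_kv - 1; ++i) {
        kv[i] = kv[i + 1];
    }
    return n_kv - 1; // new number of entries
}
```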
llama.cpp (Outdated)

```diff
@@ -13514,6 +13514,10 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s
     gguf_set_kv     (ctx_out, ml.meta);
     gguf_set_val_u32(ctx_out, "general.quantization_version", GGML_QNT_VERSION);
     gguf_set_val_u32(ctx_out, "general.file_type", ftype);
+    // Remove split metadata
+    gguf_remove_key(ctx_out, "split.no");
```
There are constants for those keys: `LLM_KV_SPLIT*`.
Done
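As a hedged illustration of the enum-based approach: the sketch below models the `LLM_KV_SPLIT*` naming convention mentioned in the review; the enum, the name helper, and the extra key strings besides "split.no" are assumptions rather than verbatim llama.cpp code.

```cpp
// Sketch only. Forward declarations stand in for the ggml/gguf headers,
// and the gguf_remove_key signature is assumed from its use in the diff.
struct gguf_context;
extern "C" void gguf_remove_key(struct gguf_context * ctx, const char * key);

// Assumed enum modeled on the LLM_KV_SPLIT* convention from the review.
enum llm_kv_split {
    LLM_KV_SPLIT_NO,
    LLM_KV_SPLIT_COUNT,          // assumed key; only "split.no" appears in the diff
    LLM_KV_SPLIT_TENSORS_COUNT,  // assumed key
};

static const char * llm_kv_split_name(llm_kv_split kv) {
    switch (kv) {
        case LLM_KV_SPLIT_NO:            return "split.no";
        case LLM_KV_SPLIT_COUNT:         return "split.count";
        case LLM_KV_SPLIT_TENSORS_COUNT: return "split.tensors.count";
    }
    return "";
}

// Drop the split bookkeeping so a single quantized output file is not
// mistaken for one shard of a multi-file model.
static void remove_split_metadata(gguf_context * ctx_out) {
    gguf_remove_key(ctx_out, llm_kv_split_name(LLM_KV_SPLIT_NO));
    gguf_remove_key(ctx_out, llm_kv_split_name(LLM_KV_SPLIT_COUNT));
    gguf_remove_key(ctx_out, llm_kv_split_name(LLM_KV_SPLIT_TENSORS_COUNT));
}
```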
Thanks, although I believe the right approach would be to generate splits in quantize if the input model is split. It can be done later on.
Please merge after @ggerganov approval.
@phymbert I am checking it. Is it good to add "--split-max-*" for quantize?
Yes, let's keep it simple at the moment, with the same distribution of tensors per file as the original.
Remove split metadata when quantize model shards (…-org#6591)

* Remove split metadata when quantize model shards
* Find metadata key by enum
* Correct loop range for gguf_remove_key and code format
* Free kv memory

Co-authored-by: z5269887 <z5269887@unsw.edu.au>
`gguf_remove_key` is added to remove a key from a GGUF context.
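A small usage sketch, assuming the standard gguf API from ggml (the header location and the output file name are placeholders): it checks that the split key is gone from a quantized file, relying on `gguf_find_key` returning a negative index for a missing key.

```c
#include <stdbool.h>
#include <stdio.h>
#include "ggml.h"  // the gguf API lived in ggml.h at the time of this PR

int main(void) {
    struct gguf_init_params params = {
        /*.no_alloc =*/ true,   // metadata only, do not allocate tensor data
        /*.ctx      =*/ NULL,
    };

    // Placeholder path for an already-quantized single-file model.
    struct gguf_context * ctx = gguf_init_from_file("model-q4_0.gguf", params);
    if (ctx == NULL) {
        return 1;
    }

    if (gguf_find_key(ctx, "split.no") < 0) {
        printf("split metadata removed as expected\n");
    }

    gguf_free(ctx);
    return 0;
}
```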