
Remove split metadata when quantize model shards #6591

Merged · 4 commits into ggml-org:master · Apr 12, 2024

Conversation

zj040045 (Contributor)

  • Add gguf_remove_key to remove a key from a gguf context
  • Remove split metadata when quantizing

ggml.c Outdated
Comment on lines 21155 to 21156
for (int i = idx; i < n_kv; ++i)
ctx->kv[i] = ctx->kv[i+1];
Member

Shouldn't this loop run up to n_kv - 1? The body of the loop should also be in braces.

Contributor Author
Fixed

llama.cpp Outdated
@@ -13514,6 +13514,10 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s
gguf_set_kv (ctx_out, ml.meta);
gguf_set_val_u32(ctx_out, "general.quantization_version", GGML_QNT_VERSION);
gguf_set_val_u32(ctx_out, "general.file_type", ftype);
// Remove split metadata
gguf_remove_key(ctx_out, "split.no");
Collaborator

There are constants for those keys: LLM_KV_SPLIT*

Contributor Author

Done

@phymbert (Collaborator) left a comment

Thanks, although I believe the right approach would be to generate splits in quantize if the input model is split.
That can be done later on.
Please merge after @ggerganov's approval

@ggerganov ggerganov merged commit 91c7360 into ggml-org:master Apr 12, 2024
56 of 60 checks passed

zj040045 commented Apr 12, 2024

Thanks, although I believe the right approach would be to generate splits in quantize if the input model is split.

@phymbert I am checking it. Would it be good to add "--split-max-*" options so quantize can re-split models? Or does it only need an option like "--keep-split" to generate the same number of splits?


phymbert commented Apr 12, 2024

Would it be good to add "--split-max-*" options so quantize can re-split models? Or does it only need an option like "--keep-split" to generate the same number of splits?

Yes, let's keep it simple for the moment, with the same distribution of tensors per file as the original.

tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request Apr 17, 2024
…-org#6591)

* Remove split metadata when quantize model shards

* Find metadata key by enum

* Correct loop range for gguf_remove_key and code format

* Free kv memory

---------

Co-authored-by: z5269887 <z5269887@unsw.edu.au>