
Remove split metadata when quantize model shards #6591

Merged · 4 commits into ggml-org:master · Apr 12, 2024

Conversation

zj040045 (Contributor)

  • Add gguf_remove_key to remove a key from a gguf context
  • Remove split metadata when quantizing

ggml.c Outdated
Comment on lines 21155 to 21156
for (int i = idx; i < n_kv; ++i)
ctx->kv[i] = ctx->kv[i+1];
Member

Shouldn't this loop run up to n_kv - 1? The body of the loop should also be in braces.

Contributor Author
Fixed

llama.cpp Outdated
@@ -13514,6 +13514,10 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s
gguf_set_kv (ctx_out, ml.meta);
gguf_set_val_u32(ctx_out, "general.quantization_version", GGML_QNT_VERSION);
gguf_set_val_u32(ctx_out, "general.file_type", ftype);
// Remove split metadata
gguf_remove_key(ctx_out, "split.no");
Collaborator

There are constants for those keys: LLM_KV_SPLIT*

Contributor Author

Done

@phymbert (Collaborator) left a comment

Thanks, although I believe the right approach would be to generate splits in quantize if the input model is split.
That can be done later on.
Please merge after @ggerganov's approval

@ggerganov ggerganov merged commit 91c7360 into ggml-org:master Apr 12, 2024
56 of 60 checks passed

zj040045 commented Apr 12, 2024

Thanks, although I believe the right approach would be to generate splits in quantize if the input model is split.

@phymbert I am checking it. Would it be good to add "--split-max-*" options so quantize can re-split models? Or does it only need an option like "--keep-split" to generate the same number of splits?


phymbert commented Apr 12, 2024

Would it be good to add "--split-max-*" options so quantize can re-split models? Or does it only need an option like "--keep-split" to generate the same number of splits?

Yes, let's keep it simple for the moment, with the same distribution of tensors per file as the original.

tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request Apr 17, 2024
…-org#6591)

* Remove split metadata when quantize model shards

* Find metadata key by enum

* Correct loop range for gguf_remove_key and code format

* Free kv memory

---------

Co-authored-by: z5269887 <z5269887@unsw.edu.au>