Re-quantization of a split gguf file produces "invalid split file" #6548

Closed
he29-net opened this issue Apr 8, 2024 · 11 comments · Fixed by #6688
Labels
bug (Something isn't working) · good first issue (Good for newcomers) · split (GGUF split model sharding)

Comments

he29-net commented Apr 8, 2024

Hi, while testing the #6491 branch, I downloaded a Q8_0 quant (split into 3 files) from dranger003 and re-quantized it to Q2_K_S to make it more digestible for my museum hardware:

./quantize --allow-requantize --imatrix ../models/ggml-c4ai-command-r-plus-104b-f16-imatrix.dat ../models/ggml-c4ai-command-r-plus-104b-q8_0-00001-of-00003.gguf ../models/command-r-plus-104b-Q2_K_S.gguf Q2_K_S 2

I only passed the first shard, but ./quantize processed it correctly and produced a single file of the expected size. However, it apparently did not update some metadata, and ./main still thinks the result is a split file:

./main -m ../models/command-r-plus-104b-Q2_K_S.gguf -t 15 --color -p "this is a test" -c 2048 -ngl 25 -ctk q8_0
...
llama_model_load: error loading model: invalid split file: ../models/command-r-plus-104b-Q2_K_S.gguf
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '../models/command-r-plus-104b-Q2_K_S.gguf'
main: error: unable to load model

As a workaround, it is possible to "reset" the metadata by doing a "dummy pass" of gguf-split:

./gguf-split --split-max-tensors 999 --split ../models/command-r-plus-104b-Q2_K_S.gguf ../models/command-r-plus-104b-Q2_K_S.gguf.split

The resulting file then seems to be working fine.
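In case it helps with debugging, here is a minimal sketch (not part of llama.cpp) that dumps the leftover split metadata using the gguf C API from ggml.h; the key name "split.count" and its u16 type are assumptions based on what gguf-split writes:

```cpp
// Minimal sketch: print the leftover split metadata of a .gguf file.
// Assumes the "split.count" key written by gguf-split, stored as u16.
#include <cstdio>
#include "ggml.h"

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    struct gguf_init_params params = {
        /* .no_alloc = */ true,   // only read metadata, do not load tensor data
        /* .ctx      = */ NULL,
    };
    struct gguf_context * ctx = gguf_init_from_file(argv[1], params);
    if (ctx == NULL) {
        fprintf(stderr, "failed to open %s\n", argv[1]);
        return 1;
    }

    const int kid = gguf_find_key(ctx, "split.count");
    if (kid >= 0) {
        printf("split.count = %u\n", (unsigned) gguf_get_val_u16(ctx, kid));
    } else {
        printf("no split metadata found\n");
    }

    gguf_free(ctx);
    return 0;
}
```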

It's probably an easy fix, but after a quick grep through the source and a look at quantize.cpp, I realized I don't even know where to start, so it would probably be much easier and faster for someone who knows the code base to fix it.

ggerganov added the bug (Something isn't working) and good first issue (Good for newcomers) labels and removed the bug-unconfirmed label on Apr 8, 2024
AlexsCode (Contributor) commented:

From a newbie perspective, it appears that LLM_KV_SPLIT_COUNT retains the value from when the model was split.

https://github.com/ggerganov/llama.cpp/blob/cc4a95426d17417d3c83f12bdb514fbe8abe2a88/llama.cpp#L2942-L2956

In this instance LLM_KV_SPLIT_COUNT is clearly greater than 1.

We then see that https://github.com/ggerganov/llama.cpp/blob/cc4a95426d17417d3c83f12bdb514fbe8abe2a88/llama.cpp#L2954 checks that the file name ends in the format "-%05d-of-%05d.gguf". Since quantization produced a single output, the file is no longer named that way (command-r-plus-104b-Q2_K_S.gguf), so the check fails.
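A simplified sketch of that check, for illustration only (not the actual llama.cpp code; the key name, its u16 type, and the suffix handling are assumptions based on the lines linked above):

```cpp
// Simplified sketch of the loader-side check described above; not the actual
// llama.cpp implementation. Assumes "split.count" (LLM_KV_SPLIT_COUNT) is a
// u16 and that the first shard must end in "-00001-of-%05d.gguf".
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include <string>
#include "ggml.h"

static void check_split_suffix(struct gguf_context * ctx, const std::string & fname) {
    uint16_t n_split = 1;
    const int kid = gguf_find_key(ctx, "split.count");
    if (kid >= 0) {
        n_split = gguf_get_val_u16(ctx, kid);
    }
    if (n_split <= 1) {
        return; // single file, nothing to check
    }

    // Build the suffix expected for the first shard of an n_split model.
    char expected[64];
    snprintf(expected, sizeof(expected), "-%05d-of-%05d.gguf", 1, (int) n_split);
    const size_t len = strlen(expected);

    // A re-quantized single file keeps split.count > 1 but has lost the shard
    // suffix, so this comparison fails and the loader reports the error above.
    if (fname.size() < len || fname.compare(fname.size() - len, len, expected) != 0) {
        throw std::runtime_error("invalid split file: " + fname);
    }
}
```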

I'll endeavour to look into why LLM_KV_SPLIT_COUNT retains its count when I get the chance.

phymbert added the split (GGUF split model sharding) label on Apr 8, 2024
phymbert (Collaborator) commented Apr 8, 2024

Yes, I see two solutions:

  • quantize should also generate shards if the model is loaded from a split
  • quantize must remove the split metadata (a sketch of this option follows below)
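A minimal sketch of the second option, assuming the gguf API offers a key-removal helper named gguf_remove_key and that gguf-split writes the keys listed below; both are assumptions for illustration, not the actual fix:

```cpp
// Minimal sketch of option 2 (strip split metadata in quantize). Assumes a
// gguf_remove_key() helper exists in the gguf API and that gguf-split writes
// the keys "split.no", "split.count" and "split.tensors.count".
#include "ggml.h"

static void strip_split_metadata(struct gguf_context * ctx_out) {
    const char * split_keys[] = {
        "split.no",            // LLM_KV_SPLIT_NO
        "split.count",         // LLM_KV_SPLIT_COUNT
        "split.tensors.count", // LLM_KV_SPLIT_TENSORS_COUNT
    };
    for (const char * key : split_keys) {
        if (gguf_find_key(ctx_out, key) >= 0) {
            gguf_remove_key(ctx_out, key); // assumed helper; drop stale split info
        }
    }
}
```

In quantize, something like this could run on the output gguf_context just before the header is written, so a single-file output no longer claims to be one shard of many.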

phymbert (Collaborator) commented Apr 9, 2024

Probably a duplicate of:

zj040045 (Contributor) commented:

@phymbert I'm working on it. Is it better to support both?

  • for models quantized from splits to a single file, quantize removes the split metadata
  • for models quantized from splits to splits, quantize generates shards and everything works out of the box

4cecoder commented:

How do I combine the shards?

phymbert (Collaborator) commented:

> How do I combine the shards?

You can use the --merge operation of gguf-split, but it is no longer necessary: loading a model from shards is now built in.

zj040045 (Contributor) commented:

The fix has been merged in #6591.

he29-net (Author) commented:

Thanks; closing the issue as fixed then. 👍

phymbert reopened this on Apr 12, 2024
phymbert (Collaborator) commented:

No, reopening because I think the target should be a split version after quantize.

he29-net (Author) commented:

OK, no problem. I thought of that solution more as a new feature, while this issue was more about resolving the bug (producing invalid files).

As for splitting during quantization: I would guess that most splits are currently done only to fit shards under the 50 GB Hugging Face upload limit, and after quantization the output will often already fit within that single-file limit. So I would argue the default behavior should be no splitting after quantization, since a) the split is probably unnecessary, or b) the user will probably want a different number of shards anyway.

zj040045 (Contributor) commented:

@he29-net I'll create another PR to generate "a split version after quantize". It will be optional, so it won't affect the default behavior.
