Re-quantization of a split gguf file produces "invalid split file" #6548
Comments
From a newbie perspective, it appears that LLM_KV_SPLIT_COUNT retains the value it had when the model was split; in this instance it is clearly read back as greater than 1. The check at https://github.com/ggerganov/llama.cpp/blob/cc4a95426d17417d3c83f12bdb514fbe8abe2a88/llama.cpp#L2954 then verifies that the file name ends with a postfix in the format "-%05d-of-%05d.gguf". After quantization the file is no longer named that way (command-r-plus-104b-Q2_K_S.gguf), so the check fails. I will look into why LLM_KV_SPLIT_COUNT retains its count at a later time. |
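To make the mismatch concrete, here is a minimal Python sketch of a postfix check of this kind. It is not the llama.cpp code: the function name `has_split_postfix` and the sample shard name are hypothetical, and it assumes the shard number in the file name is 1-based while the stored split index is 0-based.

```python
def has_split_postfix(path: str, split_no: int, split_count: int) -> bool:
    """Return True if `path` ends with the postfix expected for this shard.

    Hypothetical helper: assumes a 0-based `split_no` in the metadata and
    a 1-based shard number in the file name.
    """
    expected = "-%05d-of-%05d.gguf" % (split_no + 1, split_count)
    return path.endswith(expected)


# A shard named in the split format passes the check.
shard_ok = has_split_postfix("model-00001-of-00003.gguf", 0, 3)

# The re-quantized output keeps a stale split count but has a plain name,
# so the same check rejects it.
requantized_ok = has_split_postfix("command-r-plus-104b-Q2_K_S.gguf", 0, 3)
```

Under these assumptions the first call succeeds and the second fails, which matches the "invalid split file" symptom: the metadata claims a split, but the name does not.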
Yes I see 2 solutions:
|
Probably a duplicate of: |
@phymbert I'm working on it. Is it better to support both?
|
How do I combine the shards? |
You can use |
The fix has been merged here #6591 |
Thanks; closing the issue as fixed then. 👍 |
No reopening because I think the target should be a split version after quantize |
OK, no problem. I thought of that solution more as a new feature, while this issue was more about resolving the bug (producing invalid files). As for the split during quantization: I would consider that most of the splits are currently done only to fit shards under the 50 GB Hugging Face upload limit, and after quantization the output will often already fit within the single-file limit. So I would argue the default behavior should be no splitting after quantization, since a) the split is probably unnecessary, or b) the user will probably want to use a different number of shards anyway. |
@he29-net Create another PR to generate "a split version after quantize". It is optional so it won't affect default behavior. |
Hi, while testing the #6491 branch, I downloaded a Q8_0 quant (split into 3 files) from dranger003, and re-quantized it to Q2_K_S to make it more digestible for my museum hardware:
I only passed the first piece, but ./quantize processed it correctly and produced a single file with the expected size. However, it probably did not update some metadata and ./main still thinks the result is a split file:
As a workaround, it is possible to "reset" the metadata by doing a "dummy pass" of gguf-split:
The resulting file then seems to be working fine.
It's probably an easy fix, but after a quick grep through the source and a look at quantize.cpp I figured I don't even know where to start, so it would probably be much easier and faster done by someone who knows the code-base.
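As an illustration of the stale metadata described in this report, the sketch below models the file's key/value metadata as a plain Python dict and drops the split-related keys when a single output file is written. It is only one possible approach and not necessarily what the merged fix does. The helper name `reset_split_metadata` is hypothetical, and the key names "split.no", "split.count" and "split.tensors.count" are assumed to be the strings behind LLM_KV_SPLIT_NO, LLM_KV_SPLIT_COUNT and the related tensor-count key.

```python
# Assumed GGUF key names for the split metadata (not confirmed by this thread).
SPLIT_KEYS = ("split.no", "split.count", "split.tensors.count")


def reset_split_metadata(metadata: dict) -> dict:
    """Return a copy of `metadata` that describes a single, unsplit file.

    Hypothetical helper: the input dict is left untouched and all other
    keys are carried over unchanged.
    """
    cleaned = dict(metadata)
    for key in SPLIT_KEYS:
        cleaned.pop(key, None)  # ignore keys that are already absent
    return cleaned


# Metadata as it might look on the first shard of a 3-way split
# (the values here are made up for the example).
first_shard = {
    "general.name": "example-model",
    "split.no": 0,
    "split.count": 3,
    "split.tensors.count": 100,
}

single_file = reset_split_metadata(first_shard)
```

With the split keys gone, a loader that only treats a file as a shard when a split count is present would read the re-quantized output as an ordinary single file.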