
Breaking change of models since PR #252 #324

Closed
PriNova opened this issue Mar 20, 2023 · 24 comments

Labels: bug (Something isn't working), model (Model specific)

Comments

@PriNova

PriNova commented Mar 20, 2023

After PR #252, all base models need to be converted anew.

For me, this is a big breaking change. The LoRA and/or Alpaca fine-tuned models are not compatible anymore.
Reconverting is not possible.

I see from the PR that the tokenizer scores are now written into the model.
Would it make sense to write the tokenizer scores into a separate file to stay compatible with the (old) models?
The question then arises whether

  1. on loading the model, the score file is checked for existence and the SentencePiece tokenizer is used, or
  2. the user can decide which tokenizer to use.

What do you think?

PriNova changed the title from "Break change of models since PR #252" to "Breaking change of models since PR #252" on Mar 20, 2023
@loretoparisi

I have the same issue; I cannot convert alpaca-lora models. I had to check out a previous commit:

git checkout 5cb63e2493c49bc2c3b9b355696e8dc26cdd0380

@eiz
Contributor

eiz commented Mar 20, 2023

Can you try this convert script? https://gist.github.com/eiz/828bddec6162a023114ce19146cb2b82 (it outputs .tmp files, you can uncomment the os.rename to do it in place if you want but I didn't want to overwrite your files)

@loretoparisi

@eiz okay thanks, where do I find the tokenizer file?

@eiz
Contributor

eiz commented Mar 20, 2023

If you don't have access to the original LLaMA files I think someone uploaded it here https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer.model

@Puncia

Puncia commented Mar 20, 2023

Can you try this convert script? https://gist.github.com/eiz/828bddec6162a023114ce19146cb2b82 (it outputs .tmp files, you can uncomment the os.rename to do it in place if you want but I didn't want to overwrite your files)

I'm calling python3 .\convert.py .\models\7B\ .\models\tokenizer.model from the llama directory but it doesn't seem to do anything. Doesn't even produce errors. Am I using it wrong?

EDIT: NEVERMIND I was just executing a blank .py file, I need to sleep. Thank you for your script

@loretoparisi

confirmed it worked for both llama and alpaca 7B. 🥇

@loretoparisi

Can you try this convert script? https://gist.github.com/eiz/828bddec6162a023114ce19146cb2b82 (it outputs .tmp files, you can uncomment the os.rename to do it in place if you want but I didn't want to overwrite your files)

I'm calling python3 .\convert.py .\models\7B\ .\models\tokenizer.model from the llama directory but it doesn't seem to do anything. Doesn't even produce errors. Am I using it wrong?

I did the same with no issue, tmp files were generated. For both 7B and 13B models it worked fine.

@tjohnman
Contributor

tjohnman commented Mar 20, 2023

I can confirm this worked for 13B. Thank you!

I'm calling python3 .\convert.py .\models\7B\ .\models\tokenizer.model from the llama directory but it doesn't seem to do anything. Doesn't even produce errors. Am I using it wrong?

@loretoparisi Try python3 .\convert.py models\7B .\models\tokenizer.model

@PriNova
Author

PriNova commented Mar 20, 2023

If you don't have access to the original LLaMA files I think someone uploaded it here https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer.model

This works like a charm. It would be amazing to create a PR for it.
It is a very important piece to have if future changes are made.

@Green-Sky
Collaborator

@eiz can you make a PR with the script and name it something like convert_old-ggml_to_ggmlv1.py?
Maybe put it into a subfolder; e.g., whisper.cpp has an extra folder with some scripts: https://github.com/ggerganov/whisper.cpp/tree/master/extra

@Green-Sky
Collaborator

if future changes will be made.

Yes, there are discussions about changing the file format again to make loading faster.

@loretoparisi

loretoparisi commented Mar 20, 2023

@eiz It seems there is a problem with the alpaca 13B: after conversion, loading it complains about the embedding size:

main: seed = 1679320340
llama_model_load: loading model from './alpaca-models/13B/ggml-alpaca-13b-q4.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 5120
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 13824
llama_model_load: n_parts = 2
llama_model_load: ggml ctx size = 8559.49 MB
llama_model_load: memory_size =   800.00 MB, n_mem = 20480
llama_model_load: loading model part 1/2 from './alpaca-models/13B/ggml-alpaca-13b-q4.bin'
llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
main: failed to load model from './alpaca-models/13B/ggml-alpaca-13b-q4.bin'

while the 7B works fine:

llama_model_load: loading model from './alpaca-models/7B/ggml-alpaca-7b-q4.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size =   512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './alpaca-models/7B/ggml-alpaca-7b-q4.bin'
llama_model_load: .................................... done
llama_model_load: model size =  4017.27 MB / num tensors = 291

@Green-Sky
Collaborator

@loretoparisi Right now llama.cpp expects the 13B model to be in 2 files; this is a known issue.

@rabidcopy
Contributor

rabidcopy commented Mar 20, 2023

@eiz It seems there is a problem with the alpaca 13B: after conversion, loading it complains about the embedding size:

[…]
llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
main: failed to load model from './alpaca-models/13B/ggml-alpaca-13b-q4.bin'

That is due to llama.cpp expecting it to be in 2 parts. Alpaca.cpp hardcodes n_parts to 1 to load the 13B weights. Change the 2 at https://github.com/ggerganov/llama.cpp/blob/master/main.cpp#L37 to a 1 and re-make. This will break loading normal 2-part 13B LLaMA weights, however. Edit: Confirmed myself using eiz's very useful conversion script; it loads the same as before with that change in main.cpp.

@PriNova
Author

PriNova commented Mar 20, 2023

@eiz It seems there is a problem with the alpaca 13B: after conversion, loading it complains about the embedding size:

[…]
llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
main: failed to load model from './alpaca-models/13B/ggml-alpaca-13b-q4.bin'

This is because main.cpp assumes the 13B model has 2 parts, but here it is a single-part file (especially with the alpaca model).

@Green-Sky was faster with his reply

@antimatter15
Contributor

Here's an updated torrent for the 7B

@PriNova
Author

PriNova commented Mar 20, 2023

Since the PR, the model (13B-Q4, SentencePiece) behaves strangely.
With the --ins flag I get this conversation:

[screenshot: Screenshot 2023-03-20 181011]

And with the -i flag it behaves like this:

[screenshot: Screenshot 2023-03-20 181051]

Before the mentioned PR, this did not happen.

@PriNova
Author

PriNova commented Mar 21, 2023

I extracted the vocabulary from a (pre-PR format) model as JSON and from the tokenizer.model file from the original LLaMA source for comparison.
My observation is that both are equal in their tokenization.

So I ask myself: why reformat the model if the tokenization vocabulary is already included?
Did I miss something?

@eiz
Contributor

eiz commented Mar 21, 2023

the tokenizer.model contains scores for each token, most of which are just the negation of the token index (since they're output by the bpe trainer in descending order) so I initially thought that just the index could be used, but some of the frequency scores are actually modified (sequences of whitespace in particular). had to add the scores for this reason.
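
For anyone who wants to inspect these scores, here is a minimal sketch using the sentencepiece Python package (an illustration only, not the actual convert script; it assumes tokenizer.model is in the working directory):

from sentencepiece import SentencePieceProcessor

sp = SentencePieceProcessor(model_file="tokenizer.model")

# Print each token's piece and the score assigned by the BPE trainer.
for token_id in range(sp.vocab_size()):
    piece = sp.id_to_piece(token_id)
    score = sp.get_score(token_id)
    # Most scores are roughly the negated token index, but runs of
    # whitespace carry modified scores, so the scores cannot simply
    # be reconstructed from the token ids.
    print(token_id, repr(piece), score)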

@PriNova
Author

PriNova commented Mar 21, 2023

the tokenizer.model contains scores for each token, most of which are just the negation of the token index (since they're output by the bpe trainer in descending order) so I initially thought that just the index could be used, but some of the frequency scores are actually modified (sequences of whitespace in particular). had to add the scores for this reason.

Yes, the -1000000000.0 score looks like a sentinel, appearing first at id 259.
Before 259 are the custom_tokens.
And the sentinel appears at every id that consists exclusively of 2+ whitespace characters.
Everything else seems to be offset by -259.

I guess the sentinel is used during decoding/encoding to clean up whitespace in the text, and the others are used as-is.

@strfic

strfic commented Mar 22, 2023

If you don't have access to the original LLaMA files I think someone uploaded it here https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer.model

Downloaded the tokenizer.model file. How to download the model from Git LFS now? Tried git lfs fetch and it says "Not in a Git repository"

@shadowmint

For others who find this issue: the link in #324 (comment) is no longer correct after recent refactoring. You need to change the LLAMA_N_PARTS constant, wherever it now lives (currently in llama.cpp, not main.cpp -> https://github.com/ggerganov/llama.cpp/blob/master/llama.cpp#L17)

@PriNova
Author

PriNova commented Mar 23, 2023

And the user does not need to change the code to define the right part count of the model.
The --n_parts option can be set to the number of parts a model has.
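
For example, to load the single-file Alpaca 13B from the logs above, the invocation would be something like:

./main -m ./alpaca-models/13B/ggml-alpaca-13b-q4.bin --n_parts 1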

@DanielWicz

DanielWicz commented Mar 25, 2023

How do I use convert.py when the model is in several parts?

If you don't have access to the original LLaMA files I think someone uploaded it here https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer.model

Downloaded the tokenizer.model file. How to download the model from Git LFS now? Tried git lfs fetch and it says "Not in a Git repository"

git lfs install
git lfs pull
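
Note that these must be run inside a clone of the model repository (hence the "Not in a Git repository" error). Assuming the Hugging Face repo linked above, something like:

git clone https://huggingface.co/decapoda-research/llama-7b-hf
cd llama-7b-hf
git lfs pull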

@sw closed this as not planned on May 9, 2023