
Breaking change of models since PR #252 #324

Closed
PriNova opened this issue Mar 20, 2023 · 24 comments

Labels: bug (Something isn't working), model (Model specific)

Comments

@PriNova

PriNova commented Mar 20, 2023

After PR #252, all base models need to be converted anew.

For me, this is a big breaking change. The LoRA and/or Alpaca fine-tuned models are not compatible anymore.
Reconverting is not possible.

I see from the PR that the tokenizer scores are now written into the model.
Would it make sense to write the tokenizer scores into a separate file to stay compatible with the (old) models?
The question then arises whether

  1. on loading the model, the score file is checked for existence and the SentencePiece tokenizer is used, or
  2. the user can decide which tokenizer to use.

What do you think?

PriNova changed the title from "Break change of models since PR #252" to "Breaking change of models since PR #252" on Mar 20, 2023
@loretoparisi

I have the same issue; I cannot convert alpaca-lora models. I had to check out a previous commit:

git checkout 5cb63e2493c49bc2c3b9b355696e8dc26cdd0380

@eiz
Contributor

eiz commented Mar 20, 2023

Can you try this convert script? https://gist.github.com/eiz/828bddec6162a023114ce19146cb2b82 (it outputs .tmp files, you can uncomment the os.rename to do it in place if you want but I didn't want to overwrite your files)

@loretoparisi

@eiz okay thanks, where do I find the tokenizer file?

@eiz
Contributor

eiz commented Mar 20, 2023

If you don't have access to the original LLaMA files I think someone uploaded it here https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer.model

@Puncia

Puncia commented Mar 20, 2023

Can you try this convert script? https://gist.github.com/eiz/828bddec6162a023114ce19146cb2b82 (it outputs .tmp files, you can uncomment the os.rename to do it in place if you want but I didn't want to overwrite your files)

I'm calling python3 .\convert.py .\models\7B\ .\models\tokenizer.model from the llama directory but it doesn't seem to do anything. Doesn't even produce errors. Am I using it wrong?

EDIT: NEVERMIND I was just executing a blank .py file, I need to sleep. Thank you for your script

@loretoparisi

confirmed it worked for both llama and alpaca 7B. 🥇

@loretoparisi

Can you try this convert script? https://gist.github.com/eiz/828bddec6162a023114ce19146cb2b82 (it outputs .tmp files, you can uncomment the os.rename to do it in place if you want but I didn't want to overwrite your files)

I'm calling python3 .\convert.py .\models\7B\ .\models\tokenizer.model from the llama directory but it doesn't seem to do anything. Doesn't even produce errors. Am I using it wrong?

I did the same with no issue, tmp files were generated. For both 7B and 13B models it worked fine.

@tjohnman
Contributor

tjohnman commented Mar 20, 2023

I can confirm this worked for 13B. Thank you!

I'm calling python3 .\convert.py .\models\7B\ .\models\tokenizer.model from the llama directory but it doesn't seem to do anything. Doesn't even produce errors. Am I using it wrong?

@loretoparisi Try python3 .\convert.py models\7B .\models\tokenizer.model

@PriNova
Author

PriNova commented Mar 20, 2023

If you don't have access to the original LLaMA files I think someone uploaded it here https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer.model

This works like a charm. It would be amazing to create a PR for it.
It is a very important piece to have if future changes are made.

@Green-Sky
Collaborator

@eiz can you make a PR with the script and name it something like convert_old-ggml_to_ggmlv1.py?
Maybe put it into a subfolder; e.g., whisper.cpp has an extra folder with some scripts: https://github.com/ggerganov/whisper.cpp/tree/master/extra

@Green-Sky
Collaborator

if future changes will be made.

Yes, there are discussions about changing the file format again to make loading faster.

@loretoparisi

loretoparisi commented Mar 20, 2023

@eiz It seems there is a problem with the alpaca 13B: after conversion, loading it complains about the embedding size:

main: seed = 1679320340
llama_model_load: loading model from './alpaca-models/13B/ggml-alpaca-13b-q4.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 5120
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 13824
llama_model_load: n_parts = 2
llama_model_load: ggml ctx size = 8559.49 MB
llama_model_load: memory_size =   800.00 MB, n_mem = 20480
llama_model_load: loading model part 1/2 from './alpaca-models/13B/ggml-alpaca-13b-q4.bin'
llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
main: failed to load model from './alpaca-models/13B/ggml-alpaca-13b-q4.bin'

while the 7B works fine:

llama_model_load: loading model from './alpaca-models/7B/ggml-alpaca-7b-q4.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size =   512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './alpaca-models/7B/ggml-alpaca-7b-q4.bin'
llama_model_load: .................................... done
llama_model_load: model size =  4017.27 MB / num tensors = 291

@Green-Sky
Collaborator

@loretoparisi Right now llama.cpp expects the 13B model to be in 2 files; this is a known issue.

@rabidcopy
Contributor

rabidcopy commented Mar 20, 2023

@eiz It seems there is a problem with the alpaca 13B: after conversion, loading it complains about the embedding size:

[…]
llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
main: failed to load model from './alpaca-models/13B/ggml-alpaca-13b-q4.bin'

That is due to llama.cpp expecting it to be in 2 parts. Alpaca.cpp hardcodes n_parts to 1 to load the 13B weights. Change the 2 at https://github.com/ggerganov/llama.cpp/blob/master/main.cpp#L37 to a 1 and re-make. This will break loading normal 2-part 13B LLaMA weights, however. Edit: Confirmed myself using eiz's very useful conversion script; it loads the same as before with that change in main.cpp.

@PriNova
Author

PriNova commented Mar 20, 2023

@eiz It seems there is a problem with the alpaca 13B: after conversion, loading it complains about the embedding size:

[…]
llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
main: failed to load model from './alpaca-models/13B/ggml-alpaca-13b-q4.bin'

This is because main.cpp assumes the 13B model has 2 parts, but here it is a single-part file (especially with the alpaca model).

@Green-Sky was faster with his reply

@antimatter15
Contributor

Here's an updated torrent for the 7B

@PriNova
Author

PriNova commented Mar 20, 2023

Since the PR, the model (13B-Q4, SentencePiece) behaves strangely.
With the --ins flag I get this conversation:

[screenshot: Screenshot 2023-03-20 181011]

And with the -i flag it behaves like this:

[screenshot: Screenshot 2023-03-20 181051]

Before the mentioned PR, this did not happen.

@PriNova
Author

PriNova commented Mar 21, 2023

I extracted the vocabulary from a (pre-PR format) model as JSON and from the tokenizer.model file from the original LLaMA source for comparison.
My observation is that both are equal in their tokenization.

So I ask myself: why reformat the model if the tokenization vocabulary is already included?
Did I miss something?

@eiz
Contributor

eiz commented Mar 21, 2023

the tokenizer.model contains scores for each token, most of which are just the negation of the token index (since they're output by the bpe trainer in descending order) so I initially thought that just the index could be used, but some of the frequency scores are actually modified (sequences of whitespace in particular). had to add the scores for this reason.
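
For anyone who wants to inspect these scores, here is a minimal sketch using the sentencepiece Python package (an illustration only, not the actual convert script; it assumes tokenizer.model is in the working directory):

from sentencepiece import SentencePieceProcessor

sp = SentencePieceProcessor(model_file="tokenizer.model")

# Print each token's piece and the score assigned by the BPE trainer.
for token_id in range(sp.vocab_size()):
    piece = sp.id_to_piece(token_id)
    score = sp.get_score(token_id)
    # Most scores are roughly the negated token index, but runs of
    # whitespace carry modified scores, so the scores cannot simply
    # be reconstructed from the token ids.
    print(token_id, repr(piece), score)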

@PriNova
Author

PriNova commented Mar 21, 2023

the tokenizer.model contains scores for each token, most of which are just the negation of the token index (since they're output by the bpe trainer in descending order) so I initially thought that just the index could be used, but some of the frequency scores are actually modified (sequences of whitespace in particular). had to add the scores for this reason.

Yes, the -1000000000.0 score looks like a sentinel, appearing first at id 259.
Before 259 are the custom_tokens.
And the sentinel appears at every id that consists exclusively of 2+ whitespace characters.
Everything else seems to be offset by -259.

I guess the sentinel is used during decoding/encoding to clean up whitespace in the text, and the others are used as-is.

@strfic

strfic commented Mar 22, 2023

If you don't have access to the original LLaMA files I think someone uploaded it here https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer.model

Downloaded the tokenizer.model file. How to download the model from Git LFS now? Tried git lfs fetch and it says "Not in a Git repository"

@shadowmint

For others who find this issue: the link in #324 (comment) is no longer correct after recent refactoring. You need to change the LLAMA_N_PARTS constant, wherever it now lives (currently in llama.cpp, not main.cpp -> https://github.com/ggerganov/llama.cpp/blob/master/llama.cpp#L17)

@PriNova
Author

PriNova commented Mar 23, 2023

And the user does not need to change the code to define the right part count of the model.
The --n_parts option can be set to the number of parts a model has.
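
For example, to load the single-file Alpaca 13B from the logs above, the invocation would be something like:

./main -m ./alpaca-models/13B/ggml-alpaca-13b-q4.bin --n_parts 1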

@DanielWicz

DanielWicz commented Mar 25, 2023

How do I use convert.py when the model is in several parts?

If you don't have access to the original LLaMA files I think someone uploaded it here https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer.model

Downloaded the tokenizer.model file. How to download the model from Git LFS now? Tried git lfs fetch and it says "Not in a Git repository"

git lfs install
git lfs pull
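
Note that these must be run inside a clone of the model repository (hence the "Not in a Git repository" error). Assuming the Hugging Face repo linked above, something like:

git clone https://huggingface.co/decapoda-research/llama-7b-hf
cd llama-7b-hf
git lfs pull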

@sw closed this as not planned on May 9, 2023