Porting LeoLM instruct models to llama.cpp #3935
Unanswered
sorgfresser asked this question in Q&A
Hey,
I'm very impressed by the speed and ease with which llama.cpp can deploy many models. I tried converting a German-and-English-only model named LeoLM, but I only managed to get it to work for the non-instruct-finetuned variants, which seems a bit odd to me.
First of all, if I just try to convert `LeoLM/leo-hessianai-7b-chat` (available on HF), conversion fails with an error about the vocab size. The vocab size of the instruct variants does actually exceed 32000 (there are some special tokens above id 31999, in the range [32000, 32006]), but by simply editing `config.json` to `vocab_size: 32000`, similar to #3900, I at least managed to get the conversion itself to run through. But if I then run

`./main -m "models/leo-hessianai-7b-chat/ggml-model-f16.gguf"`

it fails again, which is fair since I edited the vocab size from 32128 down to 32000, but it confuses me: the initial conversion error had given me the impression that the tokenizer only has 32000 tokens.

Is there any param I could pass to `convert.py` that I'm overlooking? None of the options `-h` lists seem helpful to me.
Thanks a lot for any advice!
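
For reference, the mismatch described above can be checked directly against the HF checkpoint; this is just a sketch using `transformers`, with the repo name taken from the question:

```python
# Sketch: compare the tokenizer's length with the vocab_size declared in
# config.json (the numbers in the comments are the ones from the question).
from transformers import AutoTokenizer, AutoConfig

model_id = "LeoLM/leo-hessianai-7b-chat"
tok = AutoTokenizer.from_pretrained(model_id)
cfg = AutoConfig.from_pretrained(model_id)

print(len(tok))        # base vocab plus the special tokens in [32000, 32006]
print(cfg.vocab_size)  # 32128, the padded embedding size of the checkpoint
```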
Replies: 1 comment · 2 replies

I think what you might need to do is add pad tokens rather than trying to change the model's vocab size. You can try #3743 with the …
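
A minimal sketch of that suggestion, assuming the same HF repo as above (the placeholder token names are made up, and this is not necessarily what #3743 implements):

```python
# Sketch: pad the tokenizer with dummy tokens until it matches the model's
# embedding size, instead of shrinking vocab_size in config.json.
from transformers import AutoTokenizer, AutoConfig

model_id = "LeoLM/leo-hessianai-7b-chat"
tok = AutoTokenizer.from_pretrained(model_id)
cfg = AutoConfig.from_pretrained(model_id)

n_missing = cfg.vocab_size - len(tok)  # e.g. 32128 - 32007 = 121
if n_missing > 0:
    # placeholder names are arbitrary; they only need to be unique
    tok.add_tokens([f"<extra_pad_{i}>" for i in range(n_missing)])

tok.save_pretrained("leo-hessianai-7b-chat-padded")  # hypothetical output dir
```

The idea would then be to point the conversion at the padded tokenizer rather than editing `vocab_size` in `config.json`.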