
Failed to load llama model #702

Closed
horacex opened this issue Mar 31, 2023 · 6 comments

Comments

@horacex

horacex commented Mar 31, 2023

Hi,

I followed the instructions here to create the ggml-model-q4_0.bin file:
https://github.com/ggerganov/llama.cpp

Then I tried to run talk-llama with the following command:

./talk-llama -mw ./models/ggml-model-whisper-base.en.bin -ml ./models/ggml-model-q4_0.bin -p "Myname" -t 8

but got the following output:

whisper_init_from_file_no_state: loading model from './models/ggml-model-whisper-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 2
whisper_model_load: mem required  =  218.00 MB (+    6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.60 MB
whisper_model_load: model size    =  140.54 MB
whisper_init_state: kv self size  =    5.25 MB
whisper_init_state: kv cross size =   17.58 MB
llama_model_load: loading model from './models/ggml-model-q4_0.bin' - please wait ...
llama_model_load: invalid model file './models/ggml-model-q4_0.bin' (bad magic)
llama_init_from_file: failed to load model

main: processing, 8 threads, lang = en, task = transcribe, timestamps = 0 ...

init: found 2 capture devices:
init:    - Capture device #0: 'MacBook Pro Microphone'
init:    - Capture device #1: 'Microsoft Teams Audio'
init: attempt to open default capture device ...
init: obtained spec for input device (SDL Id = 2):
init:     - sample rate:       16000
init:     - format:            33056 (required: 33056)
init:     - channels:          1 (required: 1)
init:     - samples per frame: 1024
zsh: segmentation fault  ./talk-llama -mw ./models/ggml-model-whisper-base.en.bin -ml  -p  -t 8

Obviously the whisper model loaded successfully, but the llama model didn't. I'm not sure what I did wrong.
I am 100% sure the model file path is correct.
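The "bad magic" error means the loader read the first 4 bytes of the model file and they did not match the magic number it expects. A quick way to inspect what a file actually starts with (a minimal sketch; the specific magic constants are assumptions based on llama.cpp's formats around this time):

```python
import struct

# Assumed magic constants: llama.cpp's original unversioned ggml
# format used 0x67676d6c ('ggml'); PR 613 introduced a versioned
# format with magic 0x67676d66 ('ggmf').
GGML_MAGIC_OLD = 0x67676D6C
GGML_MAGIC_VERSIONED = 0x67676D66

def read_magic(path: str) -> int:
    """Return the first 4 bytes of the file as a little-endian uint32."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    return magic

def describe(path: str) -> str:
    """Map the file's magic to a human-readable format name."""
    magic = read_magic(path)
    if magic == GGML_MAGIC_OLD:
        return "old unversioned ggml format"
    if magic == GGML_MAGIC_VERSIONED:
        return "versioned ggml format (post-PR-613)"
    return f"unknown magic 0x{magic:08x}"
```

Running `describe()` on the model that fails to load tells you whether it was produced in a newer format than the loader understands.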

@JKeddo95

I am facing the same issue. I will experiment with replacing the llama files here with ones from the latest llama.cpp repo to see if that makes a difference.

@edwios

edwios commented Mar 31, 2023

The reason, I believe, is that the ggml format has changed in llama.cpp; see ggerganov/llama.cpp#613. The changes have not been back-ported to whisper.cpp yet. So to use talk-llama, after you have replaced the llama.cpp, llama.h, ggml.c, and ggml.h files, the whisper weights (e.g. ggml-small.en.bin) must also be converted to the new format. I tried using the migrate-ggml-2023-03-30-pr613.py script (from the llama.cpp repo) to convert them, but the script threw errors. I haven't tried converting directly from the .pth files, though; maybe that would work...
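The incompatibility can be pictured as a header-layout change: the versioned format inserts extra fields after the magic, so a loader built for the old layout rejects the file immediately. A toy illustration (the field layout here is simplified and assumed, not the exact llama.cpp header):

```python
import io
import struct

def write_old(buf, n_vocab):
    # Old toy layout: magic, then hyperparameters directly.
    buf.write(struct.pack("<II", 0x67676D6C, n_vocab))

def write_new(buf, n_vocab, version=1):
    # New toy layout: a version field sits between the magic
    # and the hyperparameters (simplified).
    buf.write(struct.pack("<III", 0x67676D66, version, n_vocab))

def load_old_style(buf):
    """A loader that only understands the old layout."""
    magic, n_vocab = struct.unpack("<II", buf.read(8))
    if magic != 0x67676D6C:
        raise ValueError("invalid model file (bad magic)")
    return n_vocab

old = io.BytesIO()
write_old(old, 32000)
old.seek(0)

new = io.BytesIO()
write_new(new, 32000)
new.seek(0)
```

Here `load_old_style(old)` returns 32000, while `load_old_style(new)` raises the same "bad magic" error seen in the log above: the magic differs, and even if it didn't, every field after it would be read at the wrong offset.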

@Shawn9512

> I am facing the same issue. I will experiment with replacing the llama files here with ones from the latest llama.cpp repo to see if that makes a difference.

Did it work? I've redownloaded the model, but the issue still persists.

@horacex
Author

horacex commented Apr 1, 2023

> The reason, I believe, is that the ggml format has changed in llama.cpp; see ggerganov/llama.cpp#613. The changes have not been back-ported to whisper.cpp yet. So to use talk-llama, after you have replaced the llama.cpp, llama.h, ggml.c, and ggml.h files, the whisper weights (e.g. ggml-small.en.bin) must also be converted to the new format. I tried using the migrate-ggml-2023-03-30-pr613.py script (from the llama.cpp repo) to convert them, but the script threw errors. I haven't tried converting directly from the .pth files, though; maybe that would work...

You are correct. I used the source code above to update the whisper version and recompiled everything. It worked.

@mab122
Contributor

mab122 commented Apr 1, 2023

If someone is just trying to get this working, I managed to do it.
My understanding is that whisper.cpp uses the older ggml format. If you replace llama.cpp/.h and ggml.c/.h with the ones from llama.cpp, it compiles correctly and loads the llama model, but then it doesn't load whisper models, since they are in the older ggml format.

So don't do that; instead, use the convert script from #324 to convert ggml-model-q4_0.bin (not ...q4_1.bin, as that is the newer ggml format). That worked for me. Note that I am using an alpaca model here (as linked).
(if someone wants it - here is ipfs link /ipfs/QmUqDCPxZj6KrCgcotKnuEPBTiDz8ixikkBoFt3sPD395B)

(Seems like the 30B variant fails with: llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file)
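The "wrong size" failure comes from a sanity check of the same family as the magic check: the loader computes the expected element count of each tensor from the header hyperparameters and compares it against what the file actually stores. A sketch of that check (the dimensions n_vocab = 32000 and n_embd = 6656 are assumed values for the 30B LLaMA variant, used only for illustration):

```python
def check_tensor_size(name, stored_elements, expected_shape):
    """Raise if the stored element count doesn't match the shape
    implied by the model hyperparameters."""
    expected = 1
    for dim in expected_shape:
        expected *= dim
    if stored_elements != expected:
        raise ValueError(f"tensor '{name}' has wrong size in model file")
    return expected

# tok_embeddings.weight should hold n_vocab x n_embd elements.
n_vocab, n_embd = 32000, 6656  # assumed 30B hyperparameters
```

A mismatch typically means the file was produced with different hyperparameters (or a different format) than the header the loader parsed, which is why a partially converted model trips this check even after the magic matches.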

@ggerganov
Owner

This should work now, and the performance is much better than what we had before.
Update to the latest master of both whisper.cpp and llama.cpp.
Make sure to use the latest LLaMA models, created as described in the llama.cpp repo.
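For reference, regenerating the models roughly follows the conversion-then-quantization workflow from the llama.cpp README of that period; this is a sketch, and the exact script names, paths, and the final quantize argument may differ in your checkout:

```shell
# Convert the original .pth weights to ggml f16 format
# (run from the llama.cpp repo root; models/7B/ is an assumed path).
python3 convert-pth-to-ggml.py models/7B/ 1

# Quantize the f16 model to 4-bit q4_0.
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
```

The resulting ggml-model-q4_0.bin is what gets passed to talk-llama via -ml.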
