
Failed to load llama model #702

Closed
horacex opened this issue Mar 31, 2023 · 6 comments

Comments

@horacex

horacex commented Mar 31, 2023

Hi,

I followed the instructions here to create the ggml-model-q4_0.bin file:
https://github.com/ggerganov/llama.cpp

Then I tried to run talk-llama with the following command:

./talk-llama -mw ./models/ggml-model-whisper-base.en.bin -ml ./models/ggml-model-q4_0.bin -p "Myname" -t 8

but got the following output:

whisper_init_from_file_no_state: loading model from './models/ggml-model-whisper-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 2
whisper_model_load: mem required  =  218.00 MB (+    6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.60 MB
whisper_model_load: model size    =  140.54 MB
whisper_init_state: kv self size  =    5.25 MB
whisper_init_state: kv cross size =   17.58 MB
llama_model_load: loading model from './models/ggml-model-q4_0.bin' - please wait ...
llama_model_load: invalid model file './models/ggml-model-q4_0.bin' (bad magic)
llama_init_from_file: failed to load model

main: processing, 8 threads, lang = en, task = transcribe, timestamps = 0 ...

init: found 2 capture devices:
init:    - Capture device #0: 'MacBook Pro Microphone'
init:    - Capture device #1: 'Microsoft Teams Audio'
init: attempt to open default capture device ...
init: obtained spec for input device (SDL Id = 2):
init:     - sample rate:       16000
init:     - format:            33056 (required: 33056)
init:     - channels:          1 (required: 1)
init:     - samples per frame: 1024
zsh: segmentation fault  ./talk-llama -mw ./models/ggml-model-whisper-base.en.bin -ml  -p  -t 8

Obviously the whisper model loaded successfully, but the llama model didn't. I'm not sure what I did wrong.
I am 100% sure the model file path is correct.
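The "bad magic" error means the loader read the first 4 bytes of the model file and they did not match the magic number it expects. A quick way to inspect what a file actually starts with (a minimal sketch; the specific magic constants are assumptions based on llama.cpp's formats around this time):

```python
import struct

# Assumed magic constants: llama.cpp's original unversioned ggml
# format used 0x67676d6c ('ggml'); PR 613 introduced a versioned
# format with magic 0x67676d66 ('ggmf').
GGML_MAGIC_OLD = 0x67676D6C
GGML_MAGIC_VERSIONED = 0x67676D66

def read_magic(path: str) -> int:
    """Return the first 4 bytes of the file as a little-endian uint32."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    return magic

def describe(path: str) -> str:
    """Map the file's magic to a human-readable format name."""
    magic = read_magic(path)
    if magic == GGML_MAGIC_OLD:
        return "old unversioned ggml format"
    if magic == GGML_MAGIC_VERSIONED:
        return "versioned ggml format (post-PR-613)"
    return f"unknown magic 0x{magic:08x}"
```

Running `describe()` on the model that fails to load tells you whether it was produced in a newer format than the loader understands.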

@JKeddo95

I am facing the same issue. I will experiment with replacing the llama files here with ones from the latest llama.cpp repo to see if that makes a difference.

@edwios

edwios commented Mar 31, 2023

The reason, I believe, is that the ggml format has changed in llama.cpp; see ggerganov/llama.cpp#613. The changes have not been back-ported to whisper.cpp yet. So to use talk-llama, after you have replaced the llama.cpp, llama.h, ggml.c, and ggml.h files, the whisper weights (e.g. ggml-small.en.bin) must also be converted to the new format. I tried using the migrate-ggml-2023-03-30-pr613.py script (from the llama.cpp repo) to convert them, but the script threw errors. I haven't tried converting directly from the .pth files, though; maybe that would work...
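The incompatibility can be pictured as a header-layout change: the versioned format inserts extra fields after the magic, so a loader built for the old layout rejects the file immediately. A toy illustration (the field layout here is simplified and assumed, not the exact llama.cpp header):

```python
import io
import struct

def write_old(buf, n_vocab):
    # Old toy layout: magic, then hyperparameters directly.
    buf.write(struct.pack("<II", 0x67676D6C, n_vocab))

def write_new(buf, n_vocab, version=1):
    # New toy layout: a version field sits between the magic
    # and the hyperparameters (simplified).
    buf.write(struct.pack("<III", 0x67676D66, version, n_vocab))

def load_old_style(buf):
    """A loader that only understands the old layout."""
    magic, n_vocab = struct.unpack("<II", buf.read(8))
    if magic != 0x67676D6C:
        raise ValueError("invalid model file (bad magic)")
    return n_vocab

old = io.BytesIO()
write_old(old, 32000)
old.seek(0)

new = io.BytesIO()
write_new(new, 32000)
new.seek(0)
```

Here `load_old_style(old)` returns 32000, while `load_old_style(new)` raises the same "bad magic" error seen in the log above: the magic differs, and even if it didn't, every field after it would be read at the wrong offset.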

@Shawn9512

> I am facing the same issue. I will experiment with replacing the llama files here with ones from the latest llama.cpp repo to see if that makes a difference.

Did it work? I've redownloaded the model, but the issue still persists.

@horacex
Author

horacex commented Apr 1, 2023

> The reason, I believe, is that the ggml format has changed in llama.cpp; see ggerganov/llama.cpp#613. The changes have not been back-ported to whisper.cpp yet. So to use talk-llama, after you have replaced the llama.cpp, llama.h, ggml.c, and ggml.h files, the whisper weights (e.g. ggml-small.en.bin) must also be converted to the new format. I tried using the migrate-ggml-2023-03-30-pr613.py script (from the llama.cpp repo) to convert them, but the script threw errors. I haven't tried converting directly from the .pth files, though; maybe that would work...

You are correct. I used the source code above to update the whisper version and recompiled everything. It worked.

@mab122
Contributor

mab122 commented Apr 1, 2023

If someone is just trying to get this working, I managed to do it.
My understanding is that whisper.cpp uses the older ggml format. If you replace llama.cpp/.h and ggml.c/.h with the ones from llama.cpp, it compiles correctly and loads the llama model, but then it doesn't load whisper models, since they are in the older ggml format.

So don't do that; instead, use the convert script from #324 to convert ggml-model-q4_0.bin (not ...q4_1.bin, as that is the newer ggml format). That worked for me. Note that I am using an alpaca model here (as linked).
(if someone wants it - here is ipfs link /ipfs/QmUqDCPxZj6KrCgcotKnuEPBTiDz8ixikkBoFt3sPD395B)

(Seems like the 30B variant fails with: llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file)
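The "wrong size" failure comes from a sanity check of the same family as the magic check: the loader computes the expected element count of each tensor from the header hyperparameters and compares it against what the file actually stores. A sketch of that check (the dimensions n_vocab = 32000 and n_embd = 6656 are assumed values for the 30B LLaMA variant, used only for illustration):

```python
def check_tensor_size(name, stored_elements, expected_shape):
    """Raise if the stored element count doesn't match the shape
    implied by the model hyperparameters."""
    expected = 1
    for dim in expected_shape:
        expected *= dim
    if stored_elements != expected:
        raise ValueError(f"tensor '{name}' has wrong size in model file")
    return expected

# tok_embeddings.weight should hold n_vocab x n_embd elements.
n_vocab, n_embd = 32000, 6656  # assumed 30B hyperparameters
```

A mismatch typically means the file was produced with different hyperparameters (or a different format) than the header the loader parsed, which is why a partially converted model trips this check even after the magic matches.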

@ggerganov
Owner

This should work now, and the performance is much better than what we had before.
Update to the latest master of both whisper.cpp and llama.cpp.
Make sure to use the latest LLaMA models, created as described in the llama.cpp repo.
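For reference, regenerating the models roughly follows the conversion-then-quantization workflow from the llama.cpp README of that period; this is a sketch, and the exact script names, paths, and the final quantize argument may differ in your checkout:

```shell
# Convert the original .pth weights to ggml f16 format
# (run from the llama.cpp repo root; models/7B/ is an assumed path).
python3 convert-pth-to-ggml.py models/7B/ 1

# Quantize the f16 model to 4-bit q4_0.
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
```

The resulting ggml-model-q4_0.bin is what gets passed to talk-llama via -ml.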
