Converting kfkas Llama-2-ko-7b-Chat to GGUF fails #2865

kurugai · 2023-08-29T06:17:47Z

Hi. I'm trying to convert the 'kfkas/Llama-2-ko-7b-Chat' model I received from huggingface on Windows 11 into a gguf file.
So I tried to convert it to the command below.

C:\AI\llama.cpp>python convert-llama-hf-to-gguf.py .\models\kfkas_Llama-2-ko-7b-Chat 1

The conversion was successful, but when I tried to execute it, there was a problem that it couldn't be executed.

Can I ask you to review what should I do? Below are the results of the command execution.

I know you're busy, but please do it once.

C:\AI\llama.cpp>pip install gguf
Defaulting to user installation because normal site-packages is not writeable
Collecting gguf
Obtaining dependency information for gguf from https://files.pythonhosted.org/packages/bb/16/83a1cb95d9ec85bc316a1986481325c257a4a9a024e12bace801898db14e/gguf-0.2.1-py3-none-any.whl.metadata
Downloading gguf-0.2.1-py3-none-any.whl.metadata (1.9 kB)
Requirement already satisfied: numpy>=1.17 in c:\users\hwyoo\appdata\roaming\python\python310\site-packages (from gguf) (1.23.5)
Downloading gguf-0.2.1-py3-none-any.whl (8.1 kB)
Installing collected packages: gguf
Successfully installed gguf-0.2.1

C:\AI\llama.cpp>python convert-llama-hf-to-gguf.py .\models\kfkas_Llama-2-ko-7b-Chat 1
gguf: loading model kfkas_Llama-2-ko-7b-Chat
gguf: found 2 model parts
gguf: get model metadata
gguf: get tokenizer metadata
gguf: get special token ids
gguf: get tensor metadata
gguf: loading model part 'pytorch_model-00001-of-00002.bin'
token_embd.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.0.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.0.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.0.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.0.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.1.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.1.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.1.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.1.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.1.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.1.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.1.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.1.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.1.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.2.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.2.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.2.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.2.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.2.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.2.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.2.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.2.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.2.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.3.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.3.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.3.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.3.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.3.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.3.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.3.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.3.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.3.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.4.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.4.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.4.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.4.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.4.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.4.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.4.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.4.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.4.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.5.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.5.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.5.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.5.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.5.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.5.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.5.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.5.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.5.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.6.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.6.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.6.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.6.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.6.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.6.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.6.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.6.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.6.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.7.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.7.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.7.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.7.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.7.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.7.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.7.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.7.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.7.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.8.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.8.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.8.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.8.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.8.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.8.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.8.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.8.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.8.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.9.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.9.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.9.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.9.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.9.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.9.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.9.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.9.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.9.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.10.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.10.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.10.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.10.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.10.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.10.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.10.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.10.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.10.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.11.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.11.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.11.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.11.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.11.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.11.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.11.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.11.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.11.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.12.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.12.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.12.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.12.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.12.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.12.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.12.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.12.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.12.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.13.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.13.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.13.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.13.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.13.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.13.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.13.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.13.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.13.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.14.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.14.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.14.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.14.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.14.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.14.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.14.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.14.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.14.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.15.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.15.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.15.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.15.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.15.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.15.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.15.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.15.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.15.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.16.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.16.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.16.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.16.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.16.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.16.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.16.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.16.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.16.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.17.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.17.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.17.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.17.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.17.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.17.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.17.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.17.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.17.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.18.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.18.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.18.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.18.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.18.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.18.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.18.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.18.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.18.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.19.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.19.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.19.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.19.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.19.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.19.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.19.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.19.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.19.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.20.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.20.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.20.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.20.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.20.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.20.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.20.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.20.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.20.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.21.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.21.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.21.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.21.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.21.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.21.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.21.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.21.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.21.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.22.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.22.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.22.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.22.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.22.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.22.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.22.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.22.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.22.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.23.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.23.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.23.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.23.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.23.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
gguf: loading model part 'pytorch_model-00002-of-00002.bin'
blk.23.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.23.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.23.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.23.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.24.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.24.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.24.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.24.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.24.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.24.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.24.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.24.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.24.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.25.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.25.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.25.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.25.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.25.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.25.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.25.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.25.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.25.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.26.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.26.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.26.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.26.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.26.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.26.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.26.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.26.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.26.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.27.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.27.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.27.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.27.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.27.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.27.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.27.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.27.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.27.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.28.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.28.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.28.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.28.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.28.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.28.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.28.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.28.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.28.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.29.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.29.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.29.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.29.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.29.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.29.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.29.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.29.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.29.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.30.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.30.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.30.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.30.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.30.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.30.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.30.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.30.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.30.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.31.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.31.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.31.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.31.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.31.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.31.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.31.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.31.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.31.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
output_norm.weight, n_dims = 1, torch.float16 --> float32
output.weight, n_dims = 2, torch.float16 --> float16
gguf: write header
gguf: write metadata
gguf: write tensors
gguf: model successfully exported to '.\models\kfkas_Llama-2-ko-7b-Chat/ggml-model-f16.gguf'

C:\AI\llama.cpp>main
main: build = 1100 ( llama_model_loader: loaded llama_model_loader: - tensor 1: llama_model_loader: - tensor 2: llama_model_loader: - tensor 3: llama_model_loader: - tensor 4: llama_model_loader: - tensor 5: llama_model_loader: - tensor 6: llama_model_loader: - tensor 7: llama_model_loader: - tensor 8: llama_model_loader: - tensor 9: llama_model_loader: - tensor 10: llama_model_loader: - tensor 11: llama_model_loader: - tensor 12: llama_model_loader: - tensor 13: llama_model_loader: - tensor 14: llama_model_loader: - tensor 15: llama_model_loader: - tensor 16: llama_model_loader: - tensor 17: llama_model_loader: - tensor 18: llama_model_loader: - tensor 19: llama_model_loader: - tensor 20: llama_model_loader: - tensor 21: llama_model_loader: - tensor 22: llama_model_loader: - tensor 23: llama_model_loader: - tensor 24: llama_model_loader: - tensor 25: llama_model_loader: - tensor 26: llama_model_loader: - tensor 27: llama_model_loader: - tensor 28: llama_model_loader: - tensor 29: llama_model_loader: - tensor 30: llama_model_loader: - tensor 31: llama_model_loader: - tensor 32: llama_model_loader: - tensor 33: llama_model_loader: - tensor 34: llama_model_loader: - tensor 35: llama_model_loader: - tensor 36: llama_model_loader: - tensor 37: llama_model_loader: - tensor 38: llama_model_loader: - tensor 39: llama_model_loader: - tensor 40: llama_model_loader: - tensor 41: llama_model_loader: - tensor 42: llama_model_loader: - tensor 43: llama_model_loader: - tensor 44: llama_model_loader: - tensor 45: llama_model_loader: - tensor 46: llama_model_loader: - tensor 47: llama_model_loader: - tensor 48: llama_model_loader: - tensor 49: llama_model_loader: - tensor 50: llama_model_loader: - tensor 51: llama_model_loader: - tensor 52: llama_model_loader: - tensor 53: llama_model_loader: - tensor 54: llama_model_loader: - tensor 55: llama_model_loader: - tensor 56: llama_model_loader: - tensor 57: llama_model_loader: - tensor 58: llama_model_loader: - tensor 59: llama_model_loader: - tensor 60: llama_model_loader: - tensor 61: llama_model_loader: - tensor 62: llama_model_loader: - tensor 63: llama_model_loader: - tensor 64: llama_model_loader: - tensor 65: llama_model_loader: - tensor 66: llama_model_loader: - tensor 67: llama_model_loader: - tensor 68: llama_model_loader: - tensor 69: llama_model_loader: - tensor 70: llama_model_loader: - tensor 71: llama_model_loader: - tensor 72: llama_model_loader: - tensor 73: llama_model_loader: - tensor 74: llama_model_loader: - tensor 75: llama_model_loader: - tensor 76: llama_model_loader: - tensor 77: llama_model_loader: - tensor 78: llama_model_loader: - tensor 79: llama_model_loader: - tensor 80: llama_model_loader: - tensor 81: llama_model_loader: - tensor 82: llama_model_loader: - tensor 83: llama_model_loader: - tensor 84: llama_model_loader: - tensor 85: llama_model_loader: - tensor 86: llama_model_loader: - tensor 87: llama_model_loader: - tensor 88: llama_model_loader: - tensor 89: llama_model_loader: - tensor 90: llama_model_loader: - tensor 91: llama_model_loader: - tensor 92: llama_model_loader: - tensor 93: llama_model_loader: - tensor 94: llama_model_loader: - tensor 95: llama_model_loader: - tensor 96: llama_model_loader: - tensor 97: llama_model_loader: - tensor 98: llama_model_loader: - tensor 99: llama_model_loader: - tensor 100: llama_model_loader: - tensor 101: llama_model_loader: - tensor 102: llama_model_loader: - tensor 103: llama_model_loader: - tensor 104: llama_model_loader: - tensor 105: llama_model_loader: - tensor 106: llama_model_loader: - tensor 107: llama_model_loader: - tensor 108: llama_model_loader: - tensor 109: llama_model_loader: - tensor 110: llama_model_loader: - tensor 111: llama_model_loader: - tensor 112: llama_model_loader: - tensor 113: llama_model_loader: - tensor 114: llama_model_loader: - tensor 115: llama_model_loader: - tensor 116: llama_model_loader: - tensor 117: llama_model_loader: - tensor 118: llama_model_loader: - tensor 119: llama_model_loader: - tensor 120: llama_model_loader: - tensor 121: llama_model_loader: - tensor 122: llama_model_loader: - tensor 123: llama_model_loader: - tensor 124: llama_model_loader: - tensor 125: llama_model_loader: - tensor 126: llama_model_loader: - tensor 127: llama_model_loader: - tensor 128: llama_model_loader: - tensor 129: llama_model_loader: - tensor 130: llama_model_loader: - tensor 131: llama_model_loader: - tensor 132: llama_model_loader: - tensor 133: llama_model_loader: - tensor 134: llama_model_loader: - tensor 135: llama_model_loader: - tensor 136: llama_model_loader: - tensor 137: llama_model_loader: - tensor 138: llama_model_loader: - tensor 139: llama_model_loader: - tensor 140: llama_model_loader: - tensor 141: llama_model_loader: - tensor 142: llama_model_loader: - tensor 143: llama_model_loader: - tensor 144: llama_model_loader: - tensor 145: llama_model_loader: - tensor 146: llama_model_loader: - tensor 147: llama_model_loader: - tensor 148: llama_model_loader: - tensor 149: llama_model_loader: - tensor 150: llama_model_loader: - tensor 151: llama_model_loader: - tensor 152: llama_model_loader: - tensor 153: llama_model_loader: - tensor 154: llama_model_loader: - tensor 155: llama_model_loader: - tensor 156: llama_model_loader: - tensor 157: llama_model_loader: - tensor 158: llama_model_loader: - tensor 159: llama_model_loader: - tensor 160: llama_model_loader: - tensor 161: llama_model_loader: - tensor 162: llama_model_loader: - tensor 163: llama_model_loader: - tensor 164: llama_model_loader: - tensor 165: llama_model_loader: - tensor 166: llama_model_loader: - tensor 167: llama_model_loader: - tensor 168: llama_model_loader: - tensor 169: llama_model_loader: - tensor 170: llama_model_loader: - tensor 171: llama_model_loader: - tensor 172: llama_model_loader: - tensor 173: llama_model_loader: - tensor 174: llama_model_loader: - tensor 175: llama_model_loader: - tensor 176: llama_model_loader: - tensor 177: llama_model_loader: - tensor 178: llama_model_loader: - tensor 179: llama_model_loader: - tensor 180: llama_model_loader: - tensor 181: llama_model_loader: - tensor 182: llama_model_loader: - tensor 183: llama_model_loader: - tensor 184: llama_model_loader: - tensor 185: llama_model_loader: - tensor 186: llama_model_loader: - tensor 187: llama_model_loader: - tensor 188: llama_model_loader: - tensor 189: llama_model_loader: - tensor 190: llama_model_loader: - tensor 191: llama_model_loader: - tensor 192: llama_model_loader: - tensor 193: llama_model_loader: - tensor 194: llama_model_loader: - tensor 195: llama_model_loader: - tensor 196: llama_model_loader: - tensor 197: llama_model_loader: - tensor 198: llama_model_loader: - tensor 199: llama_model_loader: - tensor 200: llama_model_loader: - tensor 201: llama_model_loader: - tensor 202: llama_model_loader: - tensor 203: llama_model_loader: - tensor 204: llama_model_loader: - tensor 205: llama_model_loader: - tensor 206: llama_model_loader: - tensor 207: llama_model_loader: - tensor 208: llama_model_loader: - tensor 209: llama_model_loader: - tensor 210: llama_model_loader: - tensor 211: llama_model_loader: - tensor 212: llama_model_loader: - tensor 213: llama_model_loader: - tensor 214: llama_model_loader: - tensor 215: llama_model_loader: - tensor 216: llama_model_loader: - tensor 217: llama_model_loader: - tensor 218: llama_model_loader: - tensor 219: llama_model_loader: - tensor 220: llama_model_loader: - tensor 221: llama_model_loader: - tensor 222: llama_model_loader: - tensor 223: llama_model_loader: - tensor 224: llama_model_loader: - tensor 225: llama_model_loader: - tensor 226: llama_model_loader: - tensor 227: llama_model_loader: - tensor 228: llama_model_loader: - tensor 229: llama_model_loader: - tensor 230: llama_model_loader: - tensor 231: llama_model_loader: - tensor 232: llama_model_loader: - tensor 233: llama_model_loader: - tensor 234: llama_model_loader: - tensor 235: llama_model_loader: - tensor 236: llama_model_loader: - tensor 237: llama_model_loader: - tensor 238: llama_model_loader: - tensor 239: llama_model_loader: - tensor 240: llama_model_loader: - tensor 241: llama_model_loader: - tensor 242: llama_model_loader: - tensor 243: llama_model_loader: - tensor 244: llama_model_loader: - tensor 245: llama_model_loader: - tensor 246: llama_model_loader: - tensor 247: llama_model_loader: - tensor 248: llama_model_loader: - tensor 249: llama_model_loader: - tensor 250: llama_model_loader: - tensor 251: llama_model_loader: - tensor 252: llama_model_loader: - tensor 253: llama_model_loader: - tensor 254: llama_model_loader: - tensor 255: llama_model_loader: - tensor 256: llama_model_loader: - tensor 257: llama_model_loader: - tensor 258: llama_model_loader: - tensor 259: llama_model_loader: - tensor 260: llama_model_loader: - tensor 261: llama_model_loader: - tensor 262: llama_model_loader: - tensor 263: llama_model_loader: - tensor 264: llama_model_loader: - tensor 265: llama_model_loader: - tensor 266: llama_model_loader: - tensor 267: llama_model_loader: - tensor 268: llama_model_loader: - tensor 269: llama_model_loader: - tensor 270: llama_model_loader: - tensor 271: llama_model_loader: - tensor 272: llama_model_loader: - tensor 273: llama_model_loader: - tensor 274: llama_model_loader: - tensor 275: llama_model_loader: - tensor 276: llama_model_loader: - tensor 277: llama_model_loader: - tensor 278: llama_model_loader: - tensor 279: llama_model_loader: - tensor 280: llama_model_loader: - tensor 281: llama_model_loader: - tensor 282: llama_model_loader: - tensor 283: llama_model_loader: - tensor 284: llama_model_loader: - tensor 285: llama_model_loader: - tensor 286: llama_model_loader: - tensor 287: llama_model_loader: - tensor 288: llama_model_loader: - tensor 289: llama_model_loader: - tensor 290: llama_model_loader: - kv 0: llama_model_loader: - kv 1: llama_model_loader: - kv 2: llama_model_loader: - kv 3: llama_model_loader: - kv 4: llama_model_loader: - kv 5: llama_model_loader: - kv 6: llama_model_loader: - kv 7: llama_model_loader: - kv 8: llama_model_loader: - kv 9: llama_model_loader: - kv 10: llama_model_loader: - kv 11: llama_model_loader: - kv 12: llama_model_loader: - kv 13: llama_model_loader: - kv 14: llama_model_loader: - type f32: llama_model_loader: - type error loading model: key llama_load_model_from_file: llama_init_from_gpt_params: main: error: unable to load model data-hovercard-type="commit" data-hovercard-url="https://github.com/ggerganov/llama.cpp/commit/dd0dc366dab10e8df28d3924e7f313b5c695e908/hovercard" href="https://github.com/ggerganov/llama.cpp/commit/dd0dc366dab10e8df28d3924e7f313b5c695e908">dd0dc36)
meta data with 15 key-value pairs and 291 tensors from models/7B/ggml-model-f16.gguf (version GGUF V1L��.llama_model_loader: - tensor 0: token_embd.weight f16 [ 4096, 46336, 1, 1 ]
blk.0.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.0.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.0.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.0.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.0.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.0.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.0.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.0.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.0.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.1.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.1.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.1.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.1.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.1.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.1.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.1.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.1.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.1.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.2.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.2.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.2.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.2.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.2.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.2.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.2.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.2.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.2.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.3.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.3.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.3.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.3.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.3.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.3.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.3.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.3.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.3.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.4.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.4.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.4.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.4.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.4.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.4.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.4.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.4.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.4.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.5.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.5.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.5.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.5.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.5.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.5.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.5.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.5.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.5.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.6.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.6.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.6.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.6.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.6.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.6.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.6.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.6.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.6.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.7.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.7.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.7.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.7.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.7.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.7.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.7.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.7.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.7.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.8.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.8.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.8.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.8.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.8.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.8.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.8.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.8.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.8.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.9.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.9.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.9.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.9.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.9.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.9.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.9.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.9.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.9.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.10.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.10.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.10.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.10.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.10.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.10.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.10.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.10.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.10.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.11.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.11.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.11.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.11.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.11.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.11.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.11.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.11.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.11.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.12.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.12.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.12.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.12.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.12.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.12.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.12.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.12.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.12.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.13.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.13.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.13.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.13.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.13.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.13.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.13.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.13.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.13.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.14.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.14.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.14.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.14.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.14.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.14.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.14.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.14.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.14.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.15.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.15.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.15.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.15.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.15.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.15.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.15.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.15.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.15.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.16.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.16.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.16.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.16.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.16.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.16.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.16.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.16.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.16.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.17.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.17.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.17.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.17.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.17.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.17.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.17.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.17.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.17.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.18.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.18.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.18.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.18.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.18.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.18.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.18.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.18.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.18.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.19.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.19.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.19.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.19.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.19.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.19.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.19.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.19.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.19.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.20.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.20.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.20.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.20.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.20.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.20.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.20.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.20.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.20.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.21.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.21.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.21.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.21.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.21.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.21.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.21.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.21.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.21.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.22.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.22.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.22.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.22.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.22.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.22.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.22.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.22.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.22.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.23.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.23.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.23.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.23.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.23.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.23.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.23.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.23.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.23.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.24.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.24.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.24.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.24.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.24.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.24.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.24.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.24.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.24.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.25.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.25.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.25.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.25.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.25.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.25.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.25.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.25.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.25.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.26.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.26.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.26.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.26.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.26.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.26.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.26.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.26.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.26.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.27.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.27.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.27.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.27.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.27.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.27.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.27.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.27.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.27.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.28.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.28.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.28.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.28.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.28.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.28.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.28.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.28.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.28.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.29.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.29.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.29.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.29.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.29.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.29.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.29.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.29.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.29.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.30.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.30.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.30.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.30.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.30.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.30.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.30.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.30.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.30.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.31.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
blk.31.attn_k.weight f16 [ 4096, 4096, 1, 1 ]
blk.31.attn_v.weight f16 [ 4096, 4096, 1, 1 ]
blk.31.attn_output.weight f16 [ 4096, 4096, 1, 1 ]
blk.31.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ]
blk.31.ffn_down.weight f16 [ 11008, 4096, 1, 1 ]
blk.31.ffn_up.weight f16 [ 4096, 11008, 1, 1 ]
blk.31.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
blk.31.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
output_norm.weight f32 [ 4096, 1, 1, 1 ]
output.weight f16 [ 4096, 46336, 1, 1 ]
general.architecture str
general.name str
general.source.hugginface.repository str
llama.tensor_data_layout str
llama.context_length u32
llama.embedding_length u32
llama.block_count u32
llama.feed_forward_length u32
llama.rope.dimension_count u32
llama.attention.head_count u32
llama.attention.head_count_kv u32
llama.attention.layer_norm_rms_epsilon f32
tokenizer.ggml.bos_token_id u32
tokenizer.ggml.eos_token_id u32
tokenizer.ggml.unknown_token_id u32
65 tensors
f16: 226 tensors
not found in model: tokenizer.ggml.tokens
failed to load model
error: failed to load model 'models/7B/ggml-model-f16.gguf'

The text was updated successfully, but these errors were encountered:

KerfuffleV2 · 2023-08-29T07:27:59Z

I think tokenizer.model was missing from the directory you converted from. Right now, some of those scripts just skip including vocabulary if the file isn't there without informing the user.

llama_model_loader: loaded meta data with 15 key-value pairs and 291 tensors from models/7B/ggml-model-f16.gguf (version GGUF V1L��.llama_model_loader: - tensor 0: token_embd.weight f16 [ 4096, 46336, 1, 1 ]

The special characters at the GGUF version also look kind of weird. I'm pretty sure your main issue in the tokenizer.model thing though.

kurugai · 2023-08-29T08:22:12Z

dear KerfuffleV2

Thank you for your reply.
As you mentioned, there was no tokenizer.model file in the model I was trying to make a gguf.
But I checked that the tokenizer.json file is there.
I'm sorry to keep asking questions, but can I ask you how to make tokenizer.model?

MODEL URL : https://huggingface.co/kfkas/Llama-2-ko-7b-Chat/tree/main

FILE LIST
.gitattributes
LICENSE
README.md
config.json
generation_config.json
pytorch_model-00001-of-00002.bin
pytorch_model-00002-of-00002.bin
pytorch_model.bin.index.json
special_tokens_map.json
tokenizer.json
tokenizer_config.json

KerfuffleV2 · 2023-08-29T09:17:37Z

I'm sorry to keep asking questions, but can I ask you how to make tokenizer.model?

No need to apologize. I think because it's a Korean model that it uses a different tokenizer type than that script expects. From your link:

"Since Llama-2-Ko uses FastTokenizer provided by HF tokenizers NOT sentencepiece package, it is required to use use_fast=True option when initialize tokenizer."

I'm not an expert on this, but I think that may mean it uses a BPE tokenizer rather than SPM (which is typical for LLaMA models). I don't know if it will work, but you can try using the main convert.py script with --vocabtype bpe

It's possible this model uses a type of tokenizer or configuration that llama.cpp doesn't currently support.

klosax · 2023-08-29T10:52:24Z

I'm not an expert on this, but I think that may mean it uses a BPE tokenizer rather than SPM

In tokenizer.json it looks like it uses the BPE tokenizer:

...
  "model": {
    "type": "BPE",
    "dropout": null,
...

kurugai · 2023-08-29T11:28:51Z

I think I need the vocab.json file. However, there is an error because this file is not in this model folder.

E:\AI\llama.cpp>python convert.py --vocabtype bpe --outfile a.gguf .\models\kfkas_Llama-2-ko-7b-Chat
Loading model file models\kfkas_Llama-2-ko-7b-Chat\pytorch_model-00001-of-00002.bin
Loading model file models\kfkas_Llama-2-ko-7b-Chat\pytorch_model-00001-of-00002.bin
Loading model file models\kfkas_Llama-2-ko-7b-Chat\pytorch_model-00002-of-00002.bin
params = Params(n_vocab=46336, n_embd=4096, n_mult=5504, n_layer=32, n_ctx=2048, n_ff=11008, n_head=32, n_head_kv=32, f_norm_eps=1e-05, f_rope_freq_base=None, f_rope_scale=None, ftype=None, path_model=WindowsPath('models/kfkas_Llama-2-ko-7b-Chat'))
Traceback (most recent call last):
File "E:\AI\llama.cpp\convert.py", line 1172, in
main()
File "E:\AI\llama.cpp\convert.py", line 1156, in main
vocab = load_vocab(vocab_dir, args.vocabtype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\AI\llama.cpp\convert.py", line 1064, in load_vocab
raise FileNotFoundError(
FileNotFoundError: Could not find vocab.json in models\kfkas_Llama-2-ko-7b-Chat or its parent; if it's in another directory, pass the directory as --vocab-dir

klosax · 2023-08-29T11:41:24Z

I think I need the vocab.json file. However, there is an error because this file is not in this model folder.

No the conversion script does this wrong, it should use the tokenizer.json file if it exists.

KerfuffleV2 · 2023-08-29T11:41:56Z

I think this little script will work for extracting the vocab:

import json, sys
tokenizer = json.load(sys.stdin)
json.dump(tokenizer['model']['vocab'], sys.stdout)

It reads from standard input and writes to standard output so you'll need to do something like:

python blah.py < tokenizer.json > vocab.json

kurugai · 2023-08-29T12:34:17Z

I've made progress with your continuous guidance.
As you said, blah.I made a py script and ran it at the DOS prompt, so the vocab.json file was made well. Thank you.

And execute the command below to a.I checked that the file called gguf was also created well without errors!
python convert.py --vocabtype bpe --outfile a.gguf .\models\kfkas_Llama-2-ko-7b-Chat

--- a.gguf's info ---
2023-08-29 PM 09:31 13,713,148,992 a.gguf

But When running 'E:\AI\llama.cpp>main -m a.gguf', there is a problem that LLM has to generate arbitrary strings, but it cannot. I think the gguf file is well made, but it's weird.

--- output ----
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q8_0: 226 tensors
llm_load_print_meta: format = GGUF V1 (support until nov 2023)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 46336
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_ctx = 512
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff = 11008
llm_load_print_meta: freq_base = 10000.0
llm_load_print_meta: freq_scale = 1
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = mostly Q8_0
llm_load_print_meta: model size = 6.86 B
llm_load_print_meta: general.name = LLaMA
llm_load_print_meta: BOS token = 1 ''
llm_load_print_meta: EOS token = 2 ''
llm_load_print_meta: UNK token = 0 ''
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.09 MB
llm_load_tensors: mem required = 6947.73 MB (+ 256.00 MB per state)
.................................................................................................
llama_new_context_with_model: kv self size = 256.00 MB
llama_new_context_with_model: compute buffer total size = 99.91 MB

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0

       <----- no generated string

KerfuffleV2 · 2023-08-29T12:43:09Z

What if you specify a prompt like:

main -m a.gguf -p "Why is the sky blue?"

kurugai · 2023-08-29T12:43:25Z

import json, sys tokenizer = json.load(sys.stdin) json.dump(tokenizer['model']['vocab'], sys.stdout, ensure_ascii=False)

I added 'ensure_ascii=False' to json dump due to Korean Unicode display problem.

kurugai · 2023-08-29T12:48:12Z

run : main -m a.gguf -p "Why is the sky blue?"

output :
system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0

                        <----- I waited for about 5 minutes, but no strings were generated.

^C
E:\AI\llama.cpp>

KerfuffleV2 · 2023-08-29T13:58:58Z

If there was going to be output, you'd see it pretty quickly. This is almost certainly an issue with the vocabulary but I'm not knowledgeable enough to really fix it.

Just in case it's something to do with the ensure_ascii thing or using redirection, you can try this alternative for converting the vocab:

import json
with open("tokenizer.json", "r", encoding="utf-8") as f:
  tokenizer = json.load(f)
with open("vocab.json", "w", encoding="utf-8") as f:
  json.dump(tokenizer['model']['vocab'], f)

I doubt it will make a difference though. If not, hopefully someone else will be able to help you.

kurugai · 2023-08-29T14:39:57Z

dear KerfuffleV2

I created vocab.json with the modified code, but it is not generating the same string.
I've been searching online, but it's not an easy fight :-)
Thank you for helping me all day long.

KerfuffleV2 · 2023-08-29T14:54:12Z

I created vocab.json with the modified code, but it is not generating the same string.

Do you just mean the result is the same: no output? If so, unfortunately that's pretty much what I expected because I didn't expect the second version of the conversion script to really make a difference.

I don't think you're doing anything wrong, it just doesn't seem like llama.cpp currently supports that particular model.

I'd suggest keeping this issue open but editing it a bit to be something more like "Converting kfkas Llama-2-ko-7b-Chat to GGUF fails" or possibly create a different issue like "Please add support for kfkas llama-2-ko-7b-chat" and link here for context.

kurugai · 2023-08-29T15:22:10Z

Do you just mean the result is the same: no output?

yes. The same string was not generated.

As you said, I revised the title of this issue and registered a new issue. Thank you for your advice.^^

akeyhero · 2023-08-29T17:19:38Z

I could reproduce this on the original Llama 2 with --vocabtype bpe.

Note that the tokenizer.json of the Llama 2 says type == BPE although they indeed have tokenizer.model and I confirmed Llama 2 gguf worked with tokenizer.model (namely without --vocabtype bpe):

(snip)
  "model": {
    "type": "BPE",
    "dropout": null,
    "unk_token": "<unk>",
    "continuing_subword_prefix": null,
    "end_of_word_suffix": null,
    "fuse_unk": true,
    "byte_fallback": true,
    "vocab": {
      "<unk>": 0,
(snip)

I downloaded Llama 2 files in models/Llama-2-7b-chat-hf and then

# create vocab.json
$ cat models/Llama-2-7b-chat-hf/tokenizer.json | jq --ascii-output '.model.vocab' > models/Llama-2-7b-chat-hf/vocab.json
$ python convert.py models/Llama-2-7b-chat-hf --vocabtype bpe

$ ./main -m models/Llama-2-7b-chat-hf/ggml-model-f16.gguf--verbose-prompt -n 128 -p "$(echo "<s>[INST] How are you? [/INST]")"
(snip)
llm_load_print_meta: format         = GGUF V1 (support until nov 2023)
llm_load_print_meta: arch           = llama
llm_load_print_meta: vocab type     = SPM
llm_load_print_meta: n_vocab        = 32000
llm_load_print_meta: n_merges       = 0
llm_load_print_meta: n_ctx_train    = 4096
llm_load_print_meta: n_ctx          = 512
llm_load_print_meta: n_embd         = 4096
llm_load_print_meta: n_head         = 32
llm_load_print_meta: n_head_kv      = 32
llm_load_print_meta: n_layer        = 32
llm_load_print_meta: n_rot          = 128
llm_load_print_meta: n_gqa          = 1
llm_load_print_meta: f_norm_eps     = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: n_ff           = 11008
llm_load_print_meta: freq_base      = 10000.0
llm_load_print_meta: freq_scale     = 1
llm_load_print_meta: model type     = 7B
llm_load_print_meta: model ftype    = mostly F16
llm_load_print_meta: model size     = 6.74 B
llm_load_print_meta: general.name   = models
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.09 MB
llm_load_tensors: mem required  = 12853.10 MB (+  256.00 MB per state)
...................................................................................................
llama_new_context_with_model: kv self size  =  256.00 MB
llama_new_context_with_model: compute buffer total size =   71.91 MB

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |

main: prompt: '<s>[INST] How are you? [/INST]'
main: number of tokens in prompt = 18
     1 -> ''
   529 -> ''
 29879 -> ''
 24566 -> ''
 29902 -> ''
  3059 -> ''
 29911 -> ''
 29962 -> ''
  1128 -> ''
   526 -> ''
   366 -> ''
 29973 -> ''
   518 -> ''
 29914 -> ''
 29902 -> ''
  3059 -> ''
 29911 -> ''
 29962 -> ''

sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 0


 [end of text]

I wonder if vocab type = SPM is correct in this setting.

(Of course, I can do with tokenizer.model in the case of the original Llama 2, but the model I want to try does not have tokenizer.model)

klosax · 2023-08-29T17:52:05Z

I wonder if vocab type = SPM is correct in this setting.

No the conversion script should set the tokenizer model kv properly to gpt2 when the source model uses BPE tokenizer.
@KerfuffleV2

KerfuffleV2 · 2023-08-29T17:56:50Z

No the conversion script should set the tokenizer model kv properly to gpt2 when the source model uses BPE tokenizer.

Ahh, it seems like convert.py just always sets it to llama no matter what. I can fix it in #2842

KerfuffleV2 · 2023-08-29T18:06:12Z

@kurugai If you want to try what klosax suggested, find the line

        self.gguf.add_tokenizer_model("llama")

In convert.py and change it to this:

        if isinstance(vocab, SentencePieceVocab):
            self.gguf.add_tokenizer_model("llama")
        elif isinstance(vocab, BpeVocab):
            self.gguf.add_tokenizer_model("gpt2")
        else:
            raise ValueError(f'Unknown vocab type: Not BpeVocab or SentencePieceVocab')

klosax · 2023-08-29T18:36:58Z

This line:

llama.cpp/convert.py

Line 841 in 53885d7

self.gguf.add_tokenizer_model("llama")

kurugai · 2023-08-29T21:33:36Z

@KerfuffleV2
I modified convert.py as follows.

        #self.gguf.add_tokenizer_model("llama")
        if isinstance(vocab, SentencePieceVocab):
            self.gguf.add_tokenizer_model("llama")
        elif isinstance(vocab, BpeVocab):
            self.gguf.add_tokenizer_model("gpt2")
        else:
            raise ValueError(f'Unknown vocab type: Not BpeVocab or SentencePieceVocab')

And I made 'a.gguf' using the command below.

python convert.py --vocabtype bpe --outfile a.gguf .\models\kfkas_Llama-2-ko-7b-Chat

However, when executing the main command, the following error message was displayed during the model loading process.

main -m a.gguf -p "Why is the sky blue?"
................. (omission)
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type  f16:  226 tensors
error loading model: cannot find tokenizer merges in model file

llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'a.gguf'
main: error: unable to load model

KerfuffleV2 · 2023-08-29T21:57:56Z

Ahh, I forgot the version in master doesn't handle merges. If you're comfortable with testing a pull, you can try checking out #2842 and using that (you'll need to install the GGUF package from that pull as well).

Unless you're really impatient, your best bet is probably to just wait until that pull gets merged. That will hopefully fix this issue.

edit: Just want to add that I'd be really happy for people to test those changes. So if you do want to try it but need to ask some questions first, that's no problem. Don't be afraid of bothering me, it's up to whether you feel like going through the trouble or not.

kurugai · 2023-08-30T00:19:58Z

@KerfuffleV2

Thank you for your feedback. First of all, I'm not used to pull tests. I'll wait until it merges. The day of the merger!! I'll check right away. Thank you for letting me know your sincerity.

akeyhero · 2023-08-30T01:02:00Z

I ran make clean and make after checked out to KerfuffleV2/feat-scripts-improvements.
I got:

$ python convert.py models/Llama-2-7b-chat-hf --vocabtype bpe
Traceback (most recent call last):
  File "/Users/xxxx/projects/ggerganov/llama.cpp/convert.py", line 808, in <module>
    class OutputFile:
  File "/Users/xxxx/projects/ggerganov/llama.cpp/convert.py", line 859, in OutputFile
    def add_meta_special_vocab(self, svocab: gguf.SpecialVocab) -> None:
                                             ^^^^^^^^^^^^^^^^^
AttributeError: module 'gguf' has no attribute 'SpecialVocab'

Do you have any idea to solve this?

KerfuffleV2 · 2023-08-30T01:13:40Z

Do you have any idea to solve this?

You need to install the gguf Python package from that fork. Assuming you're already in a Python virtual environment you can do pip install --upgrade ./gguf-py

You might need to reactivate the environment also.

akeyhero · 2023-08-30T01:33:21Z

Thank you. I've totally forgotten about pip stuff.

I ran this in addition to #2865 (comment) (although without merges.txt I got no error on convert.py.):

$ cat models/Llama-2-7b-chat-hf/tokenizer.json | jq -r --ascii-output '.model.merges[]' > models/Llama-2-7b-chat-hf/merges.txt

And then:

$ ./main -m models/Llama-2-7b-chat-hf/ggml-model-f16.gguf --verbose-prompt -n 128 -p "$(echo "<s>[INST] How are you? [/INST]")"
(snip)
ERROR: byte not found in vocab: '
'
zsh: segmentation fault  ./main -m models/Llama-2-7b-chat-hf/ggml-model-f16.gguf --verbose-prompt -n

Any idea? 😭

KerfuffleV2 · 2023-08-30T06:20:30Z

Any idea?

You're just trying this with a normal LLaMA2 model not the one OP was testing, right? The only thing I can think of is it's because you're using a model that wasn't intended to use the BPE tokenizer mode. I'm not an expert on the tokenizer stuff so that idea might not be worth too much. I'm going to download OP's exact model and try it, if I get the same result as you then we'll know it's not because of what I mentioned.

edit: Your issue looks like #2889 so maybe it's just an issue with the BPE tokenizer and nothing you did. You could try loading the model you generated with #2842 using main compiled from #2889 and see if that fixes your issue.

edit: So, I got OP's Korean model converted (it did require generating vocab.json). This does need #2889 to avoid dying immediately. All the token contents still map to blank string strings because convert adds BPE vocab tokens as USER_DEFINED but there's no case to handle converting those to string (there's a partial workaround in the comments for that pull).

akeyhero · 2023-08-30T08:08:00Z

Thank you for your reply.
#2889 should be exactly my issue.

KerfuffleV2 · 2023-08-30T08:21:25Z

Unfortunately, even with the change I suggested in the comments there it's still not really going to be correct. You'll see stuff like <0x20> instead of spaces.

kurugai · 2023-08-30T14:04:49Z

@KerfuffleV2

Version info

gguf 0.3.0
llama.cpp 2023-08-30 ver

Comment

Hi. I think it's merged, so I installed a new package of llama.cpp and gguf and made 'a.gguf' in the same way as yesterday.
The following error was displayed when running main, and the inference string was not generated.

Is it correct that the merger has been completed?

ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '

Below is the full log of the main command.

E:\AI\llama.cpp>main -m a.gguf -p "Why is the sky blue?"
Log start
main: build = 1128 (b532a69)
main: seed  = 1693403947
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from a.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q8_0     [  4096, 46336,     1,     1 ]
llama_model_loader: - tensor    1:              blk.0.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    2:              blk.0.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    3:              blk.0.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    4:         blk.0.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    6:            blk.0.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor    7:              blk.0.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    8:           blk.0.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor    9:            blk.0.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   10:              blk.1.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   11:              blk.1.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   12:              blk.1.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   13:         blk.1.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   14:            blk.1.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   15:            blk.1.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   16:              blk.1.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   17:           blk.1.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   18:            blk.1.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   19:              blk.2.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   20:              blk.2.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   21:              blk.2.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   22:         blk.2.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   23:            blk.2.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   24:            blk.2.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   25:              blk.2.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   26:           blk.2.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   27:            blk.2.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   28:              blk.3.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   29:              blk.3.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   30:              blk.3.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   31:         blk.3.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   32:            blk.3.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   33:            blk.3.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   34:              blk.3.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   35:           blk.3.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   36:            blk.3.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   37:              blk.4.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   38:              blk.4.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   39:              blk.4.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   40:         blk.4.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   41:            blk.4.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   42:            blk.4.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   43:              blk.4.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   44:           blk.4.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   45:            blk.4.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   46:              blk.5.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   47:              blk.5.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   48:              blk.5.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   49:         blk.5.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   50:            blk.5.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   51:            blk.5.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   52:              blk.5.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   53:           blk.5.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   54:            blk.5.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   55:              blk.6.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   56:              blk.6.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   57:              blk.6.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   58:         blk.6.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   59:            blk.6.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   60:            blk.6.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   61:              blk.6.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   62:           blk.6.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   63:            blk.6.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   64:              blk.7.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   65:              blk.7.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   66:              blk.7.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   67:         blk.7.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   68:            blk.7.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   69:            blk.7.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   70:              blk.7.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   71:           blk.7.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   72:            blk.7.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   73:              blk.8.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   74:              blk.8.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   75:              blk.8.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   76:         blk.8.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   77:            blk.8.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   78:            blk.8.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   79:              blk.8.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   80:           blk.8.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   81:            blk.8.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   82:              blk.9.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   83:              blk.9.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   84:              blk.9.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   85:         blk.9.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   86:            blk.9.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   87:            blk.9.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   88:              blk.9.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   89:           blk.9.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   90:            blk.9.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   91:             blk.10.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   92:             blk.10.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   93:             blk.10.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   94:        blk.10.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   95:           blk.10.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   96:           blk.10.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   97:             blk.10.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   98:          blk.10.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   99:           blk.10.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  100:             blk.11.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  101:             blk.11.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  102:             blk.11.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  103:        blk.11.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  104:           blk.11.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  105:           blk.11.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  106:             blk.11.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  107:          blk.11.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  108:           blk.11.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  109:             blk.12.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  110:             blk.12.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  111:             blk.12.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  112:        blk.12.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  113:           blk.12.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  114:           blk.12.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  115:             blk.12.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  116:          blk.12.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  117:           blk.12.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  118:             blk.13.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  119:             blk.13.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  120:             blk.13.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  121:        blk.13.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  122:           blk.13.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  123:           blk.13.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  124:             blk.13.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  125:          blk.13.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  126:           blk.13.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  127:             blk.14.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  128:             blk.14.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  129:             blk.14.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  130:        blk.14.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  131:           blk.14.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  132:           blk.14.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  133:             blk.14.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  134:          blk.14.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  135:           blk.14.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  136:             blk.15.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  137:             blk.15.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  138:             blk.15.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  139:        blk.15.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  140:           blk.15.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  141:           blk.15.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  142:             blk.15.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  143:          blk.15.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  144:           blk.15.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  145:             blk.16.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  146:             blk.16.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  147:             blk.16.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  148:        blk.16.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  149:           blk.16.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  150:           blk.16.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  151:             blk.16.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  152:          blk.16.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  153:           blk.16.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  154:             blk.17.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  155:             blk.17.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  156:             blk.17.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  157:        blk.17.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  158:           blk.17.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  159:           blk.17.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  160:             blk.17.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  161:          blk.17.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  162:           blk.17.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  163:             blk.18.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  164:             blk.18.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  165:             blk.18.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  166:        blk.18.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  167:           blk.18.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  168:           blk.18.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  169:             blk.18.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  170:          blk.18.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  171:           blk.18.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  172:             blk.19.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  173:             blk.19.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  174:             blk.19.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  175:        blk.19.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  176:           blk.19.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  177:           blk.19.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  178:             blk.19.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  179:          blk.19.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  180:           blk.19.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  181:             blk.20.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  182:             blk.20.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  183:             blk.20.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  184:        blk.20.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  185:           blk.20.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  186:           blk.20.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  187:             blk.20.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  188:          blk.20.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  189:           blk.20.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  190:             blk.21.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  191:             blk.21.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  192:             blk.21.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  193:        blk.21.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  194:           blk.21.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  195:           blk.21.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  196:             blk.21.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  197:          blk.21.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  198:           blk.21.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  199:             blk.22.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  200:             blk.22.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  201:             blk.22.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  202:        blk.22.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  203:           blk.22.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  204:           blk.22.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  205:             blk.22.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  206:          blk.22.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  207:           blk.22.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  208:             blk.23.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  209:             blk.23.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  210:             blk.23.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  211:        blk.23.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  212:           blk.23.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  213:           blk.23.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  214:             blk.23.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  215:          blk.23.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  216:           blk.23.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  217:             blk.24.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  218:             blk.24.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  219:             blk.24.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  220:        blk.24.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  221:           blk.24.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  222:           blk.24.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  223:             blk.24.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  224:          blk.24.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  225:           blk.24.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  226:             blk.25.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  227:             blk.25.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  228:             blk.25.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  229:        blk.25.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  230:           blk.25.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  231:           blk.25.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  232:             blk.25.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  233:          blk.25.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  234:           blk.25.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  235:             blk.26.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  236:             blk.26.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  237:             blk.26.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  238:        blk.26.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  239:           blk.26.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  240:           blk.26.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  241:             blk.26.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  242:          blk.26.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  243:           blk.26.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  244:             blk.27.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  245:             blk.27.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  246:             blk.27.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  247:        blk.27.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  248:           blk.27.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  249:           blk.27.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  250:             blk.27.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  251:          blk.27.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  252:           blk.27.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  253:             blk.28.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  254:             blk.28.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  255:             blk.28.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  256:        blk.28.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  257:           blk.28.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  258:           blk.28.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  259:             blk.28.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  260:          blk.28.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  261:           blk.28.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  262:             blk.29.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  263:             blk.29.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  264:             blk.29.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  265:        blk.29.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  266:           blk.29.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  267:           blk.29.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  268:             blk.29.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  269:          blk.29.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  270:           blk.29.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  271:             blk.30.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  272:             blk.30.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  273:             blk.30.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  274:        blk.30.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  275:           blk.30.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  276:           blk.30.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  277:             blk.30.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  278:          blk.30.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  279:           blk.30.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  280:             blk.31.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  281:             blk.31.attn_k.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  282:             blk.31.attn_v.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  283:        blk.31.attn_output.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  284:           blk.31.ffn_gate.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  285:           blk.31.ffn_down.weight q8_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  286:             blk.31.ffn_up.weight q8_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  287:          blk.31.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  288:           blk.31.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  289:               output_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  290:                    output.weight q8_0     [  4096, 46336,     1,     1 ]
llama_model_loader: - kv   0:                       general.architecture str
llama_model_loader: - kv   1:                               general.name str
llama_model_loader: - kv   2:                       llama.context_length u32
llama_model_loader: - kv   3:                     llama.embedding_length u32
llama_model_loader: - kv   4:                          llama.block_count u32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32
llama_model_loader: - kv   7:                 llama.attention.head_count u32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32
llama_model_loader: - kv  10:                          general.file_type u32
llama_model_loader: - kv  11:                       tokenizer.ggml.model str
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr
llama_model_loader: - kv  15:                      tokenizer.ggml.merges arr
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32
llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q8_0:  226 tensors
ERROR: byte not found in vocab: '
'
llm_load_print_meta: format         = GGUF V2 (latest)
llm_load_print_meta: arch           = llama
llm_load_print_meta: vocab type     = BPE
llm_load_print_meta: n_vocab        = 46336
llm_load_print_meta: n_merges       = 77738
llm_load_print_meta: n_ctx_train    = 2048
llm_load_print_meta: n_ctx          = 512
llm_load_print_meta: n_embd         = 4096
llm_load_print_meta: n_head         = 32
llm_load_print_meta: n_head_kv      = 32
llm_load_print_meta: n_layer        = 32
llm_load_print_meta: n_rot          = 128
llm_load_print_meta: n_gqa          = 1
llm_load_print_meta: f_norm_eps     = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff           = 11008
llm_load_print_meta: freq_base      = 10000.0
llm_load_print_meta: freq_scale     = 1
llm_load_print_meta: model type     = 7B
llm_load_print_meta: model ftype    = mostly Q8_0
llm_load_print_meta: model size     = 6.86 B
llm_load_print_meta: general.name   = models
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 0 '<unk>'
llm_load_tensors: ggml ctx size =    0.09 MB
llm_load_tensors: mem required  = 6947.73 MB (+  256.00 MB per state)
.................................................................................................
llama_new_context_with_model: kv self size  =  256.00 MB
llama_new_context_with_model: compute buffer total size =   99.97 MB

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0

KerfuffleV2 · 2023-08-30T14:13:58Z

Is it correct that the merger has been completed?

Yes, it got merged today. Unfortunately, that wasn't enough to fix models using BPE (like this one). Look a little bit higher in the thread, I linked to a pull with a fix for the "byte not found thing". However, even with that change the content of all the tokens is still blank. There's a partial fix in the comments, but there are still problems.

The good news it seems like people are aware of at least some of the problems and they're being looked at/worked on.

kurugai · 2023-08-30T14:17:57Z

@KerfuffleV2

The good news it seems like people are aware of at least some of the problems and they're being looked at/worked on.

Good news! I will try whenever there is a related source modification in the future. :)

akeyhero · 2023-09-08T12:17:11Z

@kurugai The byte not found in vocab errors might have been solved as #2889 had merged.
(I failed with this error though #2965)

kurugai · 2023-11-28T11:52:00Z

@KerfuffleV2
Successfully converted to convert.py at the site below. Thank you for your help in the meantime. I'll close this one.
https://github.com/strutive07/llama.cpp/tree/convert_hf_vocab

kurugai changed the title ~~[User] Insert summary of your issue or enhancement..~~ convert-llama-hf-to-gguf.After py conversion, model loading is not possible with converted gguf file. Aug 29, 2023

kurugai changed the title ~~convert-llama-hf-to-gguf.After py conversion, model loading is not possible with converted gguf file.~~ convert-llama-hf-to-gguf.The gguf file converted to py does not load. Aug 29, 2023

kurugai changed the title ~~convert-llama-hf-to-gguf.The gguf file converted to py does not load.~~ Converting kfkas Llama-2-ko-7b-Chat to GGUF fails Aug 29, 2023

kurugai mentioned this issue Aug 29, 2023

Please add support for kfkas llama-2-ko-7b-chat #2877

Closed

KerfuffleV2 mentioned this issue Aug 29, 2023

Various script cleanups/fixes + convert merges and special token handling #2842

Merged

kurugai closed this as completed Nov 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Converting kfkas Llama-2-ko-7b-Chat to GGUF fails #2865

Converting kfkas Llama-2-ko-7b-Chat to GGUF fails #2865

kurugai commented Aug 29, 2023

KerfuffleV2 commented Aug 29, 2023

kurugai commented Aug 29, 2023

KerfuffleV2 commented Aug 29, 2023

klosax commented Aug 29, 2023

kurugai commented Aug 29, 2023

klosax commented Aug 29, 2023

KerfuffleV2 commented Aug 29, 2023

kurugai commented Aug 29, 2023

KerfuffleV2 commented Aug 29, 2023

kurugai commented Aug 29, 2023

kurugai commented Aug 29, 2023

KerfuffleV2 commented Aug 29, 2023

kurugai commented Aug 29, 2023

KerfuffleV2 commented Aug 29, 2023

kurugai commented Aug 29, 2023 •

edited

Loading

akeyhero commented Aug 29, 2023 •

edited

Loading

klosax commented Aug 29, 2023

KerfuffleV2 commented Aug 29, 2023

KerfuffleV2 commented Aug 29, 2023

klosax commented Aug 29, 2023

kurugai commented Aug 29, 2023

KerfuffleV2 commented Aug 29, 2023 •

edited

Loading

kurugai commented Aug 30, 2023

akeyhero commented Aug 30, 2023

KerfuffleV2 commented Aug 30, 2023

akeyhero commented Aug 30, 2023 •

edited

Loading

KerfuffleV2 commented Aug 30, 2023 •

edited

Loading

akeyhero commented Aug 30, 2023

KerfuffleV2 commented Aug 30, 2023

kurugai commented Aug 30, 2023

KerfuffleV2 commented Aug 30, 2023

kurugai commented Aug 30, 2023

akeyhero commented Sep 8, 2023

kurugai commented Nov 28, 2023

Converting kfkas Llama-2-ko-7b-Chat to GGUF fails #2865

Converting kfkas Llama-2-ko-7b-Chat to GGUF fails #2865

Comments

kurugai commented Aug 29, 2023

KerfuffleV2 commented Aug 29, 2023

kurugai commented Aug 29, 2023

KerfuffleV2 commented Aug 29, 2023

klosax commented Aug 29, 2023

kurugai commented Aug 29, 2023

klosax commented Aug 29, 2023

KerfuffleV2 commented Aug 29, 2023

kurugai commented Aug 29, 2023

KerfuffleV2 commented Aug 29, 2023

kurugai commented Aug 29, 2023

kurugai commented Aug 29, 2023

KerfuffleV2 commented Aug 29, 2023

kurugai commented Aug 29, 2023

KerfuffleV2 commented Aug 29, 2023

kurugai commented Aug 29, 2023 • edited Loading

akeyhero commented Aug 29, 2023 • edited Loading

klosax commented Aug 29, 2023

KerfuffleV2 commented Aug 29, 2023

KerfuffleV2 commented Aug 29, 2023

klosax commented Aug 29, 2023

kurugai commented Aug 29, 2023

KerfuffleV2 commented Aug 29, 2023 • edited Loading

kurugai commented Aug 30, 2023

akeyhero commented Aug 30, 2023

KerfuffleV2 commented Aug 30, 2023

akeyhero commented Aug 30, 2023 • edited Loading

KerfuffleV2 commented Aug 30, 2023 • edited Loading

akeyhero commented Aug 30, 2023

KerfuffleV2 commented Aug 30, 2023

kurugai commented Aug 30, 2023

Version info

Comment

KerfuffleV2 commented Aug 30, 2023

kurugai commented Aug 30, 2023

akeyhero commented Sep 8, 2023

kurugai commented Nov 28, 2023

kurugai commented Aug 29, 2023 •

edited

Loading

akeyhero commented Aug 29, 2023 •

edited

Loading

KerfuffleV2 commented Aug 29, 2023 •

edited

Loading

akeyhero commented Aug 30, 2023 •

edited

Loading

KerfuffleV2 commented Aug 30, 2023 •

edited

Loading