llama model load error: tensor 'blk.3.attn_qkv.weight' data is not within the file bounds, model is corrupted or incomplete #328

Open
wuanzhuan opened this issue Dec 11, 2024 · 0 comments
Labels: area: wasmedge, bug (Something isn't working)

wuanzhuan commented Dec 11, 2024

OS: Windows 11 Pro 10.0.22621
CPU: Intel(R) Core(TM) i9-10850K CPU @ 3.60GHz
GPU: NVIDIA GeForce RTX 3090, driver version 32.0.15.6094
Models: Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf and Meta-Llama-3.1-8B-Instruct-Q2_K.gguf
git branch: main

Log when selecting the model:

[2024-12-11 21:39:18.050] [info] [WASI-NN] GGML backend: LLAMA_COMMIT c8a00909
[2024-12-11 21:39:18.051] [info] [WASI-NN] GGML backend: LLAMA_BUILD_NUMBER 3499
[2024-12-11 21:39:18.095] [info] [WASI-NN] llama.cpp: llama_model_loader: loaded meta data with 33 key-value pairs and 292 tensors from C:\Users\wuanz\AppData\Roaming\moxin-org\moly\data\model_downloads\second-state/Meta-Llama-3.1-8B-Instruct-GGUF\Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf (version GGUF V3 (latest))
[2024-12-11 21:39:18.096] [info] [WASI-NN] llama.cpp: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
[2024-12-11 21:39:18.096] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 0: general.architecture str = llama
[2024-12-11 21:39:18.096] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 1: general.type str = model
[2024-12-11 21:39:18.096] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 2: general.name str = Meta Llama 3.1 8B Instruct
[2024-12-11 21:39:18.096] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 3: general.finetune str = Instruct
[2024-12-11 21:39:18.097] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 4: general.basename str = Meta-Llama-3.1
[2024-12-11 21:39:18.097] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 5: general.size_label str = 8B
[2024-12-11 21:39:18.097] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 6: general.license str = llama3.1
[2024-12-11 21:39:18.097] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 7: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
[2024-12-11 21:39:18.097] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 8: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
[2024-12-11 21:39:18.097] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 9: llama.block_count u32 = 32
[2024-12-11 21:39:18.098] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 10: llama.context_length u32 = 131072
[2024-12-11 21:39:18.098] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 11: llama.embedding_length u32 = 4096
[2024-12-11 21:39:18.098] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 12: llama.feed_forward_length u32 = 14336
[2024-12-11 21:39:18.098] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 13: llama.attention.head_count u32 = 32
[2024-12-11 21:39:18.098] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 14: llama.attention.head_count_kv u32 = 8
[2024-12-11 21:39:18.098] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 15: llama.rope.freq_base f32 = 500000.000000
[2024-12-11 21:39:18.099] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
[2024-12-11 21:39:18.099] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 17: general.file_type u32 = 17
[2024-12-11 21:39:18.099] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 18: llama.vocab_size u32 = 128256
[2024-12-11 21:39:18.100] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 19: llama.rope.dimension_count u32 = 128
[2024-12-11 21:39:18.100] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2
[2024-12-11 21:39:18.100] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 21: tokenizer.ggml.pre str = llama-bpe
[2024-12-11 21:39:18.126] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
[2024-12-11 21:39:18.134] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
[2024-12-11 21:39:18.190] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
[2024-12-11 21:39:18.190] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 25: tokenizer.ggml.bos_token_id u32 = 128000
[2024-12-11 21:39:18.190] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 128009
[2024-12-11 21:39:18.191] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 27: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
[2024-12-11 21:39:18.191] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 28: general.quantization_version u32 = 2
[2024-12-11 21:39:18.191] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 29: quantize.imatrix.file str = /models_out/Meta-Llama-3.1-8B-Instruc...
[2024-12-11 21:39:18.191] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 30: quantize.imatrix.dataset str = /training_dir/calibration_datav3.txt
[2024-12-11 21:39:18.191] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 31: quantize.imatrix.entries_count i32 = 224
[2024-12-11 21:39:18.192] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 32: quantize.imatrix.chunks_count i32 = 125
[2024-12-11 21:39:18.192] [info] [WASI-NN] llama.cpp: llama_model_loader: - type f32: 66 tensors
[2024-12-11 21:39:18.192] [info] [WASI-NN] llama.cpp: llama_model_loader: - type q5_K: 193 tensors
[2024-12-11 21:39:18.192] [info] [WASI-NN] llama.cpp: llama_model_loader: - type q6_K: 33 tensors
[2024-12-11 21:39:18.424] [info] [WASI-NN] llama.cpp: llm_load_vocab: special tokens cache size = 256
[2024-12-11 21:39:18.449] [info] [WASI-NN] llama.cpp: llm_load_vocab: token to piece cache size = 0.7999 MB
[2024-12-11 21:39:18.449] [info] [WASI-NN] llama.cpp: llm_load_print_meta: format = GGUF V3 (latest)
[2024-12-11 21:39:18.450] [info] [WASI-NN] llama.cpp: llm_load_print_meta: arch = llama
[2024-12-11 21:39:18.450] [info] [WASI-NN] llama.cpp: llm_load_print_meta: vocab type = BPE
[2024-12-11 21:39:18.450] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_vocab = 128256
[2024-12-11 21:39:18.450] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_merges = 280147
[2024-12-11 21:39:18.450] [info] [WASI-NN] llama.cpp: llm_load_print_meta: vocab_only = 0
[2024-12-11 21:39:18.451] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_ctx_train = 131072
[2024-12-11 21:39:18.451] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_embd = 4096
[2024-12-11 21:39:18.451] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_layer = 32
[2024-12-11 21:39:18.451] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_head = 32
[2024-12-11 21:39:18.451] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_head_kv = 8
[2024-12-11 21:39:18.451] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_rot = 128
[2024-12-11 21:39:18.451] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_swa = 0
[2024-12-11 21:39:18.452] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_embd_head_k = 128
[2024-12-11 21:39:18.452] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_embd_head_v = 128
[2024-12-11 21:39:18.452] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_gqa = 4
[2024-12-11 21:39:18.452] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_embd_k_gqa = 1024
[2024-12-11 21:39:18.452] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_embd_v_gqa = 1024
[2024-12-11 21:39:18.452] [info] [WASI-NN] llama.cpp: llm_load_print_meta: f_norm_eps = 0.0e+00
[2024-12-11 21:39:18.453] [info] [WASI-NN] llama.cpp: llm_load_print_meta: f_norm_rms_eps = 1.0e-05
[2024-12-11 21:39:18.453] [info] [WASI-NN] llama.cpp: llm_load_print_meta: f_clamp_kqv = 0.0e+00
[2024-12-11 21:39:18.453] [info] [WASI-NN] llama.cpp: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
[2024-12-11 21:39:18.453] [info] [WASI-NN] llama.cpp: llm_load_print_meta: f_logit_scale = 0.0e+00
[2024-12-11 21:39:18.453] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_ff = 14336
[2024-12-11 21:39:18.453] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_expert = 0
[2024-12-11 21:39:18.453] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_expert_used = 0
[2024-12-11 21:39:18.454] [info] [WASI-NN] llama.cpp: llm_load_print_meta: causal attn = 1
[2024-12-11 21:39:18.454] [info] [WASI-NN] llama.cpp: llm_load_print_meta: pooling type = 0
[2024-12-11 21:39:18.454] [info] [WASI-NN] llama.cpp: llm_load_print_meta: rope type = 0
[2024-12-11 21:39:18.454] [info] [WASI-NN] llama.cpp: llm_load_print_meta: rope scaling = linear
[2024-12-11 21:39:18.454] [info] [WASI-NN] llama.cpp: llm_load_print_meta: freq_base_train = 500000.0
[2024-12-11 21:39:18.454] [info] [WASI-NN] llama.cpp: llm_load_print_meta: freq_scale_train = 1
[2024-12-11 21:39:18.455] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_ctx_orig_yarn = 131072
[2024-12-11 21:39:18.455] [info] [WASI-NN] llama.cpp: llm_load_print_meta: rope_finetuned = unknown
[2024-12-11 21:39:18.455] [info] [WASI-NN] llama.cpp: llm_load_print_meta: ssm_d_conv = 0
[2024-12-11 21:39:18.455] [info] [WASI-NN] llama.cpp: llm_load_print_meta: ssm_d_inner = 0
[2024-12-11 21:39:18.456] [info] [WASI-NN] llama.cpp: llm_load_print_meta: ssm_d_state = 0
[2024-12-11 21:39:18.456] [info] [WASI-NN] llama.cpp: llm_load_print_meta: ssm_dt_rank = 0
[2024-12-11 21:39:18.456] [info] [WASI-NN] llama.cpp: llm_load_print_meta: model type = 8B
[2024-12-11 21:39:18.456] [info] [WASI-NN] llama.cpp: llm_load_print_meta: model ftype = Q5_K - Medium
[2024-12-11 21:39:18.456] [info] [WASI-NN] llama.cpp: llm_load_print_meta: model params = 8.03 B
[2024-12-11 21:39:18.456] [info] [WASI-NN] llama.cpp: llm_load_print_meta: model size = 5.33 GiB (5.70 BPW)
[2024-12-11 21:39:18.457] [info] [WASI-NN] llama.cpp: llm_load_print_meta: general.name = Meta Llama 3.1 8B Instruct
[2024-12-11 21:39:18.457] [info] [WASI-NN] llama.cpp: llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
[2024-12-11 21:39:18.457] [info] [WASI-NN] llama.cpp: llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
[2024-12-11 21:39:18.457] [info] [WASI-NN] llama.cpp: llm_load_print_meta: LF token = 128 'Ä'
[2024-12-11 21:39:18.457] [info] [WASI-NN] llama.cpp: llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
[2024-12-11 21:39:18.458] [info] [WASI-NN] llama.cpp: llm_load_print_meta: max token length = 256
[2024-12-11 21:39:18.473] [info] [WASI-NN] llama.cpp: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
[2024-12-11 21:39:18.473] [info] [WASI-NN] llama.cpp: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
[2024-12-11 21:39:18.473] [info] [WASI-NN] llama.cpp: ggml_cuda_init: found 1 CUDA devices:
[2024-12-11 21:39:18.473] [info] [WASI-NN] llama.cpp: Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[2024-12-11 21:39:18.592] [info] [WASI-NN] llama.cpp: llm_load_tensors: ggml ctx size = 0.27 MiB
[2024-12-11 21:39:21.495] [info] [WASI-NN] llama.cpp: llm_load_tensors: offloading 32 repeating layers to GPU
[2024-12-11 21:39:21.496] [info] [WASI-NN] llama.cpp: llm_load_tensors: offloading non-repeating layers to GPU
[2024-12-11 21:39:21.496] [info] [WASI-NN] llama.cpp: llm_load_tensors: offloaded 33/33 layers to GPU
[2024-12-11 21:39:21.496] [info] [WASI-NN] llama.cpp: llm_load_tensors: CPU buffer size = 344.44 MiB
[2024-12-11 21:39:21.497] [info] [WASI-NN] llama.cpp: llm_load_tensors: CUDA0 buffer size = 5115.50 MiB
[2024-12-11 21:39:23.534] [info] [WASI-NN] llama.cpp:
[2024-12-11 21:39:23.543] [info] [WASI-NN] GGML backend: llama_system_info: AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
[2024-12-11 21:39:23.546] [info] [WASI-NN] GGML backend: LLAMA_COMMIT c8a00909
[2024-12-11 21:39:23.547] [info] [WASI-NN] GGML backend: LLAMA_BUILD_NUMBER 3499
[2024-12-11 21:39:23.552] [error] [WASI-NN] llama.cpp: llama_model_load: error loading model: tensor 'blk.3.attn_qkv.weight' data is not within the file bounds, model is corrupted or incomplete
[2024-12-11 21:39:23.552] [error] [WASI-NN] llama.cpp: llama_load_model_from_file: failed to load model
[2024-12-11 21:39:23.552] [error] [WASI-NN] GGML backend: Error: unable to init model.
[2024-12-11T13:39:23Z ERROR stdout] Backend Error: WASI-NN Backend Error: Caller module passed an invalid argument
Error: Operation("Backend Error: WASI-NN Backend Error: Caller module passed an invalid argument")
Error loading model: Failed to start the model
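
For reference, llama.cpp raises this error when a tensor's recorded offset plus size extends past the end of the GGUF file, which typically indicates a truncated or corrupted download rather than a loader bug. A quick way to check the file before re-downloading is to compare its on-disk size and checksum against the values published on the Hugging Face model page. A minimal sketch (Python; the path is copied from the log above, and the expected ~5.33 GiB comes from the loader's own metadata dump):

```python
import hashlib
import os

# Path copied from the loader log above; adjust to your download location.
MODEL_PATH = (
    r"C:\Users\wuanz\AppData\Roaming\moxin-org\moly\data\model_downloads"
    r"\second-state\Meta-Llama-3.1-8B-Instruct-GGUF"
    r"\Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf"
)

# 1. The loader itself reports "model size = 5.33 GiB", so the file on disk
#    should be roughly that large; a noticeably smaller file means the
#    download was truncated.
size = os.path.getsize(MODEL_PATH)
print(f"size on disk: {size / 2**30:.2f} GiB ({size:,} bytes)")

# 2. Compute the SHA-256 and compare it with the checksum listed for this
#    file on the Hugging Face repository page; a mismatch confirms corruption.
digest = hashlib.sha256()
with open(MODEL_PATH, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)
print(f"sha256: {digest.hexdigest()}")
```

If either check fails, deleting the file and re-downloading it (for both quantizations) would rule out download corruption before treating this as a wasmedge/WASI-NN bug.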

joulei added the area: wasmedge and bug (Something isn't working) labels on Dec 26, 2024