OS: Windows 11 Pro 10.0.22621
CPU: Intel(R) Core(TM) i9-10850K @ 3.60 GHz
GPU: NVIDIA GeForce RTX 3090, driver version 32.0.15.6094
Models: Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf and Meta-Llama-3.1-8B-Instruct-Q2_K.gguf
Git branch: main
Log when selecting the model:


[2024-12-11 21:39:18.050] [info] [WASI-NN] GGML backend: LLAMA_COMMIT c8a00909
[2024-12-11 21:39:18.051] [info] [WASI-NN] GGML backend: LLAMA_BUILD_NUMBER 3499
[2024-12-11 21:39:18.095] [info] [WASI-NN] llama.cpp: llama_model_loader: loaded meta data with 33 key-value pairs and 292 tensors from C:\Users\wuanz\AppData\Roaming\moxin-org\moly\data\model_downloads\second-state/Meta-Llama-3.1-8B-Instruct-GGUF\Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf (version GGUF V3 (latest))
[2024-12-11 21:39:18.096] [info] [WASI-NN] llama.cpp: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
[2024-12-11 21:39:18.096] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 0: general.architecture str = llama
[2024-12-11 21:39:18.096] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 1: general.type str = model
[2024-12-11 21:39:18.096] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 2: general.name str = Meta Llama 3.1 8B Instruct
[2024-12-11 21:39:18.096] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 3: general.finetune str = Instruct
[2024-12-11 21:39:18.097] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 4: general.basename str = Meta-Llama-3.1
[2024-12-11 21:39:18.097] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 5: general.size_label str = 8B
[2024-12-11 21:39:18.097] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 6: general.license str = llama3.1
[2024-12-11 21:39:18.097] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 7: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
[2024-12-11 21:39:18.097] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 8: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
[2024-12-11 21:39:18.097] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 9: llama.block_count u32 = 32
[2024-12-11 21:39:18.098] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 10: llama.context_length u32 = 131072
[2024-12-11 21:39:18.098] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 11: llama.embedding_length u32 = 4096
[2024-12-11 21:39:18.098] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 12: llama.feed_forward_length u32 = 14336
[2024-12-11 21:39:18.098] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 13: llama.attention.head_count u32 = 32
[2024-12-11 21:39:18.098] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 14: llama.attention.head_count_kv u32 = 8
[2024-12-11 21:39:18.098] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 15: llama.rope.freq_base f32 = 500000.000000
[2024-12-11 21:39:18.099] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
[2024-12-11 21:39:18.099] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 17: general.file_type u32 = 17
[2024-12-11 21:39:18.099] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 18: llama.vocab_size u32 = 128256
[2024-12-11 21:39:18.100] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 19: llama.rope.dimension_count u32 = 128
[2024-12-11 21:39:18.100] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2
[2024-12-11 21:39:18.100] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 21: tokenizer.ggml.pre str = llama-bpe
[2024-12-11 21:39:18.126] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
[2024-12-11 21:39:18.134] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
[2024-12-11 21:39:18.190] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
[2024-12-11 21:39:18.190] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 25: tokenizer.ggml.bos_token_id u32 = 128000
[2024-12-11 21:39:18.190] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 128009
[2024-12-11 21:39:18.191] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 27: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
[2024-12-11 21:39:18.191] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 28: general.quantization_version u32 = 2
[2024-12-11 21:39:18.191] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 29: quantize.imatrix.file str = /models_out/Meta-Llama-3.1-8B-Instruc...
[2024-12-11 21:39:18.191] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 30: quantize.imatrix.dataset str = /training_dir/calibration_datav3.txt
[2024-12-11 21:39:18.191] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 31: quantize.imatrix.entries_count i32 = 224
[2024-12-11 21:39:18.192] [info] [WASI-NN] llama.cpp: llama_model_loader: - kv 32: quantize.imatrix.chunks_count i32 = 125
[2024-12-11 21:39:18.192] [info] [WASI-NN] llama.cpp: llama_model_loader: - type f32: 66 tensors
[2024-12-11 21:39:18.192] [info] [WASI-NN] llama.cpp: llama_model_loader: - type q5_K: 193 tensors
[2024-12-11 21:39:18.192] [info] [WASI-NN] llama.cpp: llama_model_loader: - type q6_K: 33 tensors
[2024-12-11 21:39:18.424] [info] [WASI-NN] llama.cpp: llm_load_vocab: special tokens cache size = 256
[2024-12-11 21:39:18.449] [info] [WASI-NN] llama.cpp: llm_load_vocab: token to piece cache size = 0.7999 MB
[2024-12-11 21:39:18.449] [info] [WASI-NN] llama.cpp: llm_load_print_meta: format = GGUF V3 (latest)
[2024-12-11 21:39:18.450] [info] [WASI-NN] llama.cpp: llm_load_print_meta: arch = llama
[2024-12-11 21:39:18.450] [info] [WASI-NN] llama.cpp: llm_load_print_meta: vocab type = BPE
[2024-12-11 21:39:18.450] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_vocab = 128256
[2024-12-11 21:39:18.450] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_merges = 280147
[2024-12-11 21:39:18.450] [info] [WASI-NN] llama.cpp: llm_load_print_meta: vocab_only = 0
[2024-12-11 21:39:18.451] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_ctx_train = 131072
[2024-12-11 21:39:18.451] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_embd = 4096
[2024-12-11 21:39:18.451] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_layer = 32
[2024-12-11 21:39:18.451] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_head = 32
[2024-12-11 21:39:18.451] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_head_kv = 8
[2024-12-11 21:39:18.451] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_rot = 128
[2024-12-11 21:39:18.451] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_swa = 0
[2024-12-11 21:39:18.452] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_embd_head_k = 128
[2024-12-11 21:39:18.452] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_embd_head_v = 128
[2024-12-11 21:39:18.452] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_gqa = 4
[2024-12-11 21:39:18.452] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_embd_k_gqa = 1024
[2024-12-11 21:39:18.452] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_embd_v_gqa = 1024
[2024-12-11 21:39:18.452] [info] [WASI-NN] llama.cpp: llm_load_print_meta: f_norm_eps = 0.0e+00
[2024-12-11 21:39:18.453] [info] [WASI-NN] llama.cpp: llm_load_print_meta: f_norm_rms_eps = 1.0e-05
[2024-12-11 21:39:18.453] [info] [WASI-NN] llama.cpp: llm_load_print_meta: f_clamp_kqv = 0.0e+00
[2024-12-11 21:39:18.453] [info] [WASI-NN] llama.cpp: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
[2024-12-11 21:39:18.453] [info] [WASI-NN] llama.cpp: llm_load_print_meta: f_logit_scale = 0.0e+00
[2024-12-11 21:39:18.453] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_ff = 14336
[2024-12-11 21:39:18.453] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_expert = 0
[2024-12-11 21:39:18.453] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_expert_used = 0
[2024-12-11 21:39:18.454] [info] [WASI-NN] llama.cpp: llm_load_print_meta: causal attn = 1
[2024-12-11 21:39:18.454] [info] [WASI-NN] llama.cpp: llm_load_print_meta: pooling type = 0
[2024-12-11 21:39:18.454] [info] [WASI-NN] llama.cpp: llm_load_print_meta: rope type = 0
[2024-12-11 21:39:18.454] [info] [WASI-NN] llama.cpp: llm_load_print_meta: rope scaling = linear
[2024-12-11 21:39:18.454] [info] [WASI-NN] llama.cpp: llm_load_print_meta: freq_base_train = 500000.0
[2024-12-11 21:39:18.454] [info] [WASI-NN] llama.cpp: llm_load_print_meta: freq_scale_train = 1
[2024-12-11 21:39:18.455] [info] [WASI-NN] llama.cpp: llm_load_print_meta: n_ctx_orig_yarn = 131072
[2024-12-11 21:39:18.455] [info] [WASI-NN] llama.cpp: llm_load_print_meta: rope_finetuned = unknown
[2024-12-11 21:39:18.455] [info] [WASI-NN] llama.cpp: llm_load_print_meta: ssm_d_conv = 0
[2024-12-11 21:39:18.455] [info] [WASI-NN] llama.cpp: llm_load_print_meta: ssm_d_inner = 0
[2024-12-11 21:39:18.456] [info] [WASI-NN] llama.cpp: llm_load_print_meta: ssm_d_state = 0
[2024-12-11 21:39:18.456] [info] [WASI-NN] llama.cpp: llm_load_print_meta: ssm_dt_rank = 0
[2024-12-11 21:39:18.456] [info] [WASI-NN] llama.cpp: llm_load_print_meta: model type = 8B
[2024-12-11 21:39:18.456] [info] [WASI-NN] llama.cpp: llm_load_print_meta: model ftype = Q5_K - Medium
[2024-12-11 21:39:18.456] [info] [WASI-NN] llama.cpp: llm_load_print_meta: model params = 8.03 B
[2024-12-11 21:39:18.456] [info] [WASI-NN] llama.cpp: llm_load_print_meta: model size = 5.33 GiB (5.70 BPW)
[2024-12-11 21:39:18.457] [info] [WASI-NN] llama.cpp: llm_load_print_meta: general.name = Meta Llama 3.1 8B Instruct
[2024-12-11 21:39:18.457] [info] [WASI-NN] llama.cpp: llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
[2024-12-11 21:39:18.457] [info] [WASI-NN] llama.cpp: llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
[2024-12-11 21:39:18.457] [info] [WASI-NN] llama.cpp: llm_load_print_meta: LF token = 128 'Ä'
[2024-12-11 21:39:18.457] [info] [WASI-NN] llama.cpp: llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
[2024-12-11 21:39:18.458] [info] [WASI-NN] llama.cpp: llm_load_print_meta: max token length = 256
[2024-12-11 21:39:18.473] [info] [WASI-NN] llama.cpp: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
[2024-12-11 21:39:18.473] [info] [WASI-NN] llama.cpp: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
[2024-12-11 21:39:18.473] [info] [WASI-NN] llama.cpp: ggml_cuda_init: found 1 CUDA devices:
[2024-12-11 21:39:18.473] [info] [WASI-NN] llama.cpp: Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[2024-12-11 21:39:18.592] [info] [WASI-NN] llama.cpp: llm_load_tensors: ggml ctx size = 0.27 MiB
[2024-12-11 21:39:21.495] [info] [WASI-NN] llama.cpp: llm_load_tensors: offloading 32 repeating layers to GPU
[2024-12-11 21:39:21.496] [info] [WASI-NN] llama.cpp: llm_load_tensors: offloading non-repeating layers to GPU
[2024-12-11 21:39:21.496] [info] [WASI-NN] llama.cpp: llm_load_tensors: offloaded 33/33 layers to GPU
[2024-12-11 21:39:21.496] [info] [WASI-NN] llama.cpp: llm_load_tensors: CPU buffer size = 344.44 MiB
[2024-12-11 21:39:21.497] [info] [WASI-NN] llama.cpp: llm_load_tensors: CUDA0 buffer size = 5115.50 MiB
[2024-12-11 21:39:23.534] [info] [WASI-NN] llama.cpp:
[2024-12-11 21:39:23.543] [info] [WASI-NN] GGML backend: llama_system_info: AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
[2024-12-11 21:39:23.546] [info] [WASI-NN] GGML backend: LLAMA_COMMIT c8a00909
[2024-12-11 21:39:23.547] [info] [WASI-NN] GGML backend: LLAMA_BUILD_NUMBER 3499
[2024-12-11 21:39:23.552] [error] [WASI-NN] llama.cpp: llama_model_load: error loading model: tensor 'blk.3.attn_qkv.weight' data is not within the file bounds, model is corrupted or incomplete
[2024-12-11 21:39:23.552] [error] [WASI-NN] llama.cpp: llama_load_model_from_file: failed to load model
[2024-12-11 21:39:23.552] [error] [WASI-NN] GGML backend: Error: unable to init model.
[2024-12-11T13:39:23Z ERROR stdout] Backend Error: WASI-NN Backend Error: Caller module passed an invalid argument
Error: Operation("Backend Error: WASI-NN Backend Error: Caller module passed an invalid argument")
Error loading model: Failed to start the model
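
The final error ("tensor 'blk.3.attn_qkv.weight' data is not within the file bounds, model is corrupted or incomplete") usually points to a truncated or corrupted download rather than a backend bug: a tensor's declared offset extends past the end of the file on disk. A quick way to confirm is to compare the local file's size and SHA-256 checksum against the values published on the model's download page (Hugging Face shows both per file). Below is a minimal Python sketch for that check; the path is the one from the log above, and the expected size/hash must come from the download page, so treat those as inputs you supply, not values this script knows.

import hashlib
import sys
from pathlib import Path

def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so a multi-GiB GGUF doesn't fill RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def inspect_gguf(path: Path) -> None:
    # Read the 8-byte GGUF header: 4-byte magic, then a little-endian u32 version.
    with path.open("rb") as f:
        header = f.read(8)
    magic = header[:4]
    version = int.from_bytes(header[4:8], "little")
    print(f"size:    {path.stat().st_size:,} bytes")
    print(f"magic:   {magic!r} (expected b'GGUF')")
    print(f"version: {version} (log above reports GGUF V3)")
    print(f"sha256:  {sha256sum(path)}")

if __name__ == "__main__":
    # Example (path taken from the log; adjust for your machine):
    # python inspect_gguf.py "C:\Users\wuanz\AppData\Roaming\moxin-org\moly\data\model_downloads\second-state\Meta-Llama-3.1-8B-Instruct-GGUF\Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf"
    inspect_gguf(Path(sys.argv[1]))

If the reported size or hash differs from what the model page lists, the download was incomplete; deleting the file and re-downloading it is the usual fix. A valid magic and version alone are not enough, since they sit at the start of the file and survive truncation at the tail.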