Bug: test run on stories15M-q4_0.gguf results in Segmentation fault. #7711

Closed
vt-alt opened this issue Jun 3, 2024 · 3 comments · Fixed by #7640
Labels
bug-unconfirmed, high severity (used to report high severity bugs in llama.cpp: malfunctioning hinders an important workflow)

Comments


vt-alt commented Jun 3, 2024

What happened?

For b3072 on x86-64, running llama-main on stories15M-q4_0.gguf or stories260K.gguf crashes. It also crashes in the test-eval-callback test.

Name and Version

This is for tag b3072 at 549279d; b3012 did not have this problem.

What operating system are you seeing the problem on?

ALT Linux

Relevant log output

llama.cpp (sisyphus)$ gdb --args llama-main -m stories15M-q4_0.gguf -n 400 -p "Once opon a time"
GNU gdb (GDB) 14.1.0.56.d739d4fd457-alt1 (ALT Sisyphus)
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-alt-linux".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from llama-main...
Reading symbols from /usr/lib/debug/usr/bin/llama-main.debug...
(gdb) r
Starting program: /usr/bin/llama-main -m stories15M-q4_0.gguf -n 400 -p Once\ opon\ a\ time
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Log start
main: build = 3072 (alt1.20240603)
main: built with x86_64-alt-linux-gcc (GCC) 13.2.1 20240128 (ALT Sisyphus 13.2.1-alt3) for x86_64-alt-linux
main: seed  = 1717403314
llama_model_loader: loaded meta data with 20 key-value pairs and 57 tensors from stories15M-q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv   1:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv   2:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv   3:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv   4:                       general.architecture str              = llama
llama_model_loader: - kv   5:                               general.name str              = llama
llama_model_loader: - kv   6:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv   7:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv   8:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv   9:          tokenizer.ggml.seperator_token_id u32              = 4294967295
llama_model_loader: - kv  10:            tokenizer.ggml.padding_token_id u32              = 4294967295
llama_model_loader: - kv  11:                       llama.context_length u32              = 128
llama_model_loader: - kv  12:                     llama.embedding_length u32              = 288
llama_model_loader: - kv  13:                  llama.feed_forward_length u32              = 768
llama_model_loader: - kv  14:                 llama.attention.head_count u32              = 6
llama_model_loader: - kv  15:                          llama.block_count u32              = 6
llama_model_loader: - kv  16:                 llama.rope.dimension_count u32              = 48
llama_model_loader: - kv  17:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  18:               general.quantization_version u32              = 2
llama_model_loader: - kv  19:                          general.file_type u32              = 2
llama_model_loader: - type  f32:   13 tensors
llama_model_loader: - type q4_0:   43 tensors
llama_model_loader: - type q8_0:    1 tensors
llm_load_vocab: bad special token: 'tokenizer.ggml.seperator_token_id' = 4294967295d, using default id -1
llm_load_vocab: bad special token: 'tokenizer.ggml.padding_token_id' = 4294967295d, using default id -1
llm_load_vocab: special tokens cache size = 259
llm_load_vocab: token to piece cache size = 0.1684 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 128
llm_load_print_meta: n_embd           = 288
llm_load_print_meta: n_head           = 6
llm_load_print_meta: n_head_kv        = 6
llm_load_print_meta: n_layer          = 6
llm_load_print_meta: n_rot            = 48
llm_load_print_meta: n_embd_head_k    = 48
llm_load_print_meta: n_embd_head_v    = 48
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 288
llm_load_print_meta: n_embd_v_gqa     = 288
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 768
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 128
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 24.41 M
llm_load_print_meta: model size       = 17.50 MiB (6.01 BPW)
llm_load_print_meta: general.name     = llama
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.03 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/7 layers to GPU
llm_load_tensors:        CPU buffer size =    17.50 MiB
.....................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =     3.38 MiB
llama_new_context_with_model: KV self size  =    3.38 MiB, K (f16):    1.69 MiB, V (f16):    1.69 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.12 MiB

Program received signal SIGSEGV, Segmentation fault.
=> 0x55555564c757 <llama_new_context_with_model(llama_model*, llama_context_params)+5943>:      call   *0x28(%rax)
0x000055555564c757 in ggml_backend_buft_supports_backend (backend=0x555555bb3dd0, buft=0x0) at /usr/src/debug/llama.cpp-3072/ggml-backend.c:48
48          return buft->iface.supports_backend(buft, backend);
(gdb)
(gdb) bt
#0  0x000055555564c757 in ggml_backend_buft_supports_backend (backend=0x555555bb3dd0, buft=0x0) at /usr/src/debug/llama.cpp-3072/ggml-backend.c:48
#1  ggml_backend_sched_new (graph_size=8192, parallel=false, n_backends=2, bufts=0x555555bb3d30, backends=0x55555580bcc0)
    at /usr/src/debug/llama.cpp-3072/ggml-backend.c:1750
#2  llama_new_context_with_model (model=<optimized out>, params=...) at /usr/src/debug/llama.cpp-3072/llama.cpp:16504
#3  0x00005555555a6ac2 in llama_init_from_gpt_params (params=...) at /usr/src/debug/llama.cpp-3072/common/common.cpp:1915
#4  0x00005555555814c0 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/llama.cpp-3072/examples/main/main.cpp:199
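
The backtrace shows ggml_backend_buft_supports_backend being entered with buft=0x0, so the call through buft->iface dereferences a null pointer. Below is a rough, stand-alone C sketch of that pattern (hypothetical type names, not the actual ggml code) together with the kind of defensive null check that would avoid the fault:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct backend;                     /* stand-in for ggml_backend_t */
struct buffer_type;                 /* stand-in for ggml_backend_buffer_type_t */

struct buffer_type_iface {
    bool (*supports_backend)(struct buffer_type *buft, struct backend *backend);
};

struct buffer_type {
    struct buffer_type_iface iface; /* function-pointer table, as in ggml-backend */
};

/* Same shape as ggml-backend.c:48: the buffer type is dereferenced without a NULL check. */
static bool buft_supports_backend(struct buffer_type *buft, struct backend *backend) {
    return buft->iface.supports_backend(buft, backend);      /* SIGSEGV when buft == NULL */
}

/* Defensive variant (illustration only): treat a missing buffer type as "not supported". */
static bool buft_supports_backend_checked(struct buffer_type *buft, struct backend *backend) {
    return buft != NULL && buft->iface.supports_backend(buft, backend);
}

int main(void) {
    struct buffer_type *buft = NULL;   /* the NULL default buffer type seen in frame #0 */
    printf("%d\n", buft_supports_backend_checked(buft, NULL));   /* prints 0 instead of crashing */
    /* buft_supports_backend(buft, NULL); */                     /* would fault exactly as reported */
    return 0;
}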

There is a temporary build log with the test run at the end: https://git.altlinux.org/tasks/350238/build/400/x86_64/log

The crash also occurs on aarch64, and also when compiled without OpenBLAS.

vt-alt added the bug-unconfirmed and high severity labels on Jun 3, 2024

slaren (Collaborator) commented Jun 3, 2024

This is caused by the RPC backend. I believe #7640 will fix it; in the meantime, you can remove it from the build if you are not using it.
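
As a sketch of that workaround, assuming the package is built with CMake and the RPC backend is toggled by the LLAMA_RPC option (the option name is an assumption based on how the upstream build exposed the RPC backend around this release; adjust to your build setup):

    # rebuild with the RPC backend disabled (LLAMA_RPC is assumed to be the relevant option)
    cmake -B build -DLLAMA_RPC=OFF
    cmake --build build --config Release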

vt-alt (Author) commented Jun 3, 2024

Thanks! I was going to enable the RPC backend for the first time (for the package), but it seems it is too early.

vt-alt (Author) commented Jun 3, 2024

Just tested and confirmed that disabling RPC solves the issue.

slaren linked a pull request on Jun 3, 2024 that will close this issue.