
Bug: GGML assert with bf16, RTX3090 #8234

Closed
micsthepick opened this issue Jul 1, 2024 · 1 comment
Labels
bug-unconfirmed, medium severity (used to report medium severity bugs in llama.cpp, e.g. malfunctioning features that are still usable)

Comments

micsthepick commented Jul 1, 2024

What happened?

./llama-server -ngl 99 -cb -c 65536 -np 32 -m models/Phi-3-mini-128k-instruct/ggml-model-bf16.gguf 
...
GGML_ASSERT: ggml/src/ggml-cuda.cu:1257: to_fp32_cuda != nullptr
[New LWP 934430]
[New LWP 934432]
[New LWP 934433]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fb1ba523c7f in __GI___wait4 (pid=934542, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
27      ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
#0  0x00007fb1ba523c7f in __GI___wait4 (pid=934542, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
27      in ../sysdeps/unix/sysv/linux/wait4.c
#1  0x0000559119a6c7eb in ggml_print_backtrace ()
#2  0x000055911992c1b5 in ggml_cuda_op_mul_mat_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st*) ()
#3  0x000055911992e781 in ggml_cuda_op_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, void (*)(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st*), void (*)(float const*, void*, long, long, long, long, ggml_type, CUstream_st*)) ()
#4  0x000055911992f7a5 in ggml_cuda_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) ()
#5  0x0000559119933cff in ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) ()
#6  0x0000559119abb4bb in ggml_backend_sched_graph_compute_async ()
#7  0x0000559119b0d7b0 in llama_decode ()
#8  0x0000559119bcd039 in llama_init_from_gpt_params(gpt_params&) ()
#9  0x0000559119c78495 in server_context::load_model(gpt_params const&) ()
#10 0x0000559119913d7a in main ()
[Inferior 1 (process 934429) detached]
./start_phi.sh: line 1: 934429 Aborted 
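
For context on the assert itself: the cuBLAS matmul path in ggml-cuda looks up a type-to-fp32 conversion function for the weight tensor's type and asserts that the lookup returned something. Below is a minimal sketch of that dispatch pattern; the type enum and function names are illustrative stand-ins, not the real ggml internals, and running it intentionally trips the same kind of assert seen above:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical converter signature: convert n elements of src to fp32.
typedef void (*to_fp32_cuda_t)(const void * src, float * dst, int64_t n);

enum tensor_type { TYPE_F32, TYPE_F16, TYPE_BF16 };  // illustrative subset

static void convert_f16_to_f32(const void *, float *, int64_t) {
    // ... the real code would launch a CUDA conversion kernel here ...
}

// Dispatch-table pattern: types without a registered converter yield nullptr.
static to_fp32_cuda_t get_to_fp32_cuda(tensor_type t) {
    switch (t) {
        case TYPE_F16: return convert_f16_to_f32;
        default:       return nullptr;  // no BF16 entry registered -> nullptr
    }
}

int main() {
    // Mirrors the failing check: the cuBLAS path asks for a converter for
    // the weight type and asserts that it got one back.
    to_fp32_cuda_t to_fp32_cuda = get_to_fp32_cuda(TYPE_BF16);
    assert(to_fp32_cuda != nullptr);  // fires for BF16, matching the report
    return 0;
}
```

So the crash is a dispatch gap rather than memory corruption: a BF16 weight tensor reaches a code path that has no BF16-to-FP32 converter registered for it.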

Name and Version

./llama-server --version
version: 3265 (72272b8)
built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu

What operating system are you seeing the problem on?

Linux, Windows

Relevant log output

No response

micsthepick added the bug-unconfirmed and medium severity labels on Jul 1, 2024
micsthepick changed the title from "Bug: GGML assert" to "Bug: GGML assert with bf16, RTX3090" on Jul 1, 2024
bfroemel commented Jul 1, 2024

duplicate of #7211
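
A plausible workaround until BF16 is supported on the CUDA backend (assuming the standard llama.cpp conversion flow, e.g. `convert-hf-to-gguf.py --outtype f16`) would be to re-export the model as F16, which should sidestep the unsupported BF16 cuBLAS path.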

micsthepick closed this as not planned on Jul 1, 2024