DeepSeek Models (V2/V3) Hang with ROCm Backend #11141
Comments
Does it run with CPU-only on this system? Attach the output of: GGML_SCHED_DEBUG=2 ./bin/llama-eval-callback -m /models/DeepSeek-V3-Q2_K_L-00001-of-00005.gguf -ngl 48 -n 1 -lv 1
Here is the output for the ./bin/llama-eval-callback command: Command used: GGML_SCHED_DEBUG=2 ./llama-eval-callback -m /models/DeepSeek-V3-Q2_K_L-00001-of-00005.gguf -ngl 48 -n 1 -lv 1 Does it run with CPU-only on this system?
And also the output of: GGML_SCHED_DEBUG=2 ./llama-eval-callback -m /models/DeepSeek-V3-Q2_K_L-00001-of-00005.gguf -ngl 48 -n 1 -lv 1 --prompt '<|User|>why is the sky blue?<|Assistant|>' This command will likely hang; if it does, please still provide the logs captured up to the hang.
Witnessing same behavior using CUDA, works fine on commit EDIT: Not sure what's going on, I went back to latest commit and now it's working again. I really don't know what's happening, running EDIT3: So I decided to reboot once more and run it again but this time let it sit there, and it actually eventually started to respond and below are the timings. It took almost 8 minutes before the first token appeared.
Every subsequent run appears to be faster (identical CLI command):
Here is the output for the ./llama-eval-callback command: The command exhibits the same behavior, hanging with one of the GPUs pegged at 100%. Command used:
@emuchogu If you let it run for some minutes like @dranger003 did, does it eventually continue? This might be an extreme case of the issue discussed in #11005.
I ran the command for 30 minutes and observed the same behavior. Attached is the log for the 30-minute run.
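For time-boxed runs like the 30-minute one above, a small wrapper can cap the run time and keep whatever was logged before the hang. This is only a sketch, assuming GNU coreutils `timeout` is installed; `run_with_deadline` and `hang.log` are hypothetical names chosen for illustration and are not part of llama.cpp.

```shell
# Run a command with a time limit and keep everything it printed before the limit.
# Usage: run_with_deadline SECONDS LOGFILE COMMAND [ARGS...]
# (hypothetical helper, not part of llama.cpp)
run_with_deadline() {
    limit="$1"
    log="$2"
    shift 2
    # timeout(1) sends SIGTERM once $limit seconds have elapsed and exits
    # with status 124 if the command was still running at that point.
    # stdout and stderr both go to the log; output a program has buffered
    # internally but not yet written may be lost when it is killed.
    timeout "$limit" "$@" >"$log" 2>&1
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s; partial log saved to $log" >&2
    fi
    return "$status"
}
```

A possible invocation for the command discussed in this thread, passing the debug variable through `env` so it reaches the child process:

run_with_deadline 1800 hang.log env GGML_SCHED_DEBUG=2 ./llama-eval-callback -m /models/DeepSeek-V3-Q2_K_L-00001-of-00005.gguf -ngl 48 -n 1 -lv 1

An exit status of 124 then indicates the run was cut off by the deadline rather than finishing on its own.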
Name and Version
./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 8 ROCm devices:
Device 0: AMD Instinct MI100, compute capability 9.0, VMM: no
Device 1: AMD Instinct MI100, compute capability 9.0, VMM: no
Device 2: AMD Instinct MI100, compute capability 9.0, VMM: no
Device 3: AMD Instinct MI100, compute capability 9.0, VMM: no
Device 4: AMD Instinct MI100, compute capability 9.0, VMM: no
Device 5: AMD Instinct MI100, compute capability 9.0, VMM: no
Device 6: AMD Instinct MI100, compute capability 9.0, VMM: no
Device 7: AMD Instinct MI100, compute capability 9.0, VMM: no
version: 4436 (53ff6b9)
built with Ubuntu clang version 12.0.1-19ubuntu3 for x86_64-pc-linux-gnu
Operating systems
Linux
GGML backends
HIP
Hardware
AMD Instinct MI100
Models
DeepSeek-V2
DeepSeek-V3
Problem description & steps to reproduce
Description
When attempting to run DeepSeek models (V2 or V3) using the ROCm backend, the models load successfully into VRAM but fail to generate any output. One GPU becomes pinned at 100% utilization while the others remain idle.
Commands Used
DeepSeek V2
./llama-cli -m /models/DeepSeek-V2-Chat-0628-Q4_K_M-00001-of-00004.gguf -ngl 999 --prompt '<|User|>why is the sky blue?<|Assistant|>'
DeepSeek V3
./llama-cli -m /models/DeepSeek-V3-Q2_K_L-00001-of-00005.gguf -ngl 48 --prompt '<|User|>why is the sky blue?<|Assistant|>'
Observed Behavior
Steps to Reproduce
Additional Notes
First Bad Commit
No response
Relevant log output