AMD GPU Misbehavior w/ some drivers (post GGUF update) #1507
Comments
Will be trying to replicate
Without any changes, GPT4All now crashes with graphics driver issues - RTX gfx
Always thought Vulkan only works with Nvidia...
GPU: Radeon RX 6800XT
Client 2.4.19: Edit: Installed the latest consumer driver 23.10.2, giving a repeating "######" (pound/hash sign) on both models mentioned above.
Client 2.5.0-pre1:
Client 2.5.0-pre2:
We received another report of this issue from Kongming on Discord with GPT4All v2.5.1:
Radeon RX Vega 56 / Windows 10 / Driver Version 23.19.02-230831a-396094C-AMD-Software-Adrenalin-Edition (currently the latest available for this model of GPU). It seems like the GPU is not being used at all. Is it supported at all?
It should be supported. Is it available to select in the UI, and does it report use of the device in the bottom-right while generating output? #1425 is for unsupported GPUs. If you see the hashes, your GPU is being used, but running into a GPT4All bug.
I can confirm in my case the GPU is definitely being used. By watching Task Manager I see notable VRAM and GPU usage when testing a compatible model. Also, even though the output is gibberish, it is generating much faster on the GPU, like 10x faster.
I can reproduce this with my AMD Radeon (TM) Vega 8 Graphics iGPU running Mini Orca (small)!!!
Turning on validation produces a whole bunch of this: VUID-vkCmdDispatch-groupCountX-00386(ERROR / SPEC): msgNum: -1903005642 - Validation Error: [ VUID-vkCmdDispatch-groupCountX-00386 ] Object 0: handle = 0x1864b7360c0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x8e927036 | vkCmdDispatch(): groupCountX (155520) exceeds device limit maxComputeWorkGroupCount[0] (65535). The Vulkan spec states: groupCountX must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[0] (https://vulkan.lunarg.com/doc/view/1.3.261.1/windows/1.3-extensions/vkspec.html#VUID-vkCmdDispatch-groupCountX-00386)
We have several kernels that exceed the device limit for workgroup count on my AMD card above ^^^ Specifically, one of (silu, relu, gelu), where we request a workgroup count of 112320 whereas the device limit is 65535 in any one dimension. The mul kernels also exceed the limit, again requesting a workgroup count of 112320.
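For reference, one common workaround for this class of error (not necessarily what GPT4All ended up doing) is to fold an oversized 1D dispatch into a second dimension so that no single axis exceeds maxComputeWorkGroupCount; the shader then rebuilds a flat index from gl_WorkGroupID.x/.y and ignores any overshoot groups. A minimal C++ sketch, where `splitGroupCount` is a hypothetical helper rather than GPT4All/kompute API:

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

// Split a 1D group count into an {x, y} pair where neither axis exceeds the
// per-dimension device limit (e.g. 65535). Dispatch {x, y, 1} and have the
// shader compute flatIndex = gl_WorkGroupID.y * x + gl_WorkGroupID.x, skipping
// any groups past the original total.
std::pair<uint32_t, uint32_t> splitGroupCount(uint32_t totalGroups, uint32_t maxPerDim)
{
    if (totalGroups == 0)
        return {0, 0};
    uint32_t x = std::min(totalGroups, maxPerDim);
    uint32_t y = (totalGroups + x - 1) / x;   // ceil-divide the remainder into Y
    return {x, y};
}

// Example: splitGroupCount(112320, 65535) -> {65535, 2}; the shader must mask
// off the (65535 * 2 - 112320) extra workgroups.
```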
Also running into this validation error:
The final validation error I'm getting is, I'm afraid, the real culprit. The problem on AMD seems to be that the driver will, for some reason, allow us to allocate more memory than the heap can actually supply. It doesn't give any error or indication that the allocation failed.
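One defensive option here (an illustration on my part, not necessarily what the eventual fix does) is to ask the driver for its heap budget via the VK_EXT_memory_budget extension and refuse allocations that would oversubscribe the heap, since the allocation call itself reports success. A rough Vulkan-Hpp sketch, assuming Vulkan 1.1+ and VK_EXT_memory_budget are available and enabled:

```cpp
#include <vulkan/vulkan.hpp>

// Query how much of memory heap `heapIndex` the driver believes is still usable.
// `heapIndex` would be the device-local heap the model weights are allocated from.
vk::DeviceSize remainingHeapBudget(vk::PhysicalDevice gpu, uint32_t heapIndex)
{
    vk::PhysicalDeviceMemoryBudgetPropertiesEXT budget{};
    vk::PhysicalDeviceMemoryProperties2 props{};
    props.pNext = &budget;                      // chain the budget query
    gpu.getMemoryProperties2(&props);
    vk::DeviceSize used  = budget.heapUsage[heapIndex];
    vk::DeviceSize total = budget.heapBudget[heapIndex];
    return used < total ? total - used : 0;     // 0: heap is already oversubscribed
}
```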
All three validation errors are now fixed. However, I'm leaving this open until I see someone successfully run it on an AMD Radeon on Windows.
This is an offline installer for Windows that has all three validation bugs fixed... Need intrepid testers to see if they can successfully get inference on an AMD GPU with this build: https://output.circle-artifacts.com/output/job/18f8093e-9e34-4293-b551-478c9163eee4/artifacts/0/build/upload/gpt4all-installer-win64.exe
Radeon RX 6700 + 23.10.2 doesn't help.
Single GPU shown in "vulkaninfo --summary" output as well as in the device drop-down menu.
Hi - thank you to the dev(s) looking into this issue for AMD GPU owners. We appreciate your time and efforts. I notice the issue title mentions "with some drivers" and wonder if there is a specific driver version that is known to work?
RADV on Linux is the configuration that we have been able to get working so far.
FYI, I'm now able to successfully reproduce this issue on an AMD Radeon 6800 XT on LINUX with the amdvlk driver and am looking to fix it.
FINALLY! It is a synchronization issue. When I define 'record' as 'eval' at the top of ggml-vulkan.cpp I get correct generation, but of course it is too slow. Now we finally have the right clue!!!
```diff
index cad334f..dc39cdc 100644
--- a/kompute/src/OpAlgoDispatch.cpp
+++ b/kompute/src/OpAlgoDispatch.cpp
@@ -32,9 +32,9 @@ OpAlgoDispatch::record(const vk::CommandBuffer& commandBuffer)
          this->mAlgorithm->getTensors()) {
         tensor->recordPrimaryBufferMemoryBarrier(
           commandBuffer,
-          vk::AccessFlagBits::eTransferWrite,
+          vk::AccessFlagBits::eShaderWrite,
           vk::AccessFlagBits::eShaderRead,
-          vk::PipelineStageFlagBits::eTransfer,
+          vk::PipelineStageFlagBits::eComputeShader,
           vk::PipelineStageFlagBits::eComputeShader);
     }
```
This fixes it, but it might affect generation speed...
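For context, the patch above changes the per-tensor barrier recorded between kernels from a transfer→compute dependency into a compute→compute one, so a dispatch that reads a buffer actually waits for the previous dispatch that wrote it. A minimal Vulkan-Hpp sketch of the equivalent standalone barrier (`recordComputeToComputeBarrier` is a hypothetical helper, not kompute API):

```cpp
#include <vulkan/vulkan.hpp>

// Record a compute->compute buffer barrier on `buf`: the next dispatch's shader
// reads wait for the previous dispatch's shader writes to complete and become visible.
void recordComputeToComputeBarrier(vk::CommandBuffer cmd, vk::Buffer buf, vk::DeviceSize size)
{
    vk::BufferMemoryBarrier barrier(
        vk::AccessFlagBits::eShaderWrite,        // srcAccessMask: prior dispatch wrote the buffer
        vk::AccessFlagBits::eShaderRead,         // dstAccessMask: next dispatch reads it
        VK_QUEUE_FAMILY_IGNORED, VK_QUEUE_FAMILY_IGNORED,
        buf, 0, size);
    cmd.pipelineBarrier(
        vk::PipelineStageFlagBits::eComputeShader,   // wait for the writing compute stage...
        vk::PipelineStageFlagBits::eComputeShader,   // ...before the reading compute stage starts
        {}, nullptr, barrier, nullptr);
}
```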
Here is a new offline installer that people can test to see if the recent bugfix resolves the issue. Please let me know!
Can confirm this works. I purged the newer drivers that I had installed and reinstalled the older drivers. I'm on 23.8.2 Adrenalin drivers now, using the build that manyoso posted above on Mistral OpenOrca 7B Q4_0. This is a 6800M laptop, which matches the ISA gfx1031. I get about 25-30 tokens/s. Edit: This doesn't mean I can load Q5_K_M models or larger (7B/13B) with this, though. Those still work on the CPU.
Windows 10 + Radeon 6700 + Adrenalin 23.10.2 - works for me now.
6800XT, Windows 10, 23.10.2 driver, Orca Mini Small. The good news is I am seeing a solid 3x speedup over using the CPU. Here are some results when running the same query twice: Edit: I don't see the "square" or "random word" when I switch to CPU-only generation.
I am able to select the GPU in the list, but it's not being used, and it reports not enough VRAM when VRAM is actually not being used at all.
It's all or nothing. You need to choose a smaller model that will fit within your 8 GB of VRAM.
This is definitely weird, because I wasn't able to load a Q4 model of airoboros-l2-13B; shouldn't a quantized model be possible to load in 12 GB of VRAM? Does the Vulkan backend use more VRAM compared with, say, CLBlast or ROCm?
Here's what's generally recommended:
However, keep in mind these are general recommendations. If layers are offloaded to the GPU, it will reduce RAM requirements and use VRAM instead. Please check the specific documentation for the model of your choice to ensure smooth operation.
Yes, but given gpt4all is using 4-bit quantization, that comfortably fits in 12 GB of VRAM when I use llama.cpp, and uses ~9 GB. I've been able to run Q5_K_M quants (uses ~11.2 GB) comfortably as well, so it is surprising that gpt4all is probably looking at the number of parameters and mapping those to the memory requirement whilst ignoring the quantization. I guess this is the reason I asked whether the Vulkan backend imparts a significant overhead compared to the other backends? (I'm a noob at this, so just trying to understand the differences that the backends make.)
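As a rough sanity check on those numbers (back-of-the-envelope only; the bits-per-weight figures are approximations and this is not how GPT4All actually budgets VRAM):

```cpp
#include <cstdio>

// Weights alone take roughly parameter_count * bits_per_weight / 8 bytes; the KV
// cache and scratch buffers (typically another 1-2 GB at default context) come on top.
int main()
{
    const double params13B = 13.0e9;
    const double q4_0Bits  = 4.5;   // Q4_0: 4-bit weights + per-block scales, ~4.5 bits/weight
    const double q5kmBits  = 5.7;   // Q5_K_M: assumed ballpark of ~5.5-6 bits/weight

    std::printf("13B Q4_0   weights: ~%.1f GB\n", params13B * q4_0Bits / 8 / 1e9);  // ~7.3 GB
    std::printf("13B Q5_K_M weights: ~%.1f GB\n", params13B * q5kmBits / 8 / 1e9);  // ~9.3 GB
    return 0;
}
```

Those figures line up with the ~9 GB and ~11.2 GB observed in llama.cpp once the cache and scratch buffers are added, and are well under what a parameter-count-only heuristic would assume.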
I have a 16 GB VRAM 6900 XT. It appears Mini Orca (small) works perfectly. I can in theory load all available models into memory. Falcon, which is pretty small, will give me the ;;;;;;;;;;;;;;;;;;;;;;;;;;;;; spam. It appears Falcon is 4.1 GB and Orca Mini is 1.9 GB in file size. Task Manager reports that my dedicated GPU memory doesn't go over 7.1 GB with Falcon loaded and used as it spams ;;;;;;;;;;;;;;. This issue may need to be reopened?
You are using GPT4All Falcon on Windows? What version of GPT4All?
v2.5.2 Windows, the latest available. Since Qt is hell to build.
drivers. Does not have any performance or fidelity effect on other GPU/driver combos I've tested. FIXES: nomic-ai/gpt4all#1507
System Info
This is specifically tracking issues that still happen after 2.5.0-pre1, which fixes at least some AMD device/driver combos that were reported broken in #1422 - re-add them here if they persist after the GGUF update.
######
reported with 2.5.0-pre1 on