
AMD GPU Misbehavior w/ some drivers (post GGUF update) #1507

Closed
apage43 opened this issue Oct 13, 2023 · 42 comments
Assignees
Labels
bug Something isn't working vulkan

Comments

@apage43
Member

apage43 commented Oct 13, 2023

System Info

This is specifically tracking issues that still happen after 2.5.0-pre1 which fixes at least some AMD device/driver combos that were reported broken in #1422 - readd them here if they persist after the GGUF update

@apage43 apage43 changed the title AMD GPU Misbehavior w/ some drivers AMD GPU Misbehavior w/ some drivers (post GGUF update) Oct 13, 2023
@apage43 apage43 added the vulkan label Oct 13, 2023
@manyoso
Collaborator

manyoso commented Oct 13, 2023

Will be trying to replicate

@mau777pirho

[screenshot]
AMD RX 7900XT, driver 23.10.1

@shiloh92

Without any changes, now GPT4all crashes with graphics driver issues - RTX gfx

@PedzacyKapec

Without any changes, now GPT4all crashes with graphics driver issues - RTX gfx

I always thought Vulkan worked only with Nvidia...

@Dleewee

Dleewee commented Oct 19, 2023

[screenshot]

GPU: Radeon RX 6800XT
Driver: PRO 23.Q3

Client 2.4.19:
Results with llama-2-7b-chat.ggmlv3.q4_0: Giving mostly "######" repeating pound/hash sign for answer
Results with orca-mini-7b.ggmlv3.q4_0: Shown in image above, somewhat random output with some real words sprinkled in

Edit: Installed latest consumer driver 23.10.2, giving "######" repeating pound/hash sign on both models mentioned above.

Client 2.5.0-pre1:
Downloaded Mini Orca Small (orca-mini-3b-gguf2-q4_0.gguf) and still seeing the same result (### answers).

Client 2.5.0-pre2:
Application Crash when requesting answer with Mini Orca Small.

@cebtenzzre
Member

We received another report of this issue from Kongming on Discord with GPT4All v2.5.1:

  • RX 6600 XT / Windows 11 / Driver Version 23.10.24.03-230824a-395232C-AIB

@mrdevolver

Radeon RX Vega 56 / Windows 10 / Driver Version 23.19.02-230831a-396094C-AMD-Software-Adrenalin-Edition (currently the latest available for this model of GPU)

It seems like the GPU is not being used at all. Is it supported at all?

@cebtenzzre
Member

It seems like the GPU is not being used at all. Is it supported at all?

It should be supported. Is it available to select in the UI, and does it report use of the device in the bottom-right while generating output? #1425 is for unsupported GPUs. If you see the hashes, your GPU is being used, but running into a GPT4All bug.

@Dleewee

Dleewee commented Oct 25, 2023

I can confirm in my case the GPU is definitely being used. By watching task manager I see notable vram and GPU usage when testing a compatible model.

Also, even though the output is gibberish, it is generating much faster on the GPU, like 10x faster.

@manyoso
Collaborator

manyoso commented Oct 26, 2023

I can reproduce this with my AMD Radeon (TM) Vega 8 Graphics iGPU running Mini Orca (small)!!!

@manyoso
Collaborator

manyoso commented Oct 26, 2023

Turning on validation produces a whole bunch of this:

VUID-vkCmdDispatch-groupCountX-00386(ERROR / SPEC): msgNum: -1903005642 - Validation Error: [ VUID-vkCmdDispatch-groupCountX-00386 ] Object 0: handle = 0x1864b7360c0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x8e927036 | vkCmdDispatch(): groupCountX (155520) exceeds device limit maxComputeWorkGroupCount[0] (65535). The Vulkan spec states: groupCountX must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[0] (https://vulkan.lunarg.com/doc/view/1.3.261.1/windows/1.3-extensions/vkspec.html#VUID-vkCmdDispatch-groupCountX-00386)

@manyoso
Collaborator

manyoso commented Oct 26, 2023

We have several kernels that exceed the device limit for workgroup count on my AMD card above ^^^

Specifically, one of silu, relu, or gelu requests a workgroup count of 112320, whereas the device limit is 65535 in any one dimension.

The mul kernel also exceeds the limit, again requesting a workgroup count of 112320.
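The usual workaround for this limit is to split an oversized dispatch into several smaller ones. Here is a minimal sketch of the chunking math (the helper name `splitDispatch` is hypothetical, not from the codebase); each chunk would be recorded as its own `vkCmdDispatch`, with the shader offsetting its global invocation index by the chunk base:

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Split a 1-D dispatch of `total` workgroups into chunks that each stay
// at or under the device's maxComputeWorkGroupCount[0] (65535 on this
// AMD card). Returns (baseGroup, groupCount) pairs.
std::vector<std::pair<uint32_t, uint32_t>>
splitDispatch(uint32_t total, uint32_t limit = 65535) {
    std::vector<std::pair<uint32_t, uint32_t>> chunks;
    for (uint32_t base = 0; base < total; base += limit) {
        chunks.push_back({base, std::min(limit, total - base)});
    }
    return chunks;
}
```

For the failing case above, a dispatch of 112320 workgroups would become two dispatches: one of 65535 groups starting at 0 and one of 46785 groups starting at 65535.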

@manyoso
Collaborator

manyoso commented Oct 26, 2023

Also running into this validation error:
VUID-VkComputePipelineCreateInfo-layout-07987(ERROR / SPEC): msgNum: -1832049290 - Validation Error: [ VUID-VkComputePipelineCreateInfo-layout-07987 ] Object 0: handle = 0xb3ee8b0000000070, type = VK_OBJECT_TYPE_SHADER_MODULE; Object 1: handle = 0x44695a0000000071, type = VK_OBJECT_TYPE_PIPELINE_LAYOUT; | MessageID = 0x92cd2576 | vkCreateComputePipelines(): pCreateInfos[0] VK_SHADER_STAGE_COMPUTE_BIT has a push constant buffer Block with range [0, 16] which outside the pipeline layout range of [0, 12]. The Vulkan spec states: If a push constant block is declared in a shader, a push constant range in layout must match both the shader stage and range
https://vulkan.lunarg.com/doc/view/1.3.261.1/windows/1.3-extensions/vkspec.html#VUID-VkComputePipelineCreateInfo-layout-07987
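The validation message boils down to a containment check: the push-constant range declared in the pipeline layout must cover the block the shader declares. Here the shader declared [0, 16] while the layout only declared [0, 12]. A sketch of the check (the helper name is hypothetical, for illustration only):

```cpp
#include <cstdint>

// Mirrors VUID-VkComputePipelineCreateInfo-layout-07987: the layout's
// push-constant range must fully contain the shader's push-constant block.
bool pushConstantRangeCovers(uint32_t layoutOffset, uint32_t layoutSize,
                             uint32_t shaderOffset, uint32_t shaderSize) {
    return layoutOffset <= shaderOffset &&
           layoutOffset + layoutSize >= shaderOffset + shaderSize;
}
```

The fix is simply to declare the layout range large enough: a layout of [0, 12] fails against a shader block of [0, 16], while [0, 16] passes.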

@manyoso
Collaborator

manyoso commented Oct 26, 2023

I'm afraid the final validation error is the real culprit. The problem on AMD seems to be that the driver will, for some reason, allow us to allocate more memory than the heap can actually supply. It gives no error or any indication that the allocation failed.
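Since the driver doesn't fail the oversized allocation, the application has to check for itself before allocating. A minimal sketch of that defensive check, with a hypothetical helper name (in real code the heap size would come from `vkGetPhysicalDeviceMemoryProperties`, or better, the budget reported by `VK_EXT_memory_budget`):

```cpp
#include <cstdint>

// Returns true only if `requestedBytes` fits in what remains of the heap
// after `alreadyAllocatedBytes`. Written to avoid unsigned underflow when
// the heap is already oversubscribed.
bool allocationFitsHeap(uint64_t requestedBytes, uint64_t heapBytes,
                        uint64_t alreadyAllocatedBytes) {
    return alreadyAllocatedBytes <= heapBytes &&
           requestedBytes <= heapBytes - alreadyAllocatedBytes;
}
```

For example, with an 8 GiB heap and 6 GiB already allocated, a 4 GiB request should be rejected up front rather than handed to the driver.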

@manyoso
Collaborator

manyoso commented Oct 26, 2023

All three validation errors are now fixed. However, I'm leaving this open until I see someone successfully run on an AMD Radeon on Windows.

@manyoso
Collaborator

manyoso commented Oct 26, 2023

This is an offline installer for Windows that has all three validation bugs fixed. We need intrepid testers to see if they can successfully get inference on an AMD GPU with this build: https://output.circle-artifacts.com/output/job/18f8093e-9e34-4293-b551-478c9163eee4/artifacts/0/build/upload/gpt4all-installer-win64.exe

@birkoffe

Radeon RX 6700 + 23.10.2 doesn't help

@Noremacam

Noremacam commented Oct 26, 2023

I confirmed the issue still happens for me on the Radeon 7900 XTX with the build manyoso provided. However, I am running the test drivers AMD released for FSR 3, 23.30.01.02.

I'll be happy to help with any further testing.

[screenshot]

@harish0201

harish0201 commented Oct 26, 2023

Gibberish when using Mistral with the Vulkan backend. I'm using a 6800M with the Adrenalin 23.10.2 driver set. Not surprising, since the 6700 and this card share the same ISA.

Also a really bad question to the other folks here, do you also get a selection box like this:
[screenshot of device selection box]

or when you do vulkaninfo, do you get multiple devices? Note that this is a laptop with a gfx90c integrated (A)GPU and a discrete gfx1031 GPU:

vulkaninfo.exe --summary
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
==========
VULKANINFO
==========

Vulkan Instance Version: 1.3.261


Instance Extensions: count = 13
-------------------------------
VK_EXT_debug_report                    : extension revision 10
VK_EXT_debug_utils                     : extension revision 2
VK_EXT_swapchain_colorspace            : extension revision 4
VK_KHR_device_group_creation           : extension revision 1
VK_KHR_external_fence_capabilities     : extension revision 1
VK_KHR_external_memory_capabilities    : extension revision 1
VK_KHR_external_semaphore_capabilities : extension revision 1
VK_KHR_get_physical_device_properties2 : extension revision 2
VK_KHR_get_surface_capabilities2       : extension revision 1
VK_KHR_portability_enumeration         : extension revision 1
VK_KHR_surface                         : extension revision 25
VK_KHR_win32_surface                   : extension revision 6
VK_LUNARG_direct_driver_loading        : extension revision 1

Instance Layers: count = 4
--------------------------
VK_LAYER_AMD_switchable_graphics AMD switchable graphics layer 1.3.262  version 1
VK_LAYER_AMD_switchable_graphics AMD switchable graphics layer 1.3.260  version 1
VK_LAYER_VALVE_steam_fossilize   Steam Pipeline Caching Layer  1.3.207  version 1
VK_LAYER_VALVE_steam_overlay     Steam Overlay Layer           1.3.207  version 1

Devices:
========
GPU0:
        apiVersion         = 1.3.260
        driverVersion      = 2.0.279
        vendorID           = 0x1002
        deviceID           = 0x1638
        deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
        deviceName         = AMD Radeon(TM) Graphics
        driverID           = DRIVER_ID_AMD_PROPRIETARY
        driverName         = AMD proprietary driver
        driverInfo         = 23.9.2 (AMD proprietary shader compiler)
        conformanceVersion = 1.3.3.1
        deviceUUID         = 00000000-0700-0000-0000-000000000000
        driverUUID         = 414d442d-5749-4e2d-4452-560000000000
GPU1:
        apiVersion         = 1.3.262
        driverVersion      = 2.0.283
        vendorID           = 0x1002
        deviceID           = 0x1638
        deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
        deviceName         = AMD Radeon(TM) Graphics
        driverID           = DRIVER_ID_AMD_PROPRIETARY
        driverName         = AMD proprietary driver
        driverInfo         = 23.9.2 (AMD proprietary shader compiler)
        conformanceVersion = 1.3.3.1
        deviceUUID         = 00000000-0700-0000-0000-000000000000
        driverUUID         = 414d442d-5749-4e2d-4452-560000000000
GPU2:
        apiVersion         = 1.3.260
        driverVersion      = 2.0.279
        vendorID           = 0x1002
        deviceID           = 0x73df
        deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
        deviceName         = AMD Radeon RX 6800M
        driverID           = DRIVER_ID_AMD_PROPRIETARY
        driverName         = AMD proprietary driver
        driverInfo         = 23.10.2 (AMD proprietary shader compiler)
        conformanceVersion = 1.3.3.1
        deviceUUID         = 00000000-0300-0000-0000-000000000000
        driverUUID         = 414d442d-5749-4e2d-4452-560000000000
GPU3:
        apiVersion         = 1.3.262
        driverVersion      = 2.0.283
        vendorID           = 0x1002
        deviceID           = 0x73df
        deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
        deviceName         = AMD Radeon RX 6800M
        driverID           = DRIVER_ID_AMD_PROPRIETARY
        driverName         = AMD proprietary driver
        driverInfo         = 23.10.2 (AMD proprietary shader compiler)
        conformanceVersion = 1.3.3.1
        deviceUUID         = 00000000-0300-0000-0000-000000000000
        driverUUID         = 414d442d-5749-4e2d-4452-560000000000

@mau777pirho

[screenshots]

@Dleewee

Dleewee commented Oct 26, 2023

Gibberish on using Mistral with the Vulkan backend. I'm using a 6800M with Adrenalin 23.10.2 driver set. Not surprising since 6700 and this are the same ISA

Also a really bad question to the other folks here, do you also get a selection box like this:

or when you do vulkaninfo, do you get multiple devices? Note that this is a laptop with a gfx90c integrated (A)GPU and a discrete gfx1031 GPU:

Single GPU shown in "vulkaninfo --summary" output as well as in device drop-down menu.
Testing offline 2.5.2 build on desktop PC with RX6800XT, Windows 10, 23.10.2 driver, Orca Mini model, yields same result as others: "#####"

@Dleewee

Dleewee commented Oct 27, 2023

Hi - thank you to the dev(s) looking into this issue for AMD GPU owners. We appreciate your time and efforts.

I notice the issue title mentions "with some drivers" and wonder if there is a specific driver version that is known to work?

@cebtenzzre
Member

I notice the issue title mentions "with some drivers" and wonder if there is a specific driver version that is known to work?

RADV on Linux is the configuration that we have been able to get working so far.

@manyoso
Collaborator

manyoso commented Oct 27, 2023

FYI, I'm now able to successfully reproduce this issue on an AMD Radeon 6800 XT on Linux with the amdvlk driver, and am looking into a fix.

@manyoso
Collaborator

manyoso commented Oct 27, 2023

FINALLY! It is a synchronization issue. When I define 'record' as 'eval' at the top of ggml-vulkan.cpp I get correct generation, but of course it is too slow. Now we finally have the right clue!!!

@manyoso
Collaborator

manyoso commented Oct 27, 2023

```diff
index cad334f..dc39cdc 100644
--- a/kompute/src/OpAlgoDispatch.cpp
+++ b/kompute/src/OpAlgoDispatch.cpp
@@ -32,9 +32,9 @@ OpAlgoDispatch::record(const vk::CommandBuffer& commandBuffer)
          this->mAlgorithm->getTensors()) {
         tensor->recordPrimaryBufferMemoryBarrier(
           commandBuffer,
-          vk::AccessFlagBits::eTransferWrite,
+          vk::AccessFlagBits::eShaderWrite,
           vk::AccessFlagBits::eShaderRead,
-          vk::PipelineStageFlagBits::eTransfer,
+          vk::PipelineStageFlagBits::eComputeShader,
           vk::PipelineStageFlagBits::eComputeShader);
     }
```

This fixes it, but it might affect generation speed...

@manyoso
Collaborator

manyoso commented Oct 27, 2023

https://output.circle-artifacts.com/output/job/b7ff15c3-377d-4d27-9dc0-c6503ec5a2b0/artifacts/0/build/upload/gpt4all-installer-win64.exe

Here is a new offline installer that people can test to see if the recent bugfix resolves the issue. Please let me know!

@manyoso manyoso self-assigned this Oct 27, 2023
@harish0201

harish0201 commented Oct 27, 2023

Can confirm this works. I purged the newer drivers that I had installed and reinstalled the older ones. I'm on the 23.8.2 Adrenalin drivers now, using the build that manyoso posted above with Mistral OpenOrca 7B Q4_0.

This is a 6800M laptop, which matches the ISA gfx1031. I get about 25-30 tokens/s.

Edit: This doesn't mean I can load Q5_K_M models or larger with 7B/13B using this, though. That still works on the CPU.

@birkoffe

Windows 10 + Radeon 6700 + Adrenalin 23.10.2 - works for me now

@Dleewee

Dleewee commented Oct 27, 2023

6800XT, Windows 10, 23.10.2 driver, Orca Mini Small.
Using the latest test file, it is working to some extent, but I still notice some odd behavior. In multiple interactions, the first character of the first line prints either a square character or sometimes a random word unrelated to the rest of the output.

Example:
[screenshot]

The good news is I am seeing a solid 3x speedup over using CPU. Here are some results when running the same query twice:
CPU: 16 tokens/s
GPU: 57 tokens/s
Notable improvement!

Edit: I don't see the "square" or "random word" when I switch to CPU-only generation.

@mau777pirho

https://output.circle-artifacts.com/output/job/b7ff15c3-377d-4d27-9dc0-c6503ec5a2b0/artifacts/0/build/upload/gpt4all-installer-win64.exe

Here is a new offline installer that people can test to see if the recent bugfix resolves the issue. Please let me know!

This latest version works correctly with the "Mistral Instruct" model.
[screenshot]

But not with the "GPT4All Falcon" model.
[screenshot]

My driver version is 23.10.2.

@manyoso
Collaborator

manyoso commented Oct 27, 2023

But not with the "GPT4All Falcon" model. [screenshot]

My driver version is 23.10.2.

Interesting! This is actually observable with our current release, 2.5.1, so it would seem this is a different bug. It is not the same as this one because the output is consistently ";;;;" and not "####".

Also, the problem with the first characters appears to be a different bug as this occurs with NVIDIA drivers too.

Closing this bug as fixed and opening two new ones for ^^^

@manyoso manyoso closed this as completed Oct 27, 2023
@manyoso
Collaborator

manyoso commented Oct 27, 2023

#1580 opened which tracks the first character issues
#1581 opened which tracks the problem with falcon model on amdvlk

@mrdevolver

It seems like the GPU is not being used at all. Is it supported at all?

It should be supported. Is it available to select in the UI, and does it report use of the device in the bottom-right while generating output? #1425 is for unsupported GPUs. If you see the hashes, your GPU is being used, but running into a GPT4All bug.

I am able to select the GPU in the list, but it's not being used and reports not enough VRAM when VRAM is actually not being used at all.

@Dleewee

Dleewee commented Oct 31, 2023

It seems like the GPU is not being used at all. Is it supported at all?

It should be supported. Is it available to select in the UI, and does it report use of the device in the bottom-right while generating output? #1425 is for unsupported GPUs. If you see the hashes, your GPU is being used, but running into a GPT4All bug.

I am able to select the GPU in the list, but it's not being used and reports not enough VRAM when VRAM is actually not being used at all.

It's all or nothing. You need to choose a smaller model that will fit within your 8 GB of VRAM.

@harish0201

harish0201 commented Oct 31, 2023

This is definitely weird, because I wasn't able to load a Q4 model of airoboros-l2-13B. Shouldn't a quantized model fit in 12 GB of VRAM? Does the Vulkan backend use more VRAM than, say, CLBlast or ROCm?

@Dleewee

Dleewee commented Oct 31, 2023

airoboros-l2-13B

Per: https://sych.io/blog/how-to-run-llama-2-locally-a-guide-to-running-your-own-chatgpt-like-large-language-model/

Here's what's generally recommended:

At least 8 GB of RAM is suggested for the 7B models.
At least 16 GB of RAM for the 13B models.
At least 32 GB of RAM for the 70B models.

However, keep in mind, these are general recommendations. If layers are offloaded to the GPU, it will reduce RAM requirements and use VRAM instead. Please check the specific documentation for the model of your choice to ensure a smooth operation.
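The arithmetic behind these numbers can be sketched roughly. Assuming a Q4_0-style quantization works out to about 4.5 effective bits per weight once block scales are included (an approximation, not an exact figure), the weight size alone is:

```cpp
#include <cstdint>

// Back-of-the-envelope estimate of model weight size in GiB, given a
// parameter count (in billions) and effective bits per weight. KV cache
// and activations add more on top, so treat this as a lower bound.
double estimateWeightGiB(double paramsBillions, double bitsPerWeight) {
    double bytes = paramsBillions * 1e9 * bitsPerWeight / 8.0;
    return bytes / (1024.0 * 1024.0 * 1024.0);
}
```

Under that assumption, a 7B model at ~4.5 bits/weight is roughly 3.7 GiB and a 13B model roughly 6.8 GiB of weights, which is consistent with a Q4 13B model fitting in 12 GB of VRAM.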

@harish0201

Yes, but given gpt4all is using 4-bit quantization, that comfortably fits in 12 GB of VRAM when I use llama.cpp, using ~9 GB. I've been able to run Q5_K_M quants (~11.2 GB) comfortably as well, so it's surprising that gpt4all is apparently looking at the number of parameters and mapping those to a memory requirement while ignoring the quantization.

I guess this is the reason I asked if Vulkan backend imparts a significant overhead compared to the other operators? (I'm a noob at this, so just trying to understand the differences that the backends make.)

@tilkinsc

tilkinsc commented Oct 31, 2023

I have a 16 GB VRAM 6900 XT. Mini Orca (small) appears to work perfectly; in theory I can load all available models into memory. Falcon, which is pretty small, gives me the ;;;;;;;;;;;;;;;;;;;;;;;;;;;;; spam. Falcon is 4.1 GB and Orca Mini is 1.9 GB in file size. Task Manager reports that my dedicated GPU memory doesn't go over 7.1 GB with Falcon loaded and in use as it spams ;;;;;;;;;;;;;;.

Does this issue need to be reopened?

@cebtenzzre
Member

Falcon, which is pretty small will give me the ;;;;;;;;;;;;;;;;;;;;;;;;;;;;; spam.

You are using GPT4All Falcon on Windows? What version of GPT4All?

@tilkinsc

tilkinsc commented Oct 31, 2023

Falcon, which is pretty small will give me the ;;;;;;;;;;;;;;;;;;;;;;;;;;;;; spam.

You are using GPT4All Falcon on Windows? What version of GPT4All?

v2.5.2 on Windows, the latest available, since Qt is hell to build.

cebtenzzre pushed a commit to cebtenzzre/llama.cpp that referenced this issue Nov 7, 2023
drivers. Does not have any performance or fidelity effect on other gpu/driver
combos I've tested.

FIXES: nomic-ai/gpt4all#1507
@HyRespt

HyRespt commented Dec 20, 2023

Hi, I am still having this issue; it is also spamming ;;;;;;;;;

I am on the latest Radeon driver, 23.12.1, and the latest gpt4all, 2.5.4, using GPT4All Falcon.

[screenshot]
