Vulkan Phi Fix for AMD Proprietary Drivers #5260

0cc4m · 2024-02-01T16:26:53Z

I also found a CPY buffer size bug that I fixed alongside this. Let me know if it works @stduhpf

stduhpf · 2024-02-01T16:41:33Z

Nice, this seems to work pretty well. I don't get nonsense generations anymore, but there are some "issues" with consistency. Testing with a fixed seed and temperature 0 gives different outputs depending on the number of layers offloaded. Not a big deal though.

`-ngl 0` and `-ngl 1`:

Here is a reciepe for tomato soup:

Ingredients:
- 4 cups of chicken broth
- 2 tablespoons of butter
- 1 onion, chopped
- 2 cloves of garlic, minced
- 2 tomatoes, peeled and diced
- Salt and pepper to taste
- Parsley for garnish

Directions:
- In a large pot, melt the butter over medium heat. Add the onion and cook until soft, about 10 minutes.
- Stir in the garlic and cook for another minute.
- Add the tomatoes and chicken broth and bring to a boil. Reduce the heat and simmer for 15 minutes, stirring occasionally.
- Season with salt and pepper

`-ngl 2` to `-ngl 31` all give different, but always coherent answers

`-ngl 32` and `-ngl 33`:

Here is a reciepe for tomato soup:

Ingredients:
- 4 cups of chicken broth
- 2 cans of diced tomatoes
- 1 onion, chopped
- 2 cloves of garlic, minced
- Salt and pepper to taste
- Fresh basil leaves for garnish

Directions:
- In a large pot, bring the chicken broth to a boil over high heat.
- Add the diced tomatoes, onion, and garlic and reduce the heat to medium-low. Simmer for about 15 minutes, stirring occasionally.
- Season with salt and pepper to taste.
- Sprinkle some fresh basil leaves on top of the soup and serve hot or cold.
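
To pin down where two such runs part ways, a line-by-line comparison is usually enough. The following is a minimal Python sketch, not part of llama.cpp; the helper name `first_divergence` is made up for illustration, and the sample strings are the first ingredient lines from the two outputs above.

```python
from itertools import zip_longest


def first_divergence(a: str, b: str):
    """Return (line_number, line_a, line_b) for the first differing line.

    Line numbers start at 1. If one text is shorter, the missing line is
    reported as None. Returns None when both texts are identical.
    """
    pairs = zip_longest(a.splitlines(), b.splitlines())
    for number, (line_a, line_b) in enumerate(pairs, start=1):
        if line_a != line_b:
            return number, line_a, line_b
    return None


# First ingredient lines of the `-ngl 0` and `-ngl 32` outputs quoted above.
low_ngl = """Ingredients:
- 4 cups of chicken broth
- 2 tablespoons of butter"""

high_ngl = """Ingredients:
- 4 cups of chicken broth
- 2 cans of diced tomatoes"""

print(first_divergence(low_ngl, high_ngl))
```

With greedy sampling, everything after the first differing token is conditioned on a different prefix, so only the first divergence point says anything about the numerical difference itself.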

0cc4m · 2024-02-01T16:54:58Z

> Nice, this seems to work pretty well. I don't get nonsense generations anymore, but there are some "issues" with consistency. Testing with a fixed seed and temperature 0 gives different outputs depending on the number of layers offloaded. Not a big deal though.

GPUs give slightly different results in floating-point operations compared to CPUs, so even with CUDA or HIP there may be differences. But I also had to build an entire matrix multiplication shader myself, since there is no BLAS library for Vulkan, and I suspect there might still be some inaccuracies in there. It's at least mostly correct now, and pretty fast.
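
One common source of such differences is summation order: floating-point addition is not associative, and a parallel reduction typically adds values in a different order than a sequential loop. The Python sketch below only illustrates that general effect; it makes no claim about how the Vulkan shaders in this PR accumulate, and `sequential_sum` and `pairwise_sum` are hypothetical names.

```python
def sequential_sum(values):
    """Add left to right, like a simple loop on one thread."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Add in a tree shape, similar to a parallel reduction."""
    if not values:
        return 0.0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


values = [0.1, 0.2, 0.3]
print(sequential_sum(values))  # computed as (0.1 + 0.2) + 0.3
print(pairwise_sum(values))    # computed as 0.1 + (0.2 + 0.3)
print(sequential_sum(values) == pairwise_sum(values))
```

The two results differ only in the last bit, but at temperature 0 the sampler picks the highest logit, so a tiny difference can flip the choice when two candidate tokens are nearly tied.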

stduhpf · 2024-02-01T16:57:56Z

> GPUs give slightly different results in floating-point operations compared to CPUs, so even with CUDA or HIP there may be differences. But I also had to build an entire matrix multiplication shader myself, since there is no BLAS library for Vulkan, and I suspect there might still be some inaccuracies in there. It's at least mostly correct now, and pretty fast.

I'm not observing the same level of discrepancy with other models... Anyway, as long as the output is coherent, that's good enough for me.

slaren

Did the GELU op pass test-backend-ops before? I think the Kompute backend had a similar issue: 6fc99a6, 38d1f0c

stduhpf · 2024-02-01T17:01:25Z

> GPUs give slightly different results in floating-point operations compared to CPUs, so even with CUDA or HIP there may be differences. But I also had to build an entire matrix multiplication shader myself, since there is no BLAS library for Vulkan, and I suspect there might still be some inaccuracies in there. It's at least mostly correct now, and pretty fast.

Anyway, I just tried the CLBlast backend, and I get exactly the same behaviour as with this PR, so it's something about this model specifically that makes its outputs differ between CPU and GPU.

So everything is good!

0cc4m · 2024-02-01T17:10:22Z

> Did the GELU op pass test-backend-ops before? I think the Kompute backend had a similar issue: 6fc99a6, 38d1f0c

It does pass it, yeah. Since Vulkan can run on a whole bunch of devices, the tests can work on some and fail on other devices or drivers, like in this case on the proprietary AMD drivers. On the open source AMD drivers it's fine.

cebtenzzre · 2024-02-01T17:27:24Z

test-backend-ops is failing on the Vulkan backend for me on AMDVLK (tested on this PR): test-backend-ops-amdvlk.txt
It passes on the Kompute backend.

0cc4m · 2024-02-01T18:24:38Z

> test-backend-ops is failing on the Vulkan backend for me on AMDVLK (tested on this PR): test-backend-ops-amdvlk.txt
> It passes on the Kompute backend.

Yes, that's expected (and not related to AMDVLK), but GELU works. I haven't spent time fixing the tests yet because they came up pretty late, and by then I had my own testing framework, which focuses on getting models to work, not general compliance. But I'll get around to it.

* Replace tanh to avoid NaN in gelu shader on AMD proprietary driver
* Fix another Vulkan CPY buffer size bug
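
For readers wondering what "replace tanh" can look like: the tanh-based GELU approximation can be rewritten in terms of `exp` through the identity tanh(v) = 1 - 2 / (exp(2v) + 1). The Python sketch below illustrates that identity only. It is an assumption about the general approach, not a copy of the shader change in e76d001, and the names `gelu_tanh`, `gelu_exp` and `exp_or_inf` are made up for illustration. Python's `math.exp` raises on overflow, whereas a shader's `exp` would typically saturate to infinity, so the helper emulates the latter.

```python
import math

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)
GELU_COEF_A = 0.044715


def exp_or_inf(v: float) -> float:
    """exp(v), returning +inf on overflow instead of raising."""
    try:
        return math.exp(v)
    except OverflowError:
        return math.inf


def gelu_tanh(x: float) -> float:
    """GELU, tanh approximation, written with tanh directly."""
    inner = SQRT_2_OVER_PI * x * (1.0 + GELU_COEF_A * x * x)
    return 0.5 * x * (1.0 + math.tanh(inner))


def gelu_exp(x: float) -> float:
    """Same approximation with tanh expressed through exp.

    1 + tanh(v) == 2 - 2 / (exp(2v) + 1). When exp overflows to +inf the
    fraction becomes 0, and when it underflows to 0 the fraction becomes 2,
    so the result stays finite for large |x|.
    """
    inner = SQRT_2_OVER_PI * x * (1.0 + GELU_COEF_A * x * x)
    return 0.5 * x * (2.0 - 2.0 / (exp_or_inf(2.0 * inner) + 1.0))


for x in (-1000.0, -3.0, 0.0, 0.5, 3.0, 1000.0):
    print(x, gelu_tanh(x), gelu_exp(x))
```

Both forms agree to within rounding error over the whole range; the point of the rewrite is that it depends only on `exp` and a division, which sidesteps whatever a particular driver's `tanh` does for large arguments.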

0cc4m added 2 commits February 1, 2024 17:12

Replace tanh to avoid NaN in gelu shader on AMD proprietary driver

e76d001

Fix another Vulkan CPY buffer size bug

23e35e9

slaren approved these changes Feb 1, 2024

0cc4m merged commit 4d0924a into master Feb 1, 2024
56 checks passed

0cc4m deleted the 0cc4m/vulkan-fixes branch February 1, 2024 18:25

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Feb 3, 2024

Vulkan Phi Fix for AMD Proprietary Drivers (ggerganov#5260)

a14967b

* Replace tanh to avoid NaN in gelu shader on AMD proprietary driver
* Fix another Vulkan CPY buffer size bug

hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024

Vulkan Phi Fix for AMD Proprietary Drivers (ggerganov#5260)

aa84567

* Replace tanh to avoid NaN in gelu shader on AMD proprietary driver
* Fix another Vulkan CPY buffer size bug

0cc4m mentioned this pull request Dec 8, 2024

Vulkan: fix NaN in tanh.comp with AMD proprietary driver on Windows #10723

Merged
