Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vulkan Phi Fix for AMD Proprietary Drivers #5260

Merged
merged 2 commits into from
Feb 1, 2024
Merged

Conversation

0cc4m
Copy link
Collaborator

@0cc4m 0cc4m commented Feb 1, 2024

Fixes #5243

I also found a CPY buffer size bug that I fixed alongside this. Let me know if it works @stduhpf

@stduhpf
Copy link
Contributor

stduhpf commented Feb 1, 2024

Nice, this seems to work pretty well. I don't get nonsense generations anymore, but there are some "issues" with consistency. Testing with fixed seed and 0 temperature give different outputs depending on the amount of layers offloaded. Not a big deal though.

-ngl 0 and -ngl 1:

Here is a reciepe for tomato soup:

Ingredients:
- 4 cups of chicken broth
- 2 tablespoons of butter
- 1 onion, chopped
- 2 cloves of garlic, minced
- 2 tomatoes, peeled and diced
- Salt and pepper to taste
- Parsley for garnish

Directions:
- In a large pot, melt the butter over medium heat. Add the onion and cook until soft, about 10 minutes.
- Stir in the garlic and cook for another minute.
- Add the tomatoes and chicken broth and bring to a boil. Reduce the heat and simmer for 15 minutes, stirring occasionally.
- Season with salt and pepper

-ngl 2 to -ngl 31 all give different, but always coherent answers

-ngl 32and -ngl 33:

Here is a reciepe for tomato soup:

Ingredients:
- 4 cups of chicken broth
- 2 cans of diced tomatoes
- 1 onion, chopped
- 2 cloves of garlic, minced
- Salt and pepper to taste
- Fresh basil leaves for garnish

Directions:
- In a large pot, bring the chicken broth to a boil over high heat.
- Add the diced tomatoes, onion, and garlic and reduce the heat to medium-low. Simmer for about 15 minutes, stirring occasionally.
- Season with salt and pepper to taste.
- Sprinkle some fresh basil leaves on top of the soup and serve hot or cold.

@0cc4m
Copy link
Collaborator Author

0cc4m commented Feb 1, 2024

Nice, this seems to work pretty well. I don't get nonsense generations anymore, but there are some "issues" with consistency. Testing with fixed seed and 0 temperature give different outputs depending on the amount of layers offloaded. Not a big deal though.

GPU give slightly different results in floating point operations compared to CPU, so even with CUDA or HIP there may be differences. But I also had to build an entire matrix multiplication shader myself since there is no BLAS library for Vulkan, and I suspect there might be some inaccuracies still in there. It's at least mostly correct now and pretty fast.

@stduhpf
Copy link
Contributor

stduhpf commented Feb 1, 2024

GPU give slightly different results in floating point operations compared to CPU, so even with CUDA or HIP there may be differences. But I also had to build an entire matrix multiplication shader myself since there is no BLAS library for Vulkan, and I suspect there might be some inaccuracies still in there. It's at least mostly correct now and pretty fast.

I'm not observing the same level of discrepency with other models... Anyways, as long as the output is coherent, that's good enough for me.

Copy link
Collaborator

@slaren slaren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did the GELU op pass test-backend-ops before? I think the kompute backend had a similar issue.
6fc99a6
38d1f0c

@stduhpf
Copy link
Contributor

stduhpf commented Feb 1, 2024

GPU give slightly different results in floating point operations compared to CPU, so even with CUDA or HIP there may be differences. But I also had to build an entire matrix multiplication shader myself since there is no BLAS library for Vulkan, and I suspect there might be some inaccuracies still in there. It's at least mostly correct now and pretty fast.

Anyways, I just tried with the CLBlast backend, and I get the exact same behaviour as this PR, so it's something about this model specifically that makes it have different outputs between CPU and GPU.

So everything is good!

@0cc4m
Copy link
Collaborator Author

0cc4m commented Feb 1, 2024

Did the GELU op pass test-backend-ops before? I think the kompute backend had a similar issue. 6fc99a6 38d1f0c

It does pass it, yeah. Since Vulkan can run on a whole bunch of devices, the tests can work one some and fail on other devices or drivers, like in this case on proprietary AMD drivers. On the open source AMD drivers it's fine.

@cebtenzzre
Copy link
Collaborator

cebtenzzre commented Feb 1, 2024

test-backend-ops is failing on the Vulkan backend for me on AMDVLK (tested on this PR): test-backend-ops-amdvlk.txt
It passes on the Kompute backend.

@0cc4m
Copy link
Collaborator Author

0cc4m commented Feb 1, 2024

test-backend-ops is failing on the Vulkan backend for me on AMDVLK (tested on this PR): test-backend-ops-amdvlk.txt It passes on the Kompute backend.

Yes, that's expected (and not related to amdvlk), but GELU works. I haven't spent time fixing tests yet cause they came up pretty late and I've had my own testing framework by then, which focuses on getting models to work, not general compliance. But I'll get around to it.

@0cc4m 0cc4m merged commit 4d0924a into master Feb 1, 2024
56 checks passed
@0cc4m 0cc4m deleted the 0cc4m/vulkan-fixes branch February 1, 2024 18:25
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Feb 3, 2024
* Replace tanh to avoid NaN in gelu shader on AMD proprietary driver

* Fix another Vulkan CPY buffer size bug
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* Replace tanh to avoid NaN in gelu shader on AMD proprietary driver

* Fix another Vulkan CPY buffer size bug
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Phi-2 completely broken on Vulkan
4 participants