Phi-2 completely broken on Vulkan #5243
Comments
You seem to have a knack for finding issues with the Vulkan code. And I just fixed Phi... I guess there's another matmul issue with the Windows AMD driver? I'll try to find it.
Seems to be the GELU shader this time, which AMD's proprietary driver doesn't like. Let me know if anyone spots the likely cause of that NaN.
For the Metal kernel, we had to explicitly call a more precise tanh implementation (see Lines 266 to 277 in 1cfb537). Without this change, it was producing NaNs.
Thank you, that's helpful. But I don't think I have any other implementation of tanh available. The GPU driver provides the implementation, and it does work on most of them. I'll try to think of a workaround for the proprietary AMD driver.
I'm not sure how bad this would be for performance (because of branching and all that) and accuracy, but what about using:
Thanks for the suggestion. I found an even better one that seems to work: |
I get garbage output when offloading any layer to GPU when running Phi-2 models with the Vulkan backend. The issue seems to affect mostly the first and last layers.
```
.\buildVulkan\bin\Release\main.exe -m .\models\phi\phi-2.Q4_K_M.gguf -t 12 -tb 6 -p "Here is a reciepe for tomato soup:\n" -e -s 0 --temp 0 -n 128 -ngl X
```
(main: build = 2035 (7977a2a0))
- `-ngl 0` (control)
- `-ngl 1`: starts ok, but glitches after a few tokens generated. (In this case it generated an eos token, so it ended the generation early, but with a different prompt/higher temp, the output is just noisy gibberish.)

Using `-p "Here is a reciepe for tomato soup:\n\n"`:

- `-ngl 2` (ngl 2 to 32 all produce the same output, only the inference speed changes)
- `-ngl 32`
- `-ngl 33` (all layers; always repeating a single token, seems to use mostly '!', 'G' or 'o')