CUDA: implement __hmax and __hmax2 for CUDA < 11.7 #7019
Conversation
Ah, I actually just did an awful hack to do the same thing and got it to work, and was about to PR it. Let me test yours though, as it seems like a cleaner and better impl.
@JohannesGaessler unfortunately, your version fails to compile with this error:
also
Branch updated: bc8ac98 → 24ea3c6
I think the issue was the
Alright, let me give it another test.
Hmm, no, it's still not working for me. Now I am getting a very weird
You can view my build environment and CI error logs here (sorry, I know this is kobold, but it's compiling the same CUDA files):
Branch updated: 24ea3c6 → 859734e
I've changed it to be similar to how you originally did it. The performance for old CUDA versions will potentially be worse, but if people want to use such an old CUDA version, my stance is that they'll just have to live with it.
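To illustrate the trade-off being discussed, a guard of the following shape captures the idea: use the native intrinsic where the toolkit provides it, and fall back to a float comparison otherwise. This is only a sketch; the name hmax_compat and the exact version threshold are assumptions for illustration, not the PR's actual code.

```cpp
#include <cuda_fp16.h>

// Sketch only: hmax_compat is a hypothetical name, not the PR's actual symbol.
// On CUDA >= 11.7 the built-in __hmax intrinsic is used; older toolkits fall
// back to comparing the values after widening them to float.
static __device__ __forceinline__ half hmax_compat(const half a, const half b) {
#if defined(CUDART_VERSION) && CUDART_VERSION >= 11070
    return __hmax(a, b);                               // built-in, CUDA >= 11.7
#else
    return __half2float(a) > __half2float(b) ? a : b;  // portable fallback
#endif
}
```

The float round-trip in the fallback path is the likely source of the performance concern on pre-11.7 toolkits.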
Sorry, I only noticed after I had already pushed the button, but does the code actually work for you, @LostRuins?
Yes, it does. I have tried with flash attn both on and off, and the model outputs are coherent, so I presume it must be working. Let me merge your latest changes and try again.
@JohannesGaessler I can confirm that your changes now build successfully and appear to work. Thanks!
This PR implements __hmax and __hmax2 for CUDA < 11.7. I don't know how well they perform relative to the built-in functions, but without them FlashAttention will not work at all.
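A minimal sketch of what such fallbacks could look like is below; the version guard and function bodies are assumptions based on the description above, not necessarily the PR's actual definitions. The half2 variant is built element-wise from the scalar one.

```cpp
#include <cuda_fp16.h>

// Hypothetical fallback definitions for toolkits that lack __hmax/__hmax2
// (CUDA < 11.7). The exact guard used by the PR may differ.
#if defined(CUDART_VERSION) && CUDART_VERSION < 11070

// Scalar half max: compare after widening to float.
static __device__ __forceinline__ half __hmax(const half a, const half b) {
    return __half2float(a) > __half2float(b) ? a : b;
}

// Element-wise max of two half2 values, built from the scalar fallback.
static __device__ __forceinline__ half2 __hmax2(const half2 a, const half2 b) {
    return __halves2half2(
        __hmax(__low2half(a),  __low2half(b)),
        __hmax(__high2half(a), __high2half(b)));
}

#endif // CUDART_VERSION < 11070
```

On CUDA >= 11.7 these names are provided by cuda_fp16.h itself, so the guard keeps the fallbacks out of builds where the native intrinsics exist.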