
CUDA: add option to compile without FlashAttention #12025

Merged 1 commit on Feb 22, 2025

Conversation

JohannesGaessler (Collaborator)
Fixes #11946.

I added an option GGML_CUDA_NO_FA that is used for CUDA, HIP, and MUSA. Two more general questions regarding compile options:

  • Do we have guidelines on whether ON and OFF refer to a feature being enabled, or to the default compilation option? In other words, instead of making GGML_CUDA_NO_FA=OFF the default I could have made GGML_CUDA_FA=ON the default.
  • Do we have guidelines on whether CUDA compile options should also apply to HIP and MUSA? I noticed that e.g. GGML_CUDA_FORCE_CUBLAS is used for all three, but instead of GGML_CUDA_NO_VMM there is e.g. GGML_HIP_NO_VMM.
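As a rough sketch of the negative-named option described above (option and definition names here follow the comment, but the exact CMake code is an assumption, not the code from this PR):

```cmake
# Negative-named user-facing option: OFF by default, i.e. FlashAttention
# is compiled unless the user explicitly disables it.
option(GGML_CUDA_NO_FA "ggml: do not compile FlashAttention CUDA kernels" OFF)

if (GGML_CUDA_NO_FA)
    # Propagate the choice to the CUDA/HIP/MUSA sources as a preprocessor define.
    add_compile_definitions(GGML_CUDA_NO_FA)
endif()
```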

github-actions bot added labels Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) on Feb 22, 2025
slaren (Member) commented Feb 22, 2025

I would prefer to remove the negative from all the option names and change the default values instead. The negative options were necessary for the Makefile build, where the default was to enable the option, but that is no longer a problem with CMake.

JohannesGaessler (Collaborator, Author)
I changed the CMake option to a positive version but kept the define as a negative. I think it is preferable if the code uses the defaults unless the build system adds a define, but I don't feel strongly about this and would also be fine with a positive version in the code.
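The compromise described above (a positive CMake option gating a negative preprocessor define) might look roughly like this; the exact wording and names are an assumption for illustration:

```cmake
# Positive-named user-facing option, enabled by default.
option(GGML_CUDA_FA "ggml: compile ggml FlashAttention CUDA kernels" ON)

# The code keeps a negative define, so that source compiled without the
# build system (no define at all) still gets the default behavior.
if (NOT GGML_CUDA_FA)
    add_compile_definitions(GGML_CUDA_NO_FA)
endif()
```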

slaren (Member) commented Feb 22, 2025

Regarding the HIP/MUSA/etc. option names: I believe it would be worthwhile to have different options for each backend variation, i.e. GGML_CUDA_XX, GGML_HIP_XX, etc., because that would allow building multiple backends at the same time with different options, and also for clarity. But there may be other issues preventing that anyway.
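The per-backend naming suggested above could be sketched as one option per backend, so that CUDA, HIP, and MUSA are configurable independently in a single build (a hypothetical illustration; these option names are not confirmed by this PR):

```cmake
# One option per backend variation, each defaulting to ON.
option(GGML_CUDA_FA "ggml: FlashAttention for the CUDA backend" ON)
option(GGML_HIP_FA  "ggml: FlashAttention for the HIP backend"  ON)
option(GGML_MUSA_FA "ggml: FlashAttention for the MUSA backend" ON)
```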

@JohannesGaessler JohannesGaessler merged commit a28e0d5 into ggml-org:master Feb 22, 2025
47 checks passed
taronaeo pushed a commit to taronaeo/llama.cpp-s390x that referenced this pull request Feb 23, 2025
Linked issue closed by this pull request: Add option to build CUDA backend without Flash attention