
CUDA: add option to compile without FlashAttention #12025

Merged 1 commit on Feb 22, 2025

Conversation

JohannesGaessler (Collaborator)
Fixes #11946.

I added an option GGML_CUDA_NO_FA that is used for CUDA, HIP, and MUSA. Two more general questions regarding compile options:

  • Do we have guidelines on whether ON and OFF refer to a feature being enabled, or to the default compilation option? In other words, instead of making GGML_CUDA_NO_FA=OFF the default I could have made GGML_CUDA_FA=ON the default.
  • Do we have guidelines on whether CUDA compile options should also apply to HIP and MUSA? I noticed that e.g. GGML_CUDA_FORCE_CUBLAS is used for all three, but instead of GGML_CUDA_NO_VMM there is e.g. GGML_HIP_NO_VMM.
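As a rough sketch of the negative-named option described above (option and definition names here follow the comment, but the exact CMake code is an assumption, not the code from this PR):

```cmake
# Negative-named user-facing option: OFF by default, i.e. FlashAttention
# is compiled unless the user explicitly disables it.
option(GGML_CUDA_NO_FA "ggml: do not compile FlashAttention CUDA kernels" OFF)

if (GGML_CUDA_NO_FA)
    # Propagate the choice to the CUDA/HIP/MUSA sources as a preprocessor define.
    add_compile_definitions(GGML_CUDA_NO_FA)
endif()
```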

github-actions bot added labels Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) on Feb 22, 2025
slaren (Member) commented Feb 22, 2025

I would prefer to remove the negative from all the option names and change the default values instead. The negative options were necessary for the Makefile build, where the default was to enable the option, but that is no longer a problem with CMake.

JohannesGaessler (Collaborator, Author)
I changed the CMake option to a positive version but kept the define as a negative. I think it is preferable if the code uses the defaults unless the build system adds a define, but I don't feel strongly about this and would also be fine with a positive version in the code.
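The compromise described above (a positive CMake option gating a negative preprocessor define) might look roughly like this; the exact wording and names are an assumption for illustration:

```cmake
# Positive-named user-facing option, enabled by default.
option(GGML_CUDA_FA "ggml: compile ggml FlashAttention CUDA kernels" ON)

# The code keeps a negative define, so that source compiled without the
# build system (no define at all) still gets the default behavior.
if (NOT GGML_CUDA_FA)
    add_compile_definitions(GGML_CUDA_NO_FA)
endif()
```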

slaren (Member) commented Feb 22, 2025

Regarding the HIP/MUSA/etc. option names: I believe it would be worthwhile to have different options for each backend variation, i.e. GGML_CUDA_XX, GGML_HIP_XX, etc., because that would allow building multiple backends at the same time with different options, and also for clarity. But there may be other issues preventing that anyway.
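The per-backend naming suggested above could be sketched as one option per backend, so that CUDA, HIP, and MUSA are configurable independently in a single build (a hypothetical illustration; these option names are not confirmed by this PR):

```cmake
# One option per backend variation, each defaulting to ON.
option(GGML_CUDA_FA "ggml: FlashAttention for the CUDA backend" ON)
option(GGML_HIP_FA  "ggml: FlashAttention for the HIP backend"  ON)
option(GGML_MUSA_FA "ggml: FlashAttention for the MUSA backend" ON)
```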

@JohannesGaessler JohannesGaessler merged commit a28e0d5 into ggml-org:master Feb 22, 2025
47 checks passed
taronaeo pushed a commit to taronaeo/llama.cpp-s390x that referenced this pull request Feb 23, 2025
Linked issue closed by this pull request: Add option to build CUDA backend without Flash attention