Update Makefile #2482
Conversation
Use the environment variable `CUDA_NATIVE_ARCH` if present to set NVCC arch. Otherwise, use `native`.
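As a rough sketch of the proposed selection logic (written in shell rather than make syntax, with a hypothetical `NVCC_ARCH_FLAG` variable — not the PR's literal diff):

```shell
# Prefer CUDA_NATIVE_ARCH when set; otherwise fall back to "native" as before.
CUDA_NATIVE_ARCH="sm_52"   # e.g. exported by the user

if [ -n "${CUDA_NATIVE_ARCH}" ]; then
    NVCC_ARCH_FLAG="-arch=${CUDA_NATIVE_ARCH}"
else
    NVCC_ARCH_FLAG="-arch=native"
fi
echo "$NVCC_ARCH_FLAG"
```

With something like this in place, `make LLAMA_CUBLAS=1 CUDA_NATIVE_ARCH=sm_52` would pin a specific arch, while a plain `make LLAMA_CUBLAS=1` would keep the old `native` behavior.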
Doesn't this do essentially the same as …
It does, but perhaps just testing whether …
With this change, CUDA_DOCKER_ARCH now implies …
Hm, in what scenario would this be wrong? …
Before this PR, you could use e.g. `CUDA_DOCKER_ARCH=sm_52` to set the NVCC arch, which would evaluate to: …
After this PR: …
I think the last `-arch` option will take precedence.
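The precedence claim above can be illustrated with a small shell sketch (the flag string is hypothetical) that extracts the last `-arch` option from the command line, mirroring how nvcc is said to behave when the flag is repeated:

```shell
# Hypothetical flag string where two -arch options were appended,
# e.g. one derived from CUDA_DOCKER_ARCH and one from the new default.
FLAGS="-arch=sm_52 -O3 -arch=native"

# If the last -arch wins (as suggested above), this is the effective arch:
LAST_ARCH=$(printf '%s\n' $FLAGS | grep '^-arch=' | tail -n 1)
echo "$LAST_ARCH"
```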
This reverts commit 96981f3. See: #2482 (comment)
* master: (350 commits)
  - speculative : ensure draft and target model vocab matches (ggerganov#3812)
  - llama : correctly report GGUFv3 format (ggerganov#3818)
  - simple : fix batch handling (ggerganov#3803)
  - cuda : improve text-generation and batched decoding performance (ggerganov#3776)
  - server : do not release slot on image input (ggerganov#3798)
  - batched-bench : print params at start
  - log : disable pid in log filenames
  - server : add parameter -tb N, --threads-batch N (ggerganov#3584) (ggerganov#3768)
  - server : do not block system prompt update (ggerganov#3767)
  - sync : ggml (conv ops + cuda MSVC fixes) (ggerganov#3765)
  - cmake : add missed dependencies (ggerganov#3763)
  - cuda : add batched cuBLAS GEMM for faster attention (ggerganov#3749)
  - Add more tokenizer tests (ggerganov#3742)
  - metal : handle ggml_scale for n%4 != 0 (close ggerganov#3754)
  - Revert "make : add optional CUDA_NATIVE_ARCH (ggerganov#2482)"
  - issues : separate bug and enhancement template + no default title (ggerganov#3748)
  - Update special token handling in conversion scripts for gpt2 derived tokenizers (ggerganov#3746)
  - llama : remove token functions with `context` args in favor of `model` (ggerganov#3720)
  - Fix baichuan convert script not detecing model (ggerganov#3739)
  - make : add optional CUDA_NATIVE_ARCH (ggerganov#2482)
  - ...
This reverts commit 96981f3. See: ggerganov/llama.cpp#2482 (comment)
# Change:

Use the environment variable `CUDA_NATIVE_ARCH` if present to set the NVCC arch. Otherwise, use `native` as currently.

# Reasoning:

Running `make LLAMA_CUBLAS=1` errors with:

```
nvcc fatal : Value 'native' is not defined for option 'gpu-architecture'
make: *** [Makefile:249: ggml-cuda.o] Error 1
```

Making the `-arch` flag passed to nvcc configurable allows a specific arch to be chosen, which solves this problem.