Update Makefile #2482

Merged · 1 commit merged into ggerganov:master on Oct 22, 2023
Conversation

@awhill19 (Contributor) commented Aug 1, 2023

# Change:

Use the environment variable `CUDA_NATIVE_ARCH`, if present, to set the NVCC arch. Otherwise, use `native`, as the Makefile does currently.

# Reasoning:

Running `make LLAMA_CUBLAS=1` errors with:

```
nvcc fatal : Value 'native' is not defined for option 'gpu-architecture'
make: *** [Makefile:249: ggml-cuda.o] Error 1
```

Making the `-arch` flag passed to nvcc configurable allows a specific arch to be chosen, which solves this problem.
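For context, a minimal sketch of what the change amounts to in the Makefile (`CUDA_NATIVE_ARCH` is from this PR; the `NVCCFLAGS` variable name is an assumption here, not copied from the actual diff):

```makefile
# Sketch: prefer an explicit arch from the environment, otherwise fall
# back to 'native'. NVCCFLAGS is an assumed name for the nvcc flag variable.
ifdef CUDA_NATIVE_ARCH
    NVCCFLAGS += -arch=$(CUDA_NATIVE_ARCH)
else
    NVCCFLAGS += -arch=native
endif
```

With something like this, `make LLAMA_CUBLAS=1 CUDA_NATIVE_ARCH=sm_52` passes `-arch=sm_52` to nvcc instead of the unsupported `-arch=native`.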

@slaren (Collaborator) commented Aug 4, 2023

Doesn't this do essentially the same as `CUDA_DOCKER_ARCH`?

@awhill19 (Contributor, Author) commented Aug 5, 2023

> Doesn't this do essentially the same as `CUDA_DOCKER_ARCH`?

It does, but `CUDA_DOCKER_ARCH` also passes `-Wno-deprecated-gpu-targets`, which nvcc seems to have trouble with in some environments. It also seemed a bit non-intuitive to me to need a Docker-related environment variable for a simple native installation.

Perhaps just testing whether `-Wno-deprecated-gpu-targets` is necessary (or whether it warrants a separate argument) would be more appropriate. This fork is just the fix I've been using on my own servers.
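For reference, the existing `CUDA_DOCKER_ARCH` path looks roughly like this (a simplified sketch; the repo's exact flag grouping may differ):

```makefile
# Sketch of the pre-existing Docker arch override: it sets the arch and also
# adds -Wno-deprecated-gpu-targets, the flag objected to above.
ifdef CUDA_DOCKER_ARCH
    NVCCFLAGS += -Wno-deprecated-gpu-targets -arch=$(CUDA_DOCKER_ARCH)
endif
```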

@ggerganov merged commit 96981f3 into ggerganov:master on Oct 22, 2023
@cebtenzzre (Collaborator) commented:

With this change, `CUDA_DOCKER_ARCH` now implies `-arch=native`. I assume that wasn't intended.

@ggerganov (Owner) commented:

Hm, in what scenario is this wrong?

@cebtenzzre (Collaborator) commented:

> Hm, in what scenario is this wrong?

Before this PR, you could use e.g. `CUDA_DOCKER_ARCH=sm_52` to set the NVCC arch, which would evaluate to:

```
-Wno-deprecated-gpu-targets -arch=sm_52
```

After this PR, `make LLAMA_CUBLAS=1 CUDA_DOCKER_ARCH=sm_52 ggml-cuda.o` shows this in the nvcc command:

```
-Wno-deprecated-gpu-targets -arch=sm_52 -arch=native
```

I think the last `-arch` option will take precedence.
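For illustration, one way the two overrides could have been made mutually exclusive (a sketch only, reusing the assumed `NVCCFLAGS` name; the project's actual resolution, per the commit list below, was to revert this PR):

```makefile
# Sketch: emit -arch=native only when neither override is set, so
# CUDA_DOCKER_ARCH and CUDA_NATIVE_ARCH never both inject an -arch flag.
ifdef CUDA_NATIVE_ARCH
    NVCCFLAGS += -arch=$(CUDA_NATIVE_ARCH)
else ifndef CUDA_DOCKER_ARCH
    NVCCFLAGS += -arch=native
endif
```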

ggerganov added a commit that referenced this pull request Oct 23, 2023
mattgauf added a commit to mattgauf/llama.cpp that referenced this pull request Oct 27, 2023
* master: (350 commits)
  speculative : ensure draft and target model vocab matches (ggerganov#3812)
  llama : correctly report GGUFv3 format (ggerganov#3818)
  simple : fix batch handling (ggerganov#3803)
  cuda : improve text-generation and batched decoding performance (ggerganov#3776)
  server : do not release slot on image input (ggerganov#3798)
  batched-bench : print params at start
  log : disable pid in log filenames
  server : add parameter -tb N, --threads-batch N (ggerganov#3584) (ggerganov#3768)
  server : do not block system prompt update (ggerganov#3767)
  sync : ggml (conv ops + cuda MSVC fixes) (ggerganov#3765)
  cmake : add missed dependencies (ggerganov#3763)
  cuda : add batched cuBLAS GEMM for faster attention (ggerganov#3749)
  Add more tokenizer tests (ggerganov#3742)
  metal : handle ggml_scale for n%4 != 0 (close ggerganov#3754)
  Revert "make : add optional CUDA_NATIVE_ARCH (ggerganov#2482)"
  issues : separate bug and enhancement template + no default title (ggerganov#3748)
  Update special token handling in conversion scripts for gpt2 derived tokenizers (ggerganov#3746)
  llama : remove token functions with `context` args in favor of `model` (ggerganov#3720)
  Fix baichuan convert script not detecing model (ggerganov#3739)
  make : add optional CUDA_NATIVE_ARCH (ggerganov#2482)
  ...
YuMJie pushed a commit to YuMJie/powerinfer that referenced this pull request Oct 25, 2024