Commit

docs: explain faster CUDA CMake compile [no ci]
JohannesGaessler committed Nov 15, 2024
1 parent 09ecbcb commit 6131aea
Showing 2 changed files with 6 additions and 3 deletions.
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -459,14 +459,14 @@ To learn more how to measure perplexity using llama.cpp, [read this documentatio
- Make sure to read this: [Inference at the edge](https://github.com/ggerganov/llama.cpp/discussions/205)
- A bit of backstory for those who are interested: [Changelog podcast](https://changelog.com/podcast/532)

## Other documentations
## Other documentation

- [main (cli)](./examples/main/README.md)
- [server](./examples/server/README.md)
- [jeopardy](./examples/jeopardy/README.md)
- [GBNF grammars](./grammars/README.md)

**Development documentations**
**Development documentation**

- [How to build](./docs/build.md)
- [Running on Docker](./docs/docker.md)
Expand Down
5 changes: 4 additions & 1 deletion docs/build.md
Original file line number Diff line number Diff line change
Expand Up @@ -178,7 +178,10 @@ For Jetson user, if you have Jetson Orin, you can try this: [Offical Support](ht
cmake --build build --config Release
```
The environment variable [`CUDA_VISIBLE_DEVICES`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) can be used to specify which GPU(s) will be used.
By default, llama.cpp is built for a range of CUDA architectures so that the resulting binaries run on any NVIDIA GPU supported by CUDA 12 (Maxwell or newer).
However, for local use the build can be sped up by narrowing the set of compiled CUDA architectures.
Adding `-DCMAKE_CUDA_ARCHITECTURES=native` to the first CMake command restricts compilation to exactly the architectures of the GPUs currently connected to the system.
The environment variable [`CUDA_VISIBLE_DEVICES`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) can be used to limit which GPUs are visible (for CUDA in general).
The environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1` can be used to enable unified memory on Linux. This allows swapping to system RAM instead of crashing when GPU VRAM is exhausted. On Windows, this setting is available in the NVIDIA control panel as `System Memory Fallback`.
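The faster local build described above can be sketched as follows (a hedged example: it assumes the `GGML_CUDA=ON` CMake option and the `build` directory used in the surrounding build instructions; the resulting binaries only run on the GPU architectures present at configure time):

```shell
# Configure with CUDA enabled, compiling only for the architectures of the
# GPUs currently attached to this machine instead of the full default range.
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=native

# Build as usual; compile time is shorter because fewer architectures are targeted.
cmake --build build --config Release
```

Note that binaries produced this way are not portable to machines with different GPU generations; drop `-DCMAKE_CUDA_ARCHITECTURES=native` for a distributable build.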
Expand Down
