Commit

docs: explain faster CUDA CMake compile [no ci]
JohannesGaessler committed Nov 15, 2024
1 parent 09ecbcb commit 6131aea
Showing 2 changed files with 6 additions and 3 deletions.
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -459,14 +459,14 @@ To learn more how to measure perplexity using llama.cpp, [read this documentatio
- Make sure to read this: [Inference at the edge](https://github.com/ggerganov/llama.cpp/discussions/205)
- A bit of backstory for those who are interested: [Changelog podcast](https://changelog.com/podcast/532)

## Other documentations
## Other documentation

- [main (cli)](./examples/main/README.md)
- [server](./examples/server/README.md)
- [jeopardy](./examples/jeopardy/README.md)
- [GBNF grammars](./grammars/README.md)

**Development documentations**
**Development documentation**

- [How to build](./docs/build.md)
- [Running on Docker](./docs/docker.md)
Expand Down
5 changes: 4 additions & 1 deletion docs/build.md
Original file line number Diff line number Diff line change
Expand Up @@ -178,7 +178,10 @@ For Jetson user, if you have Jetson Orin, you can try this: [Offical Support](ht
cmake --build build --config Release
```
The environment variable [`CUDA_VISIBLE_DEVICES`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) can be used to specify which GPU(s) will be used.
By default, llama.cpp is built for a range of CUDA architectures so that the resulting binaries run on any NVIDIA GPU supported by CUDA 12 (Maxwell or newer).
However, for local use the build can be sped up by narrowing the set of compiled CUDA architectures.
Adding `-DCMAKE_CUDA_ARCHITECTURES=native` to the first CMake command restricts compilation to exactly the architectures of the GPUs currently connected to the system.
The environment variable [`CUDA_VISIBLE_DEVICES`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) can be used to limit which GPUs are visible (for CUDA in general).
The environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1` can be used to enable unified memory on Linux. This allows swapping to system RAM instead of crashing when GPU VRAM is exhausted. On Windows, this setting is available in the NVIDIA control panel as `System Memory Fallback`.
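The faster local build described above can be sketched as follows (a hedged example: it assumes the `GGML_CUDA=ON` CMake option and the `build` directory used in the surrounding build instructions; the resulting binaries only run on the GPU architectures present at configure time):

```shell
# Configure with CUDA enabled, compiling only for the architectures of the
# GPUs currently attached to this machine instead of the full default range.
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=native

# Build as usual; compile time is shorter because fewer architectures are targeted.
cmake --build build --config Release
```

Note that binaries produced this way are not portable to machines with different GPU generations; drop `-DCMAKE_CUDA_ARCHITECTURES=native` for a distributable build.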
Expand Down
