
Added options to the --numa flag for fine-grained control over execution #5358

Closed
wants to merge 0 commits into from

Conversation

@bmtwl (Contributor) commented Feb 6, 2024

Added four options to the --numa CLI flag:

interleave: the current scheme as-is; execute equally on all available threads on all available nodes
isolate: only execute threads on the current NUMA node, which stops cross-node traffic
numactl: inherit the NUMA environment passed in via the numactl utility, allowing fine-grained execution control
mirror: mirror the GGUF to all NUMA nodes to improve system bandwidth for inference (not implemented yet, hidden behind #ifdefs)

(also added a couple of missing \n to the help text)
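
For a rough sense of how these modes could be represented internally, here is an illustrative C sketch. The enum values and the parsing helper below are hypothetical names chosen for this example, not the identifiers used in the patch; the only assumption taken from the change itself is that ggml_numa_init() accepts a uint32_t strategy value.

#include <stdint.h>
#include <string.h>

// Hypothetical strategy values for the --numa flag (illustrative only).
enum numa_strategy_example {
    NUMA_EXAMPLE_DISABLED   = 0,
    NUMA_EXAMPLE_INTERLEAVE = 1, // spread work across all nodes (previous --numa behaviour)
    NUMA_EXAMPLE_ISOLATE    = 2, // keep threads on the current node only
    NUMA_EXAMPLE_NUMACTL    = 3, // inherit the affinity set up by numactl
    NUMA_EXAMPLE_MIRROR     = 4, // mirror the model to every node (not implemented in this PR)
};

// Map the string given to --numa onto a numeric strategy value that could be
// passed through to ggml_numa_init(uint32_t).
static uint32_t parse_numa_arg_example(const char * arg) {
    if (strcmp(arg, "interleave") == 0) return NUMA_EXAMPLE_INTERLEAVE;
    if (strcmp(arg, "isolate")    == 0) return NUMA_EXAMPLE_ISOLATE;
    if (strcmp(arg, "numactl")    == 0) return NUMA_EXAMPLE_NUMACTL;
    if (strcmp(arg, "mirror")     == 0) return NUMA_EXAMPLE_MIRROR;
    return NUMA_EXAMPLE_DISABLED;
}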

@ggerganov (Owner)

Can you provide some sample commands that you use and the performance results that you observe? This way people can try to reproduce these findings and get a feeling for what improvements we are looking at.

ggml.h Outdated
GGML_API void ggml_numa_init(uint32_t numa);     // call once for better performance on NUMA systems
GGML_API bool ggml_is_numa(void);                // true if init detected that system has >1 NUMA node
GGML_API cpu_set_t ggml_get_numa_affinity(void); // get cpuset from numactl
@ggerganov (Owner)


No need to expose this in the public API. Also remove the <sched.h> header from ggml.h
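
A minimal sketch of one way to address that, assuming the helper moves into ggml.c as a translation-unit-local function so that neither <sched.h> nor ggml_get_numa_affinity() needs to appear in the public header (this is an illustration, not the actual patch):

// ggml.c (sketch): keep the numactl affinity lookup private to this file.
#define _GNU_SOURCE
#include <sched.h>

static cpu_set_t ggml_get_numa_affinity(void) {
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    // Inherit whatever affinity mask the process was started with
    // (for example the one installed by numactl).
    sched_getaffinity(0, sizeof(cpu_set_t), &cpuset);
    return cpuset;
}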

@bmtwl (Contributor, Author) commented Feb 6, 2024

Can you provide some sample commands that you use and the performance results that you observe? This way people can try to reproduce these findings and get a feeling for what improvements we are looking at.

I don't expect much in the way of large speedups until I start looking at ensuring memory locality, but there are still gains even with just this patch. The main advantage is that we are able to control where the threads execute with a high level of granularity, which may be very useful on larger systems with complicated interconnect structures.
Here is an example run with numactl forcing the patched branch to execute entirely on one NUMA node, versus the unpatched master branch running the same command (which is equivalent to "--numa interleave" after patching). Caches were dropped before each run:

numactl -N0 -m0 ./main -m /opt/text-generation-webui/models/miqu-70b-q5/miqu-1-70b.q5_K_M.gguf -p "Hello" -n 32 -t 32 --no-mmap -b 65535 -c 4096 -np 4096 -ns 65535 -cb --numa

llama_print_timings: load time = 21958.00 ms
llama_print_timings: sample time = 4.79 ms / 32 runs ( 0.15 ms per token, 6676.40 tokens per second)
llama_print_timings: prompt eval time = 269.72 ms / 2 tokens ( 134.86 ms per token, 7.42 tokens per second)
llama_print_timings: eval time = 6280.50 ms / 31 runs ( 202.60 ms per token, 4.94 tokens per second)
llama_print_timings: total time = 6564.18 ms / 33 tokens

./main -m /opt/text-generation-webui/models/miqu-70b-q5/miqu-1-70b.q5_K_M.gguf -p "Hello" -n 32 -t 32 --no-mmap -b 65535 -c 4096 -np 4096 -ns 65535 -cb --numa

llama_print_timings: load time = 19808.41 ms
llama_print_timings: sample time = 4.68 ms / 32 runs ( 0.15 ms per token, 6834.69 tokens per second)
llama_print_timings: prompt eval time = 372.62 ms / 2 tokens ( 186.31 ms per token, 5.37 tokens per second)
llama_print_timings: eval time = 8886.55 ms / 31 runs ( 286.66 ms per token, 3.49 tokens per second)
llama_print_timings: total time = 9272.88 ms / 33 tokens
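
Going by the eval-time figures, confining the run to a single node raises generation throughput from about 3.49 to about 4.94 tokens per second on this machine, roughly a 40% speedup.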
