make: add --device-debug to NVCC debug flags #7542

JohannesGaessler · 2024-05-26T10:01:56Z

This PR adds the --device-debug flag to the NVCC compile flags if LLAMA_DEBUG is set. The flag adds device debugging information and also turns off device code optimization, so compilation is faster.

ggerganov · 2024-05-26T10:56:14Z

This improves the compile time, but it generates the following warnings:

nvcc warning : '--device-debug (-G)' overrides '--generate-line-info (-lineinfo)'
ptxas warning : Conflicting options --device-debug and --generate-line-info specified, ignoring --generate-line-info option

I'm not sure what the implications of overriding the -lineinfo option are.

JohannesGaessler · 2024-05-26T11:10:00Z

-lineinfo essentially attaches a copy of the CUDA source code and lets you map the binary code to specific source lines when profiling with NVIDIA Nsight Compute. --device-debug additionally disables optimizations, which makes the profiling information inaccurate. So I would interpret the warning in that context: the two options do similar things but are intended for different purposes. For debug builds I think disabling optimizations is the correct thing to do; when I profile the code I never use LLAMA_DEBUG anyway but just manually add -lineinfo to the NVCC flags.

slaren · 2024-05-26T11:13:26Z

This needs to be optional. The only case where this helps is when looking for bugs in the kernels that are slow to compile; otherwise these files do not need to be recompiled when making changes to other parts of the code. However, this makes the CUDA backend so slow that debug builds become unusable for quick iteration in other areas.

| GPU | Model | Test | t/s master | t/s cuda-device-debug | Speedup |
| --- | --- | --- | --- | --- | --- |
| RTX 3090 Ti | llama 7B Q4_0 | pp512 | 4081.50 | 618.90 | 0.15 |
| RTX 3090 Ti | llama 7B Q4_0 | tg128 | 150.52 | 6.28 | 0.04 |

JohannesGaessler · 2024-05-26T11:52:30Z

When I first tested this I thought I had measured a much smaller performance regression, but I must have done something wrong: upon renewed testing I'm getting results very similar to the ones above. I added a new flag, LLAMA_CUDA_DEBUG, instead. There is still a warning if one compiles with both LLAMA_DEBUG and LLAMA_CUDA_DEBUG, but I think it's inconsequential.
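
For illustration, the split into two flags could look roughly like the Makefile fragment below. This is a minimal sketch, not the actual diff from this PR: the `NVCCFLAGS` variable and the `print-flags` target are hypothetical names, and the assumption that LLAMA_DEBUG is what adds -lineinfo is inferred from the warning quoted earlier in the thread. Only LLAMA_DEBUG, LLAMA_CUDA_DEBUG, -lineinfo, and --device-debug come from the discussion itself.

```make
# Hypothetical sketch: NVCCFLAGS and print-flags are illustrative names,
# not taken from the real Makefile.
NVCCFLAGS :=

# Assumption: regular debug builds keep source line info for host-side
# debugging and profiling, without disabling device optimizations.
ifdef LLAMA_DEBUG
NVCCFLAGS += -lineinfo
endif

# Opt-in device debugging: adds device debug info and turns off device
# code optimization, so kernels compile faster but run much slower.
ifdef LLAMA_CUDA_DEBUG
NVCCFLAGS += --device-debug
endif

# Helper target to inspect which flags would be passed to nvcc.
.PHONY: print-flags
print-flags: ; @echo "NVCCFLAGS = $(NVCCFLAGS)"
```

With GNU make, something like `make print-flags LLAMA_CUDA_DEBUG=1` would show the flags selected for a device-debug build. Setting both variables would pass both flags, which is the combination that triggers the nvcc/ptxas warning mentioned above.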

JohannesGaessler mentioned this pull request May 26, 2024

CUDA: quantized KV support for FA vec #7527

Merged

JohannesGaessler force-pushed the cuda-device-debug branch from f735014 to 88e405e on May 26, 2024 11:02

make: add --device-debug to NVCC debug flags

e0b2a40

JohannesGaessler force-pushed the cuda-device-debug branch from 88e405e to e0b2a40 on May 26, 2024 11:49

slaren approved these changes May 26, 2024

mofosyne added the "Review Complexity : Low" label May 27, 2024

JohannesGaessler merged commit 10b1e45 into ggerganov:master May 27, 2024
64 of 65 checks passed
