
Make updates to fix issues with clang-cl builds while using AVX512 flags #10314

Merged
2 commits merged on Nov 15, 2024

Conversation

@Srihari-mcw (Contributor) commented Nov 15, 2024

  • The PR contains fixes for building llama.cpp with the extended AVX512 flags (AVX512_BF16, AVX512_VNNI, AVX512_VBMI) using clang-cl and the Visual Studio Generator
  • Issue: when building with clang-cl and the Visual Studio Generator with the AVX512_VBMI, AVX512_VNNI and AVX512_BF16 flags enabled, the build failed with the errors shown in the attached screenshots

Fix: pass -mavx512vnni, -mavx512vbmi and -mavx512bf16 as part of the compile options when the compiler is clang-cl (a sketch of the kind of change is shown below).
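A minimal sketch of the kind of CMake change involved (not the exact PR diff; it assumes the ARCH_FLAGS list that ggml's CPU CMakeLists already uses to collect architecture flags):

```cmake
# Sketch only: clang-cl accepts GCC/Clang-style -m flags (unlike MSVC's cl.exe),
# so the extended AVX512 flags are appended when the front end is clang-cl.
# CMake reports clang-cl as compiler ID "Clang" with MSVC set to TRUE.
if (MSVC AND CMAKE_C_COMPILER_ID MATCHES "Clang")
    if (GGML_AVX512_VNNI)
        list(APPEND ARCH_FLAGS -mavx512vnni)
    endif()
    if (GGML_AVX512_VBMI)
        list(APPEND ARCH_FLAGS -mavx512vbmi)
    endif()
    if (GGML_AVX512_BF16)
        list(APPEND ARCH_FLAGS -mavx512bf16)
    endif()
endif()
```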

Also,

  • Experiments were conducted to build llama.cpp with MSVC (Visual Studio Generator), Clang (Ninja Generator) and clang-cl (Visual Studio Generator) and to compare their performance
  • Improvements were seen with clang-cl (VS Generator) and Clang (Ninja Generator) over the MSVC compiler
  • Both the Ninja and Clang dependencies were part of the Visual Studio installation

Visual Studio Generator, Clang-Cl (17.0.3) - the configuration for which the fixes are provided

  • For building with the Visual Studio Generator, clang-cl is obtained from the Visual Studio installation path: C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\Llvm\x64\bin
  • CMake command: cmake -G "Visual Studio 17 2022" -T ClangCL -A x64 -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_AVX512_VBMI=ON -DGGML_AVX512_VNNI=ON -DGGML_AVX512_BF16=ON
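After configuring, the project can be compiled with the standard CMake build driver; a minimal sketch, assuming the configure step above placed the build tree in build/:

```
cmake --build build --config Release
```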

Ninja Generator, Clang.exe/Clang++.exe (17.0.3)

  • clang.exe and clang++.exe are obtained from the same path as clang-cl - C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\Llvm\x64\bin
  • CMake command: cmake -G "Ninja" -S . -B Windows-build/ -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang.exe -DCMAKE_CXX_COMPILER=clang++.exe
  • Ninja path: C:\Program Files\Microsoft Visual Studio\2022\Community\Common7\IDE\CommonExtensions\Microsoft\CMake\Ninja\ninja.exe, version 1.11.0
  • Additional library dependencies are resolved with set LIB=C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22621.0\um\x64;C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\lib\x64\uwp;C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22621.0\ucrt\x64 before configuring and building (the full sequence is sketched below)
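Putting the steps above together, a minimal batch sketch of the Ninja/clang workflow (the LIB paths are those quoted above; SDK and MSVC version numbers will differ between installations):

```
set LIB=C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22621.0\um\x64;C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\lib\x64\uwp;C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22621.0\ucrt\x64
cmake -G "Ninja" -S . -B Windows-build/ -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang.exe -DCMAKE_CXX_COMPILER=clang++.exe
cmake --build Windows-build/
```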

Both clang builds were done without OpenMP dependencies.

The tests were conducted on an AMD Raphael 7600X, which supports the following flags (the same were kept enabled in our Windows tests):
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1|

Prompt Processing

| model | Compiler/Generator | size | params | backend | threads | test | t/s | speedup | Commit id |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 7B Q4_0 | MSVC Compiler / VS Generator | 3.56 GiB | 6.74 B | CPU | 6 | pp 512 | 48.14 ± 0.67 | | 231f936 |
| llama 7B Q4_0 | clang-cl.exe / VS Generator | 3.56 GiB | 6.74 B | CPU | 6 | pp 512 | 50.23 ± 0.91 | 4.33% | 231f936 |
| llama 7B Q4_0 | clang.exe / Ninja Generator | 3.56 GiB | 6.74 B | CPU | 6 | pp 512 | 55.69 ± 0.71 | 15.68% | 231f936 |

Text Generation

| model | Compiler/Generator | size | params | backend | threads | test | t/s | speedup | Commit id |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 7B Q4_0 | MSVC Compiler / VS Generator | 3.56 GiB | 6.74 B | CPU | 6 | tg 128 | 14.8 ± 0.01 | | 231f936 |
| llama 7B Q4_0 | clang-cl.exe / VS Generator | 3.56 GiB | 6.74 B | CPU | 6 | tg 128 | 14.95 ± 0.02 | 1.00% | 231f936 |
| llama 7B Q4_0 | clang.exe / Ninja Generator | 3.56 GiB | 6.74 B | CPU | 6 | tg 128 | 14.95 ± 0.02 | 1.00% | 231f936 |

Tests were done with the Meta Llama 2 7B model.
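For context, pp 512 and tg 128 are the default tests reported by llama.cpp's llama-bench tool; a typical invocation matching the settings above might look like this (the model path is illustrative):

```
llama-bench -m models/llama-2-7b.Q4_0.gguf -p 512 -n 128 -t 6
```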

Having a clang Windows build (especially with the Ninja Generator) as part of the releases could help Windows users. Authors/maintainers of llama.cpp, could you please share your thoughts on this? Thanks

@slaren slaren merged commit 74d73dc into ggerganov:master Nov 15, 2024
54 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
@Srihari-mcw (Contributor Author) commented:

Having a clang Windows build for x86/x64 (especially with the Ninja Generator) as part of the releases could help Windows users, based on the testing above. @slaren, could you please share your thoughts on this? Thanks

@slaren (Collaborator) commented Nov 20, 2024

Sure, but I wonder if performance would be better with gcc/mingw64.

@Srihari-mcw (Contributor Author) commented:

Tested llama.cpp built with w64devkit GCC using make (please see the tables below for commit details) and observed that text generation is at a disadvantage with this build compared to the results shared above.

GCC 12.2

Prompt Processing

| model | Compiler | size | params | backend | threads | test | t/s | Commit id |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 7B Q4_0 | GCC Compiler w64devkit | 3.56 GiB | 6.74 B | CPU | 6 | pp 512 | 57.88 ± 0.89 | a5e4759 |

Text Generation

| model | Compiler | size | params | backend | threads | test | t/s | Commit id |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 7B Q4_0 | GCC Compiler w64devkit | 3.56 GiB | 6.74 B | CPU | 6 | tg 128 | 10.31 ± 0.01 | a5e4759 |

@slaren (Collaborator) commented Dec 5, 2024

I think this is because the OpenMP implementation in MinGW does not perform very well. In my tests, MinGW with -DGGML_OPENMP=OFF performs the best. Additionally, both clang and msvc have issues building all the CPU backend variants due to missing intrinsics, e.g. AVX-VNNI can only be used with gcc. I will be updating the Windows CI in the next few days.
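For reference, one way such a MinGW/GCC build with OpenMP disabled might be configured (a sketch only; it assumes gcc/g++ from w64devkit or MSYS2 are on the PATH):

```
cmake -G "Ninja" -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DGGML_OPENMP=OFF
cmake --build build
```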

@Srihari-mcw (Contributor Author) commented:

Hi @slaren,

Following the latest changes, the Q4_0 model was tested with both clang and GCC (both with OpenMP disabled), and we observed that, for Q4_0, clang gives much better performance after these changes (GCC also shows gains).

Prompt Processing

| model | Compiler | size | params | backend | threads | test | t/s | speedup | Commit id |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 7B Q4_0 | GCC | 3.56 GiB | 6.74 B | CPU | 6 | pp 512 | 63.15 ± 1.07 | | 64ae065 |
| llama 7B Q4_0 | Clang | 3.56 GiB | 6.74 B | CPU | 6 | pp 512 | 72.66 ± 0.81 | 15.06% | 64ae065 |

Text Generation

| model | Compiler | size | params | backend | threads | test | t/s | speedup | Commit id |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 7B Q4_0 | GCC | 3.56 GiB | 6.74 B | CPU | 6 | tg 128 | 14.66 ± 0.06 | | 64ae065 |
| llama 7B Q4_0 | Clang | 3.56 GiB | 6.74 B | CPU | 6 | tg 128 | 14.62 ± 0.08 | -0.27% | 64ae065 |

Currently the build.yml script has provisions for Windows builds with MSVC (link). Can we also add x64-win-llvm as part of the same? Can you please share your insights on future plans for the CI/CD pipeline? Thanks

@slaren (Collaborator) commented Dec 13, 2024

My goal is to build with GGML_BACKEND_DL and GGML_CPU_ALL_VARIANTS so that the CPU backend can be distributed as a single package for all architectures. Unfortunately, clang has issues with some instruction sets and cannot complete this build. Are you able to build with clang and -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON?

@Srihari-mcw (Contributor Author) commented:

Hi @slaren,

#11027

Please check out the above PR - with the changes there we were able to build with clang without issues using the options -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON. Additionally, we kept OpenMP off while testing. Thanks
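A sketch of the configuration described above (clang building all CPU backend variants as a dynamically loadable backend with OpenMP off; compiler paths and generator as in the earlier Ninja setup):

```
cmake -G "Ninja" -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang.exe -DCMAKE_CXX_COMPILER=clang++.exe -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_OPENMP=OFF
cmake --build build
```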
