
sync : ggml #2237

Merged 102 commits on Jun 16, 2024
7bb8dab
ggml : add `ggml_upscale_ext` (ggml/814)
balisujohn May 15, 2024
215bcb3
Add missing " (llama/7303)
AidanBeltonS May 15, 2024
1c52d7f
ggml : tag ggml_tensor::backend as deprecated (llama/7290)
slaren May 15, 2024
a32324a
Avoid unnecessarily disabling CUDA graphs (llama/7302)
agray3 May 15, 2024
d3f4ab6
ggml : use dynamic thread scheduling for matrix multiplication (llama…
kunnis May 15, 2024
88b9d3b
Add support for properly optimized Windows ARM64 builds with LLVM and…
max-krasnyansky May 16, 2024
f1c281a
rpc : add command line arg for specifying backend memory
rgerganov May 15, 2024
831cf54
ggml : rewrite silu and softmax for cpu (llama/7154)
jart May 17, 2024
b321ba3
ggml-quants, llama : removed excess checks (llama/7274)
GermanAizek May 17, 2024
d64e133
rpc : set SO_REUSEADDR for the server socket (llama/7320)
rgerganov May 17, 2024
4fea7a9
CUDA: faster large batch FA without tensor cores (llama/7314)
JohannesGaessler May 17, 2024
653af39
ggml : fix quants nans when all the group weights are very close to z…
slaren May 18, 2024
449de6a
Update and fix Vulkan soft_max and argsort implementations (llama/7237)
0cc4m May 18, 2024
280208a
cuda : add half2 __shfl_xor() for ROCm 5.5 (llama/7263)
Engininja2 May 18, 2024
e00ace4
CUDA: deduplicate FlashAttention code (llama/7352)
JohannesGaessler May 18, 2024
e211897
android : use "ci-android" branch for CI (llama/7341)
ggerganov May 18, 2024
dfe6b64
Capture CUDA logging output (llama/7298)
fraxy-v May 18, 2024
0d54e78
cuda : clear error after buffer allocation failure (llama/7376)
slaren May 19, 2024
570d7fd
ggml: implement quantized KV cache for FA (llama/7372)
JohannesGaessler May 19, 2024
acd5935
ggml : fix another case of quants nans (llama/7387)
slaren May 19, 2024
9bbf65b
Vulkan Embedding Fix (llama/7360)
0cc4m May 19, 2024
7db2a18
Add provisions for windows support for BF16 code including CMake prov…
Srihari-mcw May 20, 2024
80e2b35
ggml : add loongarch lsx and lasx support (llama/6454)
junchao-loongson May 20, 2024
85bbb06
ggml-opencl, llama: using reserve() if count already known (llama/7272)
GermanAizek May 20, 2024
cc50ea0
Update SYCL upscale operation (llama/7321)
AidanBeltonS May 20, 2024
2668d57
rpc : track allocated buffers (llama/7411)
rgerganov May 20, 2024
ed7eb40
CUDA: deduplicate mmq code (llama/7397)
JohannesGaessler May 21, 2024
aa29372
CUDA: fix unused warning in mmq.cu (llama/7442)
JohannesGaessler May 21, 2024
d2aa1ce
metal : handle F16 inf values, fix FA partial offload (llama/7434)
ggerganov May 21, 2024
1ffabc8
llama : add phi3 128K model support (llama/7225)
liuwei-git May 21, 2024
eca5fb8
cuda : fix rope + add tests (llama/7452)
ggerganov May 22, 2024
4228fb7
CUDA: remove incorrect precision check (llama/7454)
JohannesGaessler May 22, 2024
61d5a1e
cuda : fix compile warning (llama/7454)
ggerganov May 22, 2024
b08c0b0
CUDA: fix FA out-of-bounds writes (llama/7465)
JohannesGaessler May 22, 2024
f366504
CUDA: fix FA out-of-bounds reads (llama/7479)
JohannesGaessler May 22, 2024
a8f67b9
Update vulkan rope implementation to support frequency factors (llama…
0cc4m May 23, 2024
c2be650
ggml : drop support for QK_K=64 (llama/7473)
ggerganov May 23, 2024
1470bad
ggml : remove ggml_flash_attn and ggml_flash_ff (llama/7463)
ggerganov May 23, 2024
22d4b17
ggml : silence UB sanitizer error during iq2_xxs quantization (llama/0)
ggerganov May 23, 2024
024b58e
ggml: aarch64: SVE kernels for q8_0_q8_0, q4_0_q8_0 vector dot (llama…
msy-kato May 25, 2024
e7b39d8
ggml : restore ggml_rope_xpos_inplace (ggml/0)
ggerganov May 26, 2024
e934ba5
metal : disable FA kernel for HS=256 (llama/7556)
ggerganov May 27, 2024
0055948
metal : add GGML_OP_REPEAT kernels (llama/7557)
ggerganov May 27, 2024
9b0dbe8
Add freq factors (llama/7495)
AidanBeltonS May 27, 2024
b725bb2
Fix q_xxs using mul_mat_q (llama/7459)
AidanBeltonS May 27, 2024
b323cfc
Allow multiple copy function pointers for CUDA graph kernel param upd…
agray3 May 27, 2024
a133206
update HIP_UMA #7399 (llama/7414)
Djip007 May 27, 2024
d6d2508
ggml : generalize GGML_OP_CONCAT (llama/7563)
ggerganov May 28, 2024
023020c
fix ggml_sycl_mul_mat_id() to match the change of api (llama/7436)
arthw May 28, 2024
42a9c95
rpc : resource management rework (llama/7562)
rgerganov May 28, 2024
7cc2ff0
vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE …
Adriankhl May 28, 2024
9ff003f
sycl : fix assert (llama/7563)
ggerganov May 28, 2024
eeb929a
Align GEMM dispatch (llama/7566)
airMeng May 28, 2024
f9df59a
ggml : fix typo in ggml.c (llama/7603)
zhouwg May 29, 2024
7e95420
examples : adapt to new ggml_concat (ggml/0)
ggerganov May 29, 2024
d53ab4b
ggml : use atomic_flag for critical section (llama/7598)
slaren May 29, 2024
78b74d5
llama-bench : add support for the RPC backend (llama/7435)
rgerganov May 29, 2024
f5de5d7
cuda : non-cont concat support (llama/7610)
ggerganov May 29, 2024
fa6b9ed
ggml : fix YARN + add tests + add asserts (llama/7617)
ggerganov May 29, 2024
7382fec
metal : add missing asserts (llama/7617)
ggerganov May 29, 2024
e3e1a98
metal : remove invalid asserts (llama/7617)
ggerganov May 29, 2024
55de6e0
ggml : fix loongarch build (O2 issue) (llama/7636)
junchao-loongson May 30, 2024
79088fe
faster avx512 exp implementation (llama/7551)
chriselrod May 30, 2024
b79eca7
ggml : fix loongson compile warnings (llama/7537)
ggerganov May 31, 2024
49c5ccb
CUDA: quantized KV support for FA vec (llama/7527)
JohannesGaessler Jun 1, 2024
5758ffa
CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (llama/7681)
JohannesGaessler Jun 1, 2024
bc6158d
Fix FlashAttention debug test, FP32 assert (llama/7684)
JohannesGaessler Jun 1, 2024
5f6620e
fix bug introduced in using calloc (llama/7701)
airlied Jun 2, 2024
f8b7a7f
kompute : implement op_getrows_f32 (llama/6403)
woachk Jun 3, 2024
9e95aa1
Vulkan Mixture of Experts (MoE) support (llama/7628)
0cc4m Jun 3, 2024
784733d
ggml : use OpenMP as a thread pool (llama/7606)
msy-kato Jun 3, 2024
0a6fd4e
llama : offload to RPC in addition to other backends (llama/7640)
rgerganov Jun 3, 2024
1b34416
ggml : prevent builds with -ffinite-math-only (llama/7726)
ggerganov Jun 4, 2024
69982c7
ggml : remove OpenCL (llama/7735)
ggerganov Jun 4, 2024
bf0ff58
Allow number of nodes in CUDA graph to change (llama/7738)
agray3 Jun 4, 2024
809d0f4
ggml : refactor rope norm/neox (llama/7634)
ggerganov Jun 5, 2024
048f479
CUDA: refactor mmq, dmmv, mmvq (llama/7716)
JohannesGaessler Jun 5, 2024
c5f01ea
fix softmax r2r result wrong issue (llama/7811)
pengxin99 Jun 7, 2024
e604adb
vulkan : reuse parent extra for views (llama/7806)
slaren Jun 7, 2024
bb7a50f
CUDA: revise q8_1 data layout for mul_mat_q (llama/7824)
JohannesGaessler Jun 9, 2024
fa0b692
use the correct SYCL context for host USM allocations (llama/7777)
bashbaug Jun 10, 2024
b199187
CUDA: use tensor cores for MMQ (llama/7676)
JohannesGaessler Jun 10, 2024
28c0ccf
CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (llama/7860)
JohannesGaessler Jun 11, 2024
b30b2f4
Update Vulkan RoPE implementation (llama/7818)
0cc4m Jun 11, 2024
bfb2212
vulkan: select only one device for single gpu with multiple drivers (…
Adriankhl Jun 11, 2024
035d655
ggml : improve ggml_is_contiguous logic (llama/7856)
ggerganov Jun 12, 2024
3544c18
tests : add non-cont unary tests (llama/7857)
ggerganov Jun 12, 2024
e8f4fa0
CUDA: fix broken oob check for FA vec f32 kernel (llama/7904)
JohannesGaessler Jun 12, 2024
ad6b8d5
move BLAS to a separate backend (llama/6210)
slaren Jun 13, 2024
08078b9
rpc : fix ggml_backend_rpc_supports_buft() (llama/7918)
rgerganov Jun 13, 2024
f8ac7b1
metal : utilize max shared memory for mul_mat_id (llama/7935)
ggerganov Jun 14, 2024
8abc251
CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (llama/7921)
JohannesGaessler Jun 14, 2024
8efd6d6
remove global variables (llama/7710)
airMeng Jun 15, 2024
d2744cc
ggml : remove duplicate include of ggml-common.h (ggml/853)
danbev Jun 16, 2024
ce33d6f
ggml : fix and optimize ppc64le (ggml/849)
penghongbo Jun 16, 2024
92dc0b7
sync : ggml
ggerganov Jun 16, 2024
b891050
cmake : fix CUDA build (#0)
ggerganov Jun 16, 2024
16d44bd
talk-llama : sync llama.cpp
ggerganov Jun 16, 2024
c711647
cuda : enable CUDA graphs (#0)
ggerganov Jun 16, 2024
7252394
sycl : sync (#0)
ggerganov Jun 16, 2024
b51ff56
ggml : remove OpenCL (#0)
ggerganov Jun 16, 2024
f5b667d
cmake : fix sycl build (#0)
ggerganov Jun 16, 2024
77 changes: 57 additions & 20 deletions CMakeLists.txt
@@ -86,6 +86,7 @@ else()
option(WHISPER_OPENBLAS "whisper: prefer OpenBLAS" OFF)
option(WHISPER_OPENBLAS_INTERFACE64 "whisper: use OpenBLAS w/ 64-bit interface" OFF)
option(WHISPER_CUDA "whisper: support for CUDA" OFF)
option(WHISPER_CUDA_FA_ALL_QUANTS "whisper: compile all quants for FlashAttention" OFF)
option(WHISPER_CUBLAS "whisper: support for CUDA (deprecated)" OFF)
option(WHISPER_HIPBLAS "whisper: support for hipBLAS" OFF)
option(WHISPER_CLBLAST "whisper: use CLBlast" OFF)
@@ -346,20 +347,53 @@ if (WHISPER_CUBLAS)
endif()

if (WHISPER_CUDA)
cmake_minimum_required(VERSION 3.17)
cmake_minimum_required(VERSION 3.18) # for CMAKE_CUDA_ARCHITECTURES

find_package(CUDAToolkit)

if (CUDAToolkit_FOUND)
message(STATUS "cuBLAS found")

if (NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
# 52 == lowest CUDA 12 standard
# 60 == f16 CUDA intrinsics
# 61 == integer CUDA intrinsics
# 70 == compute capability at which unrolling a loop in mul_mat_q kernels is faster
if (WHISPER_CUDA_F16 OR WHISPER_CUDA_DMMV_F16)
set(CMAKE_CUDA_ARCHITECTURES "60;61;70") # needed for f16 CUDA intrinsics
else()
set(CMAKE_CUDA_ARCHITECTURES "52;61;70") # lowest CUDA 12 standard + lowest for integer intrinsics
#set(CMAKE_CUDA_ARCHITECTURES "OFF") # use this to compile much faster, but only F16 models work
endif()
endif()
message(STATUS "Using CUDA architectures: ${CMAKE_CUDA_ARCHITECTURES}")

enable_language(CUDA)

file(GLOB GGML_SOURCES_CUDA "ggml-cuda/*.cu")
list(APPEND GGML_SOURCES_CUDA ggml-cuda.h)
list(APPEND GGML_SOURCES_CUDA ggml-cuda.cu)

file(GLOB SRCS "ggml-cuda/template-instances/fattn-wmma*.cu")
list(APPEND GGML_SOURCES_CUDA ${SRCS})
file(GLOB SRCS "ggml-cuda/template-instances/mmq*.cu")
list(APPEND GGML_SOURCES_CUDA ${SRCS})

if (WHISPER_CUDA_FA_ALL_QUANTS)
file(GLOB SRCS "ggml-cuda/template-instances/fattn-vec*.cu")
list(APPEND GGML_SOURCES_CUDA ${SRCS})
add_compile_definitions(GGML_CUDA_FA_ALL_QUANTS)
else()
file(GLOB SRCS "ggml-cuda/template-instances/fattn-vec*q4_0-q4_0.cu")
list(APPEND GGML_SOURCES_CUDA ${SRCS})
file(GLOB SRCS "ggml-cuda/template-instances/fattn-vec*q8_0-q8_0.cu")
list(APPEND GGML_SOURCES_CUDA ${SRCS})
file(GLOB SRCS "ggml-cuda/template-instances/fattn-vec*f16-f16.cu")
list(APPEND GGML_SOURCES_CUDA ${SRCS})
endif()

add_compile_definitions(GGML_USE_CUDA)
add_compile_definitions(GGML_CUDA_USE_GRAPHS)

if (WHISPER_STATIC)
if (WIN32)
@@ -399,6 +433,24 @@ if (WHISPER_HIPBLAS)
file(GLOB GGML_SOURCES_ROCM "ggml-cuda/*.cu")
list(APPEND GGML_SOURCES_ROCM "ggml-cuda.cu")

file(GLOB SRCS "ggml-cuda/template-instances/fattn-wmma*.cu")
list(APPEND GGML_SOURCES_CUDA ${SRCS})
file(GLOB SRCS "ggml-cuda/template-instances/mmq*.cu")
list(APPEND GGML_SOURCES_CUDA ${SRCS})

if (WHISPER_CUDA_FA_ALL_QUANTS)
file(GLOB SRCS "ggml-cuda/template-instances/fattn-vec*.cu")
list(APPEND GGML_SOURCES_CUDA ${SRCS})
add_compile_definitions(GGML_CUDA_FA_ALL_QUANTS)
else()
file(GLOB SRCS "ggml-cuda/template-instances/fattn-vec*q4_0-q4_0.cu")
list(APPEND GGML_SOURCES_CUDA ${SRCS})
file(GLOB SRCS "ggml-cuda/template-instances/fattn-vec*q8_0-q8_0.cu")
list(APPEND GGML_SOURCES_CUDA ${SRCS})
file(GLOB SRCS "ggml-cuda/template-instances/fattn-vec*f16-f16.cu")
list(APPEND GGML_SOURCES_CUDA ${SRCS})
endif()

add_compile_definitions(GGML_USE_HIPBLAS GGML_USE_CUDA)

set_source_files_properties(${GGML_SOURCES_ROCM} PROPERTIES LANGUAGE CXX)
@@ -411,21 +463,6 @@ if (WHISPER_HIPBLAS)
endif()
endif()

if (WHISPER_CLBLAST)
find_package(CLBlast)
if (CLBlast_FOUND)
message(STATUS "CLBlast found")

set(GGML_SOURCES_OPENCL ggml-opencl.cpp ggml-opencl.h)

add_compile_definitions(GGML_USE_CLBLAST)

set(WHISPER_EXTRA_LIBS ${WHISPER_EXTRA_LIBS} clblast)
else()
message(FATAL_ERROR "CLBlast not found")
endif()
endif()

if( WHISPER_OPENVINO )
find_package(OpenVINO REQUIRED COMPONENTS Runtime)
endif()
@@ -450,7 +487,8 @@ if (WHISPER_SYCL)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsycl -L${MKLROOT}/lib")

set(GGML_HEADERS_SYCL ggml-sycl.h)
set(GGML_SOURCES_SYCL ggml-sycl.cpp)
file(GLOB GGML_SOURCES_SYCL "ggml-sycl/*.cpp")
list(APPEND GGML_SOURCES_SYCL "ggml-sycl.cpp")

set(WHISPER_EXTRA_LIBS ${WHISPER_EXTRA_LIBS} sycl OpenCL mkl_core pthread m dl mkl_sycl_blas mkl_intel_ilp64 mkl_tbb_thread)
endif()
@@ -672,9 +710,8 @@ add_library(${TARGET}
ggml-quants.c
${GGML_SOURCES_METAL}
${GGML_SOURCES_CUDA}
${GGML_SOURCES_OPENCL}
${GGML_SOURCES_SYCL} ${GGML_HEADERS_SYCL}
${GGML_SOURCES_ROCM} ${GGML_HEADERS_ROCM}
${GGML_SOURCES_SYCL} ${GGML_HEADERS_SYCL}
${GGML_SOURCES_ROCM} ${GGML_HEADERS_ROCM}
whisper.h
whisper.cpp
)
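For reference, the new CMake options above can be exercised together. This is a minimal sketch, assuming a CUDA toolkit and CMake >= 3.18 are installed; the architecture list "61;70;86" is only an illustrative override of the default "52;61;70":

```
cd whisper.cpp
# Configure a CUDA build. WHISPER_CUDA_FA_ALL_QUANTS compiles every FlashAttention
# quant-kernel combination (longer build, larger binary) and defaults to OFF.
cmake -B build -DWHISPER_CUDA=ON \
               -DWHISPER_CUDA_FA_ALL_QUANTS=ON \
               -DCMAKE_CUDA_ARCHITECTURES="61;70;86"
cmake --build build -j --config Release
```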
33 changes: 16 additions & 17 deletions Makefile
@@ -277,6 +277,16 @@ ifdef WHISPER_CUBLAS
WHISPER_CUDA := 1
endif

OBJS_CUDA_TEMP_INST = $(patsubst %.cu,%.o,$(wildcard ggml-cuda/template-instances/fattn-wmma*.cu))
OBJS_CUDA_TEMP_INST += $(patsubst %.cu,%.o,$(wildcard ggml-cuda/template-instances/mmq*.cu))
ifdef WHISPER_CUDA_FA_ALL_QUANTS
OBJS_CUDA_TEMP_INST += $(patsubst %.cu,%.o,$(wildcard ggml-cuda/template-instances/fattn-vec*.cu))
else
OBJS_CUDA_TEMP_INST += $(patsubst %.cu,%.o,$(wildcard ggml-cuda/template-instances/fattn-vec*q4_0-q4_0.cu))
OBJS_CUDA_TEMP_INST += $(patsubst %.cu,%.o,$(wildcard ggml-cuda/template-instances/fattn-vec*q8_0-q8_0.cu))
OBJS_CUDA_TEMP_INST += $(patsubst %.cu,%.o,$(wildcard ggml-cuda/template-instances/fattn-vec*f16-f16.cu))
endif # WHISPER_CUDA_FA_ALL_QUANTS

ifdef WHISPER_CUDA
ifeq ($(shell expr $(NVCC_VERSION) \>= 11.6), 1)
CUDA_ARCH_FLAG ?= native
@@ -285,14 +295,15 @@ ifdef WHISPER_CUDA
endif

CFLAGS += -DGGML_USE_CUDA -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/$(UNAME_M)-linux/include
CXXFLAGS += -DGGML_USE_CUDA -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/$(UNAME_M)-linux/include
CXXFLAGS += -DGGML_USE_CUDA -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/$(UNAME_M)-linux/include -DGGML_CUDA_USE_GRAPHS
LDFLAGS += -lcuda -lcublas -lculibos -lcudart -lcublasLt -lcufft -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L$(CUDA_PATH)/targets/$(UNAME_M)-linux/lib -L/usr/lib/wsl/lib
WHISPER_OBJ += ggml-cuda.o whisper-mel-cuda.o
WHISPER_OBJ += $(patsubst %.cu,%.o,$(wildcard ggml-cuda/*.cu))
WHISPER_OBJ += $(OBJS_CUDA_TEMP_INST)
NVCC = nvcc
NVCCFLAGS = --forward-unknown-to-host-compiler -arch=$(CUDA_ARCH_FLAG)

ggml-cuda/%.o: ggml-cuda/%.cu ggml-cuda/%.cuh ggml.h ggml-common.h ggml-cuda/common.cuh
ggml-cuda/%.o: ggml-cuda/%.cu ggml.h ggml-common.h ggml-cuda/common.cuh
$(NVCC) $(NVCCFLAGS) $(CXXFLAGS) -c $< -o $@

ggml-cuda.o: ggml-cuda.cu ggml-cuda.h ggml.h ggml-backend.h ggml-backend-impl.h ggml-common.h $(wildcard ggml-cuda/*.cuh)
@@ -313,6 +324,7 @@ ifdef WHISPER_HIPBLAS
HIPFLAGS += $(addprefix --offload-arch=,$(GPU_TARGETS))
WHISPER_OBJ += ggml-cuda.o
WHISPER_OBJ += $(patsubst %.cu,%.o,$(wildcard ggml-cuda/*.cu))
WHISPER_OBJ += $(OBJS_CUDA_TEMP_INST)

ggml-cuda/%.o: ggml-cuda/%.cu ggml-cuda/%.cuh ggml.h ggml-common.h ggml-cuda/common.cuh
$(HIPCC) $(CXXFLAGS) $(HIPFLAGS) -x hip -c -o $@ $<
@@ -321,21 +333,6 @@ ggml-cuda.o: ggml-cuda.cu ggml-cuda.h ggml.h ggml-backend.h ggml-backend-impl.h
$(HIPCC) $(CXXFLAGS) $(HIPFLAGS) -x hip -c -o $@ $<
endif

ifdef WHISPER_CLBLAST
CFLAGS += -DGGML_USE_CLBLAST
CXXFLAGS += -DGGML_USE_CLBLAST
LDFLAGS += -lclblast
ifeq ($(UNAME_S),Darwin)
LDFLAGS += -framework OpenCL
else
LDFLAGS += -lOpenCL
endif
WHISPER_OBJ += ggml-opencl.o

ggml-opencl.o: ggml-opencl.cpp ggml-opencl.h
$(CXX) $(CXXFLAGS) -c $< -o $@
endif

ifdef WHISPER_GPROF
CFLAGS += -pg
CXXFLAGS += -pg
@@ -457,6 +454,8 @@ libwhisper.so: $(WHISPER_OBJ)

clean:
rm -f *.o main stream command talk talk-llama bench quantize server lsp libwhisper.a libwhisper.so
rm -vrf ggml-cuda/*.o
rm -vrf ggml-cuda/template-instances/*.o

#
# Examples
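The Makefile path exposes the same switch as a plain make variable; a sketch, assuming `nvcc` is on the PATH (only `WHISPER_CUDA` and the new `WHISPER_CUDA_FA_ALL_QUANTS` come from this diff, the rest are defaults):

```
cd whisper.cpp
make clean
# Default: only the q4_0-q4_0, q8_0-q8_0 and f16-f16 fattn-vec instances are built.
WHISPER_CUDA=1 make -j
# Opt in to all FlashAttention quant combinations.
WHISPER_CUDA=1 WHISPER_CUDA_FA_ALL_QUANTS=1 make -j
```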
23 changes: 0 additions & 23 deletions README.md
@@ -20,7 +20,6 @@ High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisp
- Zero memory allocations at runtime
- Support for CPU-only inference
- [Efficient GPU support for NVIDIA](https://github.com/ggerganov/whisper.cpp#nvidia-gpu-support-via-cublas)
- [Partial OpenCL GPU support via CLBlast](https://github.com/ggerganov/whisper.cpp#opencl-gpu-support-via-clblast)
- [OpenVINO Support](https://github.com/ggerganov/whisper.cpp#openvino-support)
- [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h)

Expand Down Expand Up @@ -422,28 +421,6 @@ make clean
WHISPER_CUDA=1 make -j
```

## OpenCL GPU support via CLBlast

For cards and integrated GPUs that support OpenCL, the Encoder processing can be largely offloaded to the GPU through CLBlast. This is especially useful for users with AMD APUs or low end devices for up to ~2x speedup.

First, make sure you have installed `CLBlast` for your OS or Distribution: https://github.com/CNugteren/CLBlast

Now build `whisper.cpp` with CLBlast support:

```
Makefile:
cd whisper.cpp
make clean
WHISPER_CLBLAST=1 make -j

CMake:
cd whisper.cpp
cmake -B build -DWHISPER_CLBLAST=ON
cmake --build build -j --config Release
```

Run all the examples as usual.

## BLAS CPU support via OpenBLAS

Encoder processing can be accelerated on the CPU via OpenBLAS.