
whisper : Metal and ggml-alloc support #1270

Merged: 44 commits, Sep 15, 2023

Changes from 1 commit

Commits (44):
fbc3f80
metal : init
ggerganov Sep 10, 2023
949ab63
whisper : factor out graph builds
ggerganov Sep 10, 2023
bed5ad6
whisper : allocate encoder and decoder using ggml-alloc
ggerganov Sep 10, 2023
af6f67b
whisper : ggml-alloc is now supported
ggerganov Sep 10, 2023
fa672b4
whisper : CoreML support ggml-alloc
ggerganov Sep 10, 2023
794e8fe
build : fix ggml-alloc
ggerganov Sep 10, 2023
9a78b72
ios : update submodule
ggerganov Sep 10, 2023
06d1d28
extra : update sync-ggml.sh script to also sync ggml-alloc
ggerganov Sep 10, 2023
4d9acc6
ci : see if this is causing the crash
ggerganov Sep 10, 2023
2770d46
whisper : refactor ggml-alloc init
ggerganov Sep 11, 2023
4845b9e
whisper.android : try to fix build
ggerganov Sep 11, 2023
d3b2dd4
whisper : initial Metal version
ggerganov Sep 11, 2023
de94c78
Merge branch 'master' into metal-and-alloc
ggerganov Sep 12, 2023
3b9979a
ci : try to debug vmem issue
ggerganov Sep 12, 2023
fbc9ddc
metal : decoder works on GPU!
ggerganov Sep 12, 2023
79a8805
metal : add multi-decoder support
ggerganov Sep 12, 2023
9fdd415
ggml : fix ggml_nbytes (probably temp solution)
ggerganov Sep 12, 2023
cd47637
metal : run "cross" step on the GPU
ggerganov Sep 12, 2023
ec9a7db
whisper : remove ggml_repeat in the encoder
ggerganov Sep 12, 2023
3074a7f
whisper : offload the Encoder to Metal
ggerganov Sep 12, 2023
905c944
ggml : use simpler ggml_bytes() implementation
ggerganov Sep 13, 2023
b19888c
ggml-alloc : try to make CI happy by reducing vram to 128GB
ggerganov Sep 13, 2023
254b687
whisper : add whisper_allocr to wrap ggml_allocr
ggerganov Sep 13, 2023
b6f0966
whisper : factor out alloc init in a function
ggerganov Sep 13, 2023
77f4bf4
cmake : update to support Metal build
ggerganov Sep 13, 2023
796f84c
whisper : add <functional> header
ggerganov Sep 13, 2023
181bb8c
objc : fix build (no Metal yet)
ggerganov Sep 13, 2023
257d794
ios : add Metal support
ggerganov Sep 13, 2023
16db4da
swiftui : fix build
ggerganov Sep 13, 2023
8e8daa8
metal : speed-up KQ multiplication
ggerganov Sep 13, 2023
ecb23fb
metal : sync latest llama.cpp kernels
ggerganov Sep 13, 2023
23277d2
readme : add Metal info
ggerganov Sep 13, 2023
d37f56e
ios : update submodule
ggerganov Sep 13, 2023
d863f72
coreml : add code to toggle Core ML config (CPU, ANE, GPU)
ggerganov Sep 13, 2023
f408c64
bench : fix timings by running a pre-heat
ggerganov Sep 13, 2023
e81c67a
bench : start benching the decoder
ggerganov Sep 14, 2023
af947cb
whisper : add ggml_mul_mat_pad
ggerganov Sep 14, 2023
c46167f
bench : fix uninitialized vars
ggerganov Sep 14, 2023
f365543
whisper : add comment for disabling mul-mat padding
ggerganov Sep 14, 2023
2b4160a
whisper : add description of ggml_mul_mat_pad
ggerganov Sep 14, 2023
0d5e4cd
whisper : clean-up ggml_mul_mat_pad
ggerganov Sep 14, 2023
bfcb2a2
metal : remove the "concurrent" flag
ggerganov Sep 14, 2023
a166457
bench : variable n_past
ggerganov Sep 14, 2023
3ac0558
ios : update SPM package
ggerganov Sep 15, 2023
whisper : add description of ggml_mul_mat_pad
ggerganov committed Sep 14, 2023

commit 2b4160af29821c63b70689c0d571dbb1dc140cdd
coreml/whisper-encoder.mm (2 additions, 2 deletions)

@@ -24,8 +24,8 @@

     // select which device to run the Core ML model on
     MLModelConfiguration *config = [[MLModelConfiguration alloc] init];
-    //config.computeUnits = MLComputeUnitsCPUAndGPU;
-    config.computeUnits = MLComputeUnitsCPUAndNeuralEngine;
+    config.computeUnits = MLComputeUnitsCPUAndGPU;
+    //config.computeUnits = MLComputeUnitsCPUAndNeuralEngine;
     //config.computeUnits = MLComputeUnitsAll;

     const void * data = CFBridgingRetain([[whisper_encoder_impl alloc] initWithContentsOfURL:url_model configuration:config error:nil]);
whisper.cpp (13 additions, 0 deletions)

@@ -136,6 +136,19 @@ static void ggml_graph_compute_helper(std::vector<uint8_t> & buf, ggml_cgraph *
     ggml_graph_compute(graph, &plan);
 }

+// faster matrix multiplications for tensors that do not have dimension 0 divisible by "pad"
+// the idea is to represent the original matrix multiplication:
+//
+//   Z = X @ Y
+//
+// with two matrix multiplications:
+//
+//   Z = [X_0; X_1] @ [Y_0; Y_1]
+//
+// here X_0 and Y_0 are views of X and Y that have dimension 0 divisible by "pad"
+// and X_1 and Y_1 are the remaining views. X_1 and Y_1 end up being small matrices that can be processed with more
+// general-purpose kernels
+//
 static struct ggml_tensor * ggml_mul_mat_pad(struct ggml_context * ctx, struct ggml_tensor * x, struct ggml_tensor * y, int pad = 32) {
 //#if !defined(GGML_USE_METAL)
 //    return ggml_mul_mat(ctx, x, y);
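
The diff view is truncated at this point. To make the decomposition described in the comment concrete, here is a minimal sketch of how the split can be expressed with ggml's view API. The function name mul_mat_pad_sketch and the exact split logic are illustrative assumptions, not the verbatim body of ggml_mul_mat_pad in this commit:

// Illustrative sketch only (assumed implementation, not taken from the commit).
// Split X and Y along dimension 0 into a pad-aligned part and a small remainder,
// multiply the parts separately, and sum the partial products:
//
//   Z = X @ Y = X_0 @ Y_0 + X_1 @ Y_1
//
// Assumes x and y are contiguous along dimension 0 (nb[0] == element size).
static struct ggml_tensor * mul_mat_pad_sketch(struct ggml_context * ctx, struct ggml_tensor * x, struct ggml_tensor * y, int pad = 32) {
    if (x->ne[0] % pad == 0) {
        return ggml_mul_mat(ctx, x, y); // already aligned - no split needed
    }

    const int64_t n0 = (x->ne[0]/pad)*pad; // size of the pad-aligned part
    const int64_t n1 =  x->ne[0] - n0;     // size of the remainder (< pad)

    // views of the first n0 and the last n1 elements of dimension 0 of X
    struct ggml_tensor * x_0 = ggml_view_3d(ctx, x, n0, x->ne[1], x->ne[2], x->nb[1], x->nb[2], 0);
    struct ggml_tensor * x_1 = ggml_view_3d(ctx, x, n1, x->ne[1], x->ne[2], x->nb[1], x->nb[2], n0*x->nb[0]);

    // matching views of Y - dimension 0 is the contracted dimension in ggml_mul_mat,
    // so X and Y must be split at the same offset
    struct ggml_tensor * y_0 = ggml_view_3d(ctx, y, n0, y->ne[1], y->ne[2], y->nb[1], y->nb[2], 0);
    struct ggml_tensor * y_1 = ggml_view_3d(ctx, y, n1, y->ne[1], y->ne[2], y->nb[1], y->nb[2], n0*y->nb[0]);

    // Z = X_0 @ Y_0 + X_1 @ Y_1
    return ggml_add(ctx,
            ggml_mul_mat(ctx, x_0, y_0),
            ggml_mul_mat(ctx, x_1, y_1));
}

With the default pad = 32, the large aligned part can be handled by the fast Metal matrix-multiplication kernels (which presumably expect dimension 0 to be a multiple of 32), while only the small remainder falls back to the general-purpose kernels, which is the trade-off the comment describes.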