
whisper : try to fix the parallel whisper_state functionality #1479

Merged: 3 commits into ggml-backend-no-sched, Nov 12, 2023

Conversation

ggerganov (Owner) commented Nov 11, 2023

ref #1472

Test command (`-p 2` runs the transcription with two parallel processors, each using its own whisper_state):

./main -m ./models/ggml-base.en.bin -f ./samples/gb0.wav -p 2

Backends tested:
  • CPU
  • Metal
  • CUDA

slaren (Collaborator) commented Nov 11, 2023

I will change ggml-backend to decouple the buffers from backend instances, but that will still require changing the Metal backend so that the list of buffers is available to all instances of the Metal backend. In the long term, I think it would be better to get rid of the list of buffers entirely by making the Metal ggml_backend_buffer an abstraction of the Metal buffer object (I think it is MTLBuffer). That may also help fix the issues that people with Intel Macs and discrete GPUs have with the Metal backend, since it would no longer assume unified memory.
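For illustration, a minimal C sketch of the two designs being contrasted; every name and layout below is an assumption for explanation, not the actual ggml-backend API:

```c
#include <stddef.h>

// Today (roughly): each Metal backend instance keeps its own list of mapped
// buffers, so a tensor allocated through one instance cannot be resolved by
// another instance.
struct metal_backend_sketch {
    void * buffers[64]; // stand-ins for id<MTLBuffer> handles
    size_t sizes[64];
    int    n_buffers;
};

// Proposal (roughly): the ggml_backend_buffer itself wraps exactly one Metal
// buffer object. Any backend instance handed this buffer can resolve tensor
// data directly, with no shared list, and without assuming unified memory
// (relevant for Intel Macs with discrete GPUs).
struct metal_backend_buffer_sketch {
    void * mtl_buffer; // stand-in for a single id<MTLBuffer>
    size_t size;
};
```

The key difference is where buffer ownership lives: in the first shape a tensor is only resolvable through the instance that allocated it, while in the second the buffer handle travels with the tensor.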

ggerganov marked this pull request as ready for review November 12, 2023 12:25
ggerganov (Owner, Author) commented:

> In the long term, I think it would be better to get rid of the list of buffers entirely by making the Metal ggml_backend_buffer an abstraction of the Metal buffer object (I think it is MTLBuffer).

Yes, I will probably reimplement this when I start working on ggerganov/llama.cpp#3229. I'm planning various improvements to the Metal backend in addition to that.

For now, I've implemented a workaround: in ggml_metal_get_buffer we check whether the tensor belongs to a backend buffer, and if it does, we search the list of buffers for that backend. Otherwise, we fall back to the old logic. This seems to resolve the issue.
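For illustration, here is a self-contained C sketch of that lookup order; apart from the order itself, all names, types, and layouts below are stand-ins, not the actual ggml_metal_get_buffer implementation:

```c
#include <stddef.h>

// Stand-in for one mapped Metal buffer (the real code holds id<MTLBuffer>).
typedef struct {
    void * base; // start of the region visible to the CPU
    size_t size;
} metal_buffer;

// Stand-in for a list of buffers, either per-backend or context-global.
typedef struct {
    metal_buffer bufs[8];
    int          n_bufs;
} buffer_list;

// Scan a list for the region containing `data`; on success return the
// buffer and write the offset of `data` within it to `offs`.
static metal_buffer * find_buffer(buffer_list * list, const void * data, size_t * offs) {
    for (int i = 0; i < list->n_bufs; i++) {
        const char * base = (const char *) list->bufs[i].base;
        const char * p    = (const char *) data;
        if (p >= base && p < base + list->bufs[i].size) {
            *offs = (size_t) (p - base);
            return &list->bufs[i];
        }
    }
    return NULL;
}

// The workaround: if the tensor belongs to a backend buffer, search that
// backend's list first; otherwise fall back to the old context-global scan.
typedef struct {
    void        * data;
    buffer_list * backend_bufs; // non-NULL if allocated via a ggml-backend buffer
} tensor_sketch;

static metal_buffer * get_buffer(tensor_sketch * t, buffer_list * ctx_bufs, size_t * offs) {
    if (t->backend_bufs != NULL) {
        metal_buffer * b = find_buffer(t->backend_bufs, t->data, offs);
        if (b != NULL) {
            return b;
        }
    }
    return find_buffer(ctx_bufs, t->data, offs);
}
```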

ggerganov merged commit 5031f54 into ggml-backend-no-sched Nov 12, 2023
68 checks passed
ggerganov added a commit that referenced this pull request Nov 12, 2023
* whisper : migrate to ggml-backend

* whisper : fix logit reading

* whisper : fix tensor allocation during load

* whisper : fix beam-search with CUDA

* whisper : free backends + fix compile warning

* whisper : print when CUDA is enabled

* whisper : fix CoreML

* make : clean-up

* talk : fix compile warning

* whisper : support ggml_conv with CUDA and Metal (#1473)

* ggml : add CUDA support for ggml_conv

* whisper : remove ggml_repeat for conv bias + single backend

* cuda : fix im2col kernel

* metal : add im2col support + mul mat-vec f16 x f16

* bench-all : add q4 models

* whisper : clean-up

* quantize-all : fix

* ggml : im2col opts

* whisper : avoid whisper_model_data wrapper

* whisper : add note that ggml_mul_mat_pad does not work with CUDA

* whisper : factor out graph compute in common function

* whisper : fixes

* whisper : fix UB with measure buffers

* whisper : try to fix the parallel whisper_state functionality (#1479)

* whisper : try to fix the parallel whisper_state functionality

* whisper : fix multi-state Metal

* whisper : free backend instances in whisper_state
felrock pushed a commit to felrock/whisper.cpp that referenced this pull request Nov 18, 2023
landtanin pushed a commit to landtanin/whisper.cpp that referenced this pull request Dec 16, 2023
iThalay pushed a commit to iThalay/whisper.cpp that referenced this pull request Sep 23, 2024