Speculative Decoding for Whisper #1704
Conversation
Nice!
Very cool 🔥
_blog.yml
Outdated
@@ -3181,3 +3181,13 @@
- nlp
- llm
- transformers
- local: whisper-spec-dec
I wonder if whisper-speculative-decoding would be better for SEO (not sure tbh)
Kept it short as per the instructions here, but agree that the full words are probably better for indexing!
Suggested change:
- local: whisper-spec-dec
- local: whisper-speculative-decoding
Maybe @osanseviero has better knowledge of this :)
I renamed to whisper-speculative-decoding since I agree it'll probably be more visible this way: 08903e4
whisper-spec-dec.md
Outdated
output, gen_time = generate_with_time(model, inputs, language="nl", task="transcribe")
all_time += gen_time
predictions.append(processor.batch_decode(output, skip_special_tokens=True, normalize=True)[0])
references.append(processor.tokenizer._normalize(sample["normalized_text"]))
Do we need to call a "private" method here?
Opened a PR to make the method public here: huggingface/transformers#28136 (comment)
Otherwise, we can instantiate the normalizer separately, but that's a bit more convoluted.
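For reference, the "instantiate the normalizer separately" route could look roughly like the sketch below. It assumes the multilingual BasicTextNormalizer from transformers (since the sample in the excerpt is Dutch) and reuses the predictions/references lists from the snippet above; it's an illustration, not the code from the post.

```python
# Rough sketch of instantiating the normalizer directly instead of calling
# processor.tokenizer._normalize (an assumption, not the code from the post).
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

normalizer = BasicTextNormalizer()

# decode without tokenizer-side normalization, then normalize explicitly
prediction = processor.batch_decode(output, skip_special_tokens=True)[0]
predictions.append(normalizer(prediction))
references.append(normalizer(sample["normalized_text"]))
```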
whisper-spec-dec.md
Outdated
It is worth noting that the largest speed gains with speculative decoding come with a batch size of 1. For batched
speculative decoding, all candidate tokens **across the batch** must match the validation tokens in order for the tokens
to be accepted. If a token in the batch at a given position does not agree, all candidate tokens that follow that position
are discarded. Consequently, speculative decoding favours lower batch sizes. In practice, we find that speculative decoding
Interesting! I fail to visualize why we can't accept irregular sequences; I'll look at the code to get a better understanding.
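A toy illustration of the constraint (my own sketch, not the transformers implementation): the generated batch is kept as one rectangular tensor, so the number of candidate tokens accepted at each step is the minimum match length over the batch, and any sequence that matched further ahead is truncated back to that minimum.

```python
import torch

# Toy sketch of batched acceptance in speculative decoding:
# candidate tokens from the assistant vs. the tokens the main model verifies.
candidates = torch.tensor([[5, 9, 2, 7],
                           [5, 9, 2, 7]])
verified   = torch.tensor([[5, 9, 2, 7],    # sequence 0: all 4 candidates agree
                           [5, 9, 4, 7]])   # sequence 1: disagreement at position 2

# Match length per sequence: number of leading positions where candidate == verified
match_lengths = (candidates == verified).int().cumprod(dim=-1).sum(dim=-1)

# The batch stays rectangular, so only the minimum match length can be accepted
n_accepted = match_lengths.min().item()
print(match_lengths.tolist(), n_accepted)  # [4, 2] -> both sequences keep only 2 tokens
```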
Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
0c2c9c4 to 1e22a48 (Compare)
Blog post and accompanying Google Colab for speculative decoding with the Whisper model.
The blog post provides a more in-depth explanation of speculative decoding, along with some nice animations. The Google Colab is a more streamlined version that can be run end-to-end. Now that we have PyTorch SDPA in Transformers, we can also leverage flash attention to get the reported 2x speed-up on a Google Colab free-tier T4 GPU.
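For anyone skimming this thread, a minimal end-to-end sketch of the setup described above: Whisper large-v2 verified against a Distil-Whisper assistant, in fp16 with SDPA attention. The model and dataset names are assumptions based on the usual Distil-Whisper pairing, not taken from the Colab itself.

```python
import torch
from datasets import load_dataset
from transformers import AutoProcessor, WhisperForConditionalGeneration

# fp16 assumes a CUDA device such as the free-tier T4
device = "cuda" if torch.cuda.is_available() else "cpu"

model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2", torch_dtype=torch.float16, attn_implementation="sdpa"
).to(device)
assistant_model = WhisperForConditionalGeneration.from_pretrained(
    "distil-whisper/distil-large-v2", torch_dtype=torch.float16, attn_implementation="sdpa"
).to(device)
processor = AutoProcessor.from_pretrained("openai/whisper-large-v2")

dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[0]["audio"]

input_features = processor(
    sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt"
).input_features.to(device, dtype=torch.float16)

# Passing assistant_model to generate() enables speculative decoding
output = model.generate(input_features, assistant_model=assistant_model)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```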