
Speculative Decoding for Whisper #1704

Merged

Conversation

@sanchit-gandhi (Contributor) commented Dec 13, 2023

Blog post and accompanying Google Colab for speculative decoding with the Whisper model.

The blog post provides a more in-depth explanation of speculative decoding, along with some nice animations. The Google Colab is a more streamlined version that can be run end-to-end. Now that we have PT SDPA in Transformers, we can also leverage flash attention to get the reported 2x speed-up on a Google Colab free tier T4 GPU.
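For readers skimming this PR, the core accept/verify loop behind speculative decoding can be sketched in plain Python. The `draft_fn` and `verify_fn` callables below are toy stand-ins for the small assistant model and the large Whisper model; this is an illustration of the mechanism, not the blog's or Transformers' implementation:

```python
def speculative_step(prefix, draft_fn, verify_fn, k=4):
    """One speculative decoding step: the cheap draft model proposes k
    tokens, the large model keeps the longest agreeing prefix, and one
    corrected (or bonus) token from the large model is appended."""
    # Draft model proposes k tokens autoregressively (cheap).
    proposal = list(prefix)
    for _ in range(k):
        proposal.append(draft_fn(proposal))

    # The large model would score all k positions in ONE forward pass;
    # we emulate that by querying its greedy choice at each position.
    accepted = list(prefix)
    for token in proposal[len(prefix):]:
        if verify_fn(accepted) == token:
            accepted.append(token)  # draft and verifier agree: keep it
        else:
            # First mismatch: take the verifier's token and stop.
            accepted.append(verify_fn(accepted))
            break
    else:
        # All k drafts accepted: the verifier's pass yields a bonus token.
        accepted.append(verify_fn(accepted))
    return accepted


# Toy usage: the "verifier" greedily follows a fixed target sequence.
target = [1, 2, 3, 4, 5, 6, 7, 8]
verify = lambda seq: target[len(seq)]
print(speculative_step([], verify, verify, k=4))  # → [1, 2, 3, 4, 5]
```

Because the verifier checks all k draft tokens in a single forward pass, each accepted token beyond the first comes almost for free, which is where the ~2x wall-clock speed-up comes from when the draft model agrees often.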

@patrickvonplaten (Contributor) left a comment:

Nice!

@pcuenca (Member) left a comment:

Very cool 🔥

_blog.yml Outdated
@@ -3181,3 +3181,13 @@
- nlp
- llm
- transformers

- local: whisper-spec-dec
Member:

I wonder if whisper-speculative-decoding would be better for SEO (not sure tbh)

@sanchit-gandhi (Contributor, Author):

Kept it short as per the instructions here, but agree that the full words are probably better for indexing!

Suggested change
- local: whisper-spec-dec
- local: whisper-speculative-decoding

Member:

Maybe @osanseviero has a better knowledge of this :)

@sanchit-gandhi (Contributor, Author):

I renamed to whisper-speculative-decoding since I agree it'll probably be more visible this way: 08903e4

output, gen_time = generate_with_time(model, inputs, language="nl", task="transcribe")
all_time += gen_time
predictions.append(processor.batch_decode(output, skip_special_tokens=True, normalize=True)[0])
references.append(processor.tokenizer._normalize(sample["normalized_text"]))
Member:

Do we need to call a "private" method here?

@sanchit-gandhi (Contributor, Author):

Opened a PR to make the method public here: huggingface/transformers#28136 (comment)

Otherwise, we can instantiate the normalizer separately, but that is a bit more convoluted.

Comment on lines 451 to 454
It is worth noting that the largest speed gains with speculative decoding come with a batch size of 1. For batched
speculative decoding, all candidate tokens **across the batch** must match the validation tokens in order for the tokens
to be accepted. If a token in the batch at a given position does not agree, all candidate tokens that follow that position
are discarded. Consequently, speculative decoding favours lower batch sizes. In practice, we find that speculative decoding
Member:

Interesting! I fail to visualize why we can't accept irregular sequences, I'll look at the code to get a better understanding.
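One way to visualize the constraint described in the quoted paragraph: if each batch item accepted a different number of draft tokens, the batch would become ragged, so acceptance stops at the first mismatch anywhere in the batch. A minimal sketch (hypothetical `batched_accept` helper, not the Transformers implementation):

```python
def batched_accept(candidates, verified):
    """Keep, for every batch item, only the candidate tokens up to the
    FIRST position where ANY item disagrees with the verifier, so all
    sequences stay the same length (a rectangular batch).

    candidates: per-item draft tokens, shape [batch][k]
    verified:   per-item verifier tokens, shape [batch][k]
    """
    batch, k = len(candidates), len(candidates[0])
    n_accept = k
    for pos in range(k):
        # A single disagreeing item truncates the whole batch here.
        if any(candidates[b][pos] != verified[b][pos] for b in range(batch)):
            n_accept = pos
            break
    return [c[:n_accept] for c in candidates]


# Item 0 agrees everywhere, but item 1 mismatches at position 1,
# so both items keep only one token.
print(batched_accept([[1, 2, 3], [7, 8, 9]],
                     [[1, 2, 3], [7, 5, 9]]))  # → [[1], [7]]
```

This is why larger batches accept fewer tokens on average, and why the biggest gains come at batch size 1, as the blog text notes.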

@sanchit-gandhi force-pushed the whisper-speculative-decoding branch from 0c2c9c4 to 1e22a48 on December 19, 2023 15:54
@sanchit-gandhi merged commit ab4c7fd into huggingface:main Dec 20, 2023
@sanchit-gandhi deleted the whisper-speculative-decoding branch December 20, 2023 15:13