Commit

distil whisper notebook (#1423)
* distil whisper notebook

* spelling

* grammar
eaidova authored Nov 6, 2023
1 parent 778d25f commit 3cebaab
Showing 4 changed files with 1,004 additions and 2 deletions.
3 changes: 3 additions & 0 deletions .ci/spellcheck/.pyspelling.wordlist.txt
@@ -134,6 +134,8 @@ diarization
Diffusers
diffusers
dimensionality
+Distil
+distil
DistilBERT
distilbert
distiluse
@@ -260,6 +262,7 @@ KServe
Kubernetes
Kupyn
KV
+Labelling
labour
labse
LaBSE
6 changes: 4 additions & 2 deletions README.md
@@ -39,7 +39,8 @@ Check out the latest notebooks that show how to optimize and deploy popular mode
| [SoftVC VITS Singing Voice Conversion](notebooks/262-softvc-voice-conversion)<br> | SoftVC VITS Singing Voice Conversion and OpenVINO™ | |
| [Latent Consistency Models: the next generation of Image Generation models ](notebooks/263-latent-consistency-models-image-generation)<br> | Image generation with Latent Consistency Models (LCM) and OpenVINO™ | <img src=https://user-images.githubusercontent.com/29454499/277367065-13a8f622-8ea7-4d12-b3f8-241d4499305e.png width=300> |
| [QR Code Monster](notebooks/264-qrcode-monster/)<br> | Generate creative QR codes with ControlNet QR Code Monster and OpenVINO™ | <img src="https://github.com/openvinotoolkit/openvino_notebooks/assets/76463150/1a5978c6-e7a0-4824-9318-a3d8f4912c47" width=225> |
-| [Würstchen](notebooks/265-wuerstchen-image-generation)<br> | Text-to-image generation with Würstchen and OpenVINO™ | <img src="https://github.com/itrushkin/openvino_notebooks/assets/76161256/6917c558-d74c-4cc9-b81a-679ce0a299ee" width=300> |
+| [Würstchen](notebooks/265-wuerstchen-image-generation)<br> | Text-to-image generation with Würstchen and OpenVINO™ | <img src="https://github.com/itrushkin/openvino_notebooks/assets/76161256/6917c558-d74c-4cc9-b81a-679ce0a299ee" width=300> | |
+| [Distil-Whisper](notebooks/267-distil-whisper-asr)<br> | Automatic speech recognition using Distil-Whisper and OpenVINO™ | | |

## Table of Contents

@@ -205,7 +206,8 @@ Demos that demonstrate inference on a particular model.
| [263-latent-consistency-models-image-generation](notebooks/263-latent-consistency-models-image-generation)<br> | Image generation with Latent Consistency Models (LCM) and OpenVINO™ | <img src=https://user-images.githubusercontent.com/29454499/277367065-13a8f622-8ea7-4d12-b3f8-241d4499305e.png width=225> |
| [264-qrcode-monster](notebooks/264-qrcode-monster/)<br> | Generate creative QR codes with ControlNet QR Code Monster and OpenVINO™ | <img src="https://github.com/openvinotoolkit/openvino_notebooks/assets/76463150/1a5978c6-e7a0-4824-9318-a3d8f4912c47" width=225> |
| [265-wuerstchen-image-generation](notebooks/265-wuerstchen-image-generation)<br> | Text-to-image generation with Würstchen and OpenVINO™ | <img src="https://github.com/itrushkin/openvino_notebooks/assets/76161256/6917c558-d74c-4cc9-b81a-679ce0a299ee" width=225> |
-| [266-speculative-sampling](notebooks/266-speculative-sampling)<br> | Text Generation via Speculative Sampling, KV Caching, and OpenVINO™ | <img src=https://user-images.githubusercontent.com/29454499/280659301-49a38beb-e6f3-4a2c-858e-be4ca4491016.png width=225>
+| [266-speculative-sampling](notebooks/266-speculative-sampling)<br> | Text Generation via Speculative Sampling, KV Caching, and OpenVINO™ | <img src=https://user-images.githubusercontent.com/29454499/280659301-49a38beb-e6f3-4a2c-858e-be4ca4491016.png width=225> |
+| [267-distil-whisper-asr](notebooks/267-distil-whisper-asr)<br> | Automatic speech recognition using Distil-Whisper and OpenVINO™ | |

<div id='-model-training'></div>

973 changes: 973 additions & 0 deletions notebooks/267-distil-whisper-asr/267-distil-whisper-asr.ipynb

Large diffs are not rendered by default.

24 changes: 24 additions & 0 deletions notebooks/267-distil-whisper-asr/README.md
@@ -0,0 +1,24 @@
# Automatic speech recognition using Distil-Whisper and OpenVINO

[Distil-Whisper](https://huggingface.co/distil-whisper/distil-large-v2) is a distilled variant of OpenAI's [Whisper](https://huggingface.co/openai/whisper-large-v2) model, proposed in the paper [Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling](https://arxiv.org/abs/2311.00430). Compared to Whisper, Distil-Whisper runs 6x faster with 50% fewer parameters, while performing within 1% word error rate (WER) on out-of-distribution evaluation data.
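
For readers unfamiliar with the metric, WER can be sketched as the word-level edit distance between a reference transcript and a model transcript, divided by the number of reference words. The snippet below is a minimal illustration only, not the evaluation code used in the paper or in the notebook; the function name `word_error_rate` and the lowercase, whitespace-split normalization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalization for this sketch: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One word dropped out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Production evaluations usually apply a more careful text normalizer (punctuation, numerals, spelling variants) before scoring, so absolute numbers from this sketch are not comparable to published results.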

In this tutorial, we show how to run Distil-Whisper using OpenVINO. We will use the pre-trained model from the [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) library. To simplify the user experience, the [Hugging Face Optimum](https://huggingface.co/docs/optimum) library is used to convert the model to OpenVINO™ IR format.

## Notebook Contents

This notebook demonstrates how to perform automatic speech recognition (ASR) using the Distil-Whisper model and OpenVINO.

The tutorial consists of the following steps:
1. Download the PyTorch model.
2. Run PyTorch model inference.
3. Convert and run the model using the OpenVINO integration with Hugging Face Optimum.
4. Compare the performance of the PyTorch and OpenVINO models.
5. Use the OpenVINO model with Hugging Face pipelines for long-form audio transcription.
6. Launch an interactive demo for speech recognition.
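
The conversion and inference steps above can be sketched roughly as follows. This is a hedged illustration rather than the notebook's code: it assumes `optimum-intel` (with its OpenVINO extras) and `transformers` are installed, and the helper names `load_ov_distil_whisper` and `transcribe`, as well as the `ov_distil_whisper` cache directory, are hypothetical names chosen for this example.

```python
from pathlib import Path

# Model linked at the top of this README.
MODEL_ID = "distil-whisper/distil-large-v2"


def load_ov_distil_whisper(model_id: str = MODEL_ID, cache_dir: str = "ov_distil_whisper"):
    """Return (processor, OpenVINO model), exporting to IR only when needed."""
    # Imported lazily so this module can be imported without the optional packages.
    from optimum.intel.openvino import OVModelForSpeechSeq2Seq
    from transformers import AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id)
    ir_dir = Path(cache_dir)  # hypothetical local directory for the exported IR
    if any(ir_dir.glob("*.xml")):
        # Reuse a previously exported OpenVINO IR model.
        model = OVModelForSpeechSeq2Seq.from_pretrained(ir_dir)
    else:
        # export=True converts the PyTorch checkpoint to OpenVINO IR on the fly.
        model = OVModelForSpeechSeq2Seq.from_pretrained(model_id, export=True)
        model.save_pretrained(ir_dir)
    return processor, model


def transcribe(audio_array, sampling_rate, processor, model) -> str:
    """Transcribe one short clip given as a 1-D array of audio samples."""
    inputs = processor(audio_array, sampling_rate=sampling_rate, return_tensors="pt")
    predicted_ids = model.generate(inputs.input_features)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

A typical call would be `processor, model = load_ov_distil_whisper()` followed by `transcribe(samples, 16000, processor, model)`, where `samples` is audio resampled to the rate the processor expects. Long-form audio needs chunking, which this sketch does not cover.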


## Installation Instructions

This is a self-contained example that relies solely on its own code.<br>
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, refer to the [Installation Guide](../../README.md).
