distil whisper notebook (#1423)

* distil whisper notebook
* spelling
* grammar
openvinotoolkit · Nov 6, 2023 · 3cebaab
1 parent 778d25f

Showing 4 changed files with 1,004 additions and 2 deletions.
diff --git a/.ci/spellcheck/.pyspelling.wordlist.txt b/.ci/spellcheck/.pyspelling.wordlist.txt
@@ -134,6 +134,8 @@ diarization
 Diffusers
 diffusers
 dimensionality
+Distil
+distil
 DistilBERT
 distilbert
 distiluse
@@ -260,6 +262,7 @@ KServe
 Kubernetes
 Kupyn
 KV
+Labelling
 labour
 labse
 LaBSE

diff --git a/README.md b/README.md
@@ -39,7 +39,8 @@ Check out the latest notebooks that show how to optimize and deploy popular mode
 | [SoftVC VITS Singing Voice Conversion](notebooks/262-softvc-voice-conversion)<br> | SoftVC VITS Singing Voice Conversion and OpenVINO™ | |
 | [Latent Consistency Models: the next generation of Image Generation models ](notebooks/263-latent-consistency-models-image-generation)<br> | Image generation with Latent Consistency Models (LCM) and OpenVINO™ |  <img src=https://user-images.githubusercontent.com/29454499/277367065-13a8f622-8ea7-4d12-b3f8-241d4499305e.png width=300> |
 | [QR Code Monster](notebooks/264-qrcode-monster/)<br> | Generate creative QR codes with ControlNet QR Code Monster and OpenVINO™ | <img src="https://github.com/openvinotoolkit/openvino_notebooks/assets/76463150/1a5978c6-e7a0-4824-9318-a3d8f4912c47" width=225> |
-| [Würstchen](notebooks/265-wuerstchen-image-generation)<br> | Text-to-image generation with Würstchen and OpenVINO™ | <img src="https://github.com/itrushkin/openvino_notebooks/assets/76161256/6917c558-d74c-4cc9-b81a-679ce0a299ee" width=300> |
+| [Würstchen](notebooks/265-wuerstchen-image-generation)<br> | Text-to-image generation with Würstchen and OpenVINO™ | <img src="https://github.com/itrushkin/openvino_notebooks/assets/76161256/6917c558-d74c-4cc9-b81a-679ce0a299ee" width=300> | |
+| [Distil-Whisper](notebooks/267-distil-whisper-asr)<br> | Automatic speech recognition using Distil-Whisper and OpenVINO™ | | |
 
 ## Table of Contents
 
@@ -205,7 +206,8 @@ Demos that demonstrate inference on a particular model.
 | [263-latent-consistency-models-image-generation](notebooks/263-latent-consistency-models-image-generation)<br> | Image generation with Latent Consistency Models (LCM) and OpenVINO™ |  <img src=https://user-images.githubusercontent.com/29454499/277367065-13a8f622-8ea7-4d12-b3f8-241d4499305e.png width=225> |
 | [264-qrcode-monster](notebooks/264-qrcode-monster/)<br> | Generate creative QR codes with ControlNet QR Code Monster and OpenVINO™ | <img src="https://github.com/openvinotoolkit/openvino_notebooks/assets/76463150/1a5978c6-e7a0-4824-9318-a3d8f4912c47" width=225> |
 | [265-wuerstchen-image-generation](notebooks/265-wuerstchen-image-generation)<br> | Text-to-image generation with Würstchen and OpenVINO™ | <img src="https://github.com/itrushkin/openvino_notebooks/assets/76161256/6917c558-d74c-4cc9-b81a-679ce0a299ee" width=225> |
-| [266-speculative-sampling](notebooks/266-speculative-sampling)<br> | Text Generation via Speculative Sampling, KV Caching, and OpenVINO™ | <img src=https://user-images.githubusercontent.com/29454499/280659301-49a38beb-e6f3-4a2c-858e-be4ca4491016.png width=225>
+| [266-speculative-sampling](notebooks/266-speculative-sampling)<br> | Text Generation via Speculative Sampling, KV Caching, and OpenVINO™ | <img src=https://user-images.githubusercontent.com/29454499/280659301-49a38beb-e6f3-4a2c-858e-be4ca4491016.png width=225> |
+| [267-distil-whisper-asr](notebooks/267-distil-whisper-asr)<br> | Automatic speech recognition using Distil-Whisper and OpenVINO™ | |
 
 <div id='-model-training'></div>
 

diff --git a/notebooks/267-distil-whisper-asr/267-distil-whisper-asr.ipynb b/notebooks/267-distil-whisper-asr/267-distil-whisper-asr.ipynb
diff --git a/notebooks/267-distil-whisper-asr/README.md b/notebooks/267-distil-whisper-asr/README.md
@@ -0,0 +1,24 @@
+# Automatic speech recognition using Distil-Whisper and OpenVINO
+
+[Distil-Whisper](https://huggingface.co/distil-whisper/distil-large-v2) is a distilled variant of the [Whisper](https://huggingface.co/openai/whisper-large-v2) model by OpenAI, proposed in the paper [Robust Knowledge Distillation via Large-Scale Pseudo Labelling](https://arxiv.org/abs/2311.00430). Compared to Whisper, Distil-Whisper runs 6x faster with 50% fewer parameters, while performing to within 1% word error rate (WER) on out-of-distribution evaluation data.
+
+In this tutorial, we consider how to run Distil-Whisper using OpenVINO. We will use the pre-trained model from the [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) library. To simplify the user experience, the [Hugging Face Optimum](https://huggingface.co/docs/optimum) library is used to convert the model to OpenVINO™ IR format.
+
+## Notebook Contents
+
+This notebook demonstrates how to perform automatic speech recognition (ASR) using the Distil-Whisper model and OpenVINO.
+
+The tutorial consists of the following steps:
+1. Download the PyTorch model.
+2. Run PyTorch model inference.
+3. Convert and run the model using the OpenVINO integration with Hugging Face Optimum.
+4. Compare the performance of the PyTorch and OpenVINO models.
+5. Use the OpenVINO model with Hugging Face pipelines for long-form audio transcription.
+6. Launch an interactive demo for speech recognition.
+
+
+## Installation Instructions
+
+This is a self-contained example that relies solely on its code.<br>
+We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
+For details, please refer to the [Installation Guide](../../README.md).
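
For readers of this diff, steps 3 to 5 of the notebook README above might look roughly like the sketch below. It is not taken from the notebook itself, whose contents are not shown in this diff. It assumes the `optimum-intel` (with OpenVINO support) and `transformers` packages are installed. The helper names `load_openvino_pipeline` and `median_latency` are hypothetical and exist only for illustration, as is the `chunk_length_s=15` default.

```python
import time
from statistics import median
from typing import Callable

# Checkpoint linked from the README; swap in another Distil-Whisper model if needed.
MODEL_ID = "distil-whisper/distil-large-v2"


def load_openvino_pipeline(model_id: str = MODEL_ID, chunk_length_s: int = 15):
    """Export the checkpoint to OpenVINO IR and wrap it in an ASR pipeline.

    Hypothetical helper: the notebook may structure this step differently.
    """
    # Imported lazily so the timing helper below works without these packages.
    from optimum.intel import OVModelForSpeechSeq2Seq
    from transformers import AutoProcessor, pipeline

    processor = AutoProcessor.from_pretrained(model_id)
    # export=True asks Optimum to convert the PyTorch weights to OpenVINO IR.
    ov_model = OVModelForSpeechSeq2Seq.from_pretrained(model_id, export=True)
    return pipeline(
        "automatic-speech-recognition",
        model=ov_model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        # Chunked decoding for long-form audio; 15 s is an assumed value.
        chunk_length_s=chunk_length_s,
    )


def median_latency(fn: Callable[[], object], n_runs: int = 5) -> float:
    """Call fn n_runs times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return median(timings)
```

With the packages installed, `asr = load_openvino_pipeline()` followed by `asr("sample.wav")["text"]` would be expected to return a transcript, where `sample.wav` is a placeholder file name. Passing `lambda: asr("sample.wav")` to `median_latency`, and doing the same for a PyTorch pipeline, gives a simple basis for the comparison in step 4.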