2024-02-16-distil_asr_whisper_small_en (#14176)

* Add model 2024-02-16-distil_asr_whisper_small_en
* Add model 2024-02-25-distil_asr_whisper_medium_en
* Add model 2024-02-26-distil_asr_whisper_large_v2_en

---------

Co-authored-by: ahmedlone127 <ahmedlone127@gmail.com>
JohnSnowLabs · Feb 26, 2024 · 15085ce
1 parent 3a56387
commit 15085ce

Showing 3 changed files with 269 additions and 0 deletions.
diff --git a/docs/_posts/ahmedlone127/2024-02-16-distil_asr_whisper_small_en.md b/docs/_posts/ahmedlone127/2024-02-16-distil_asr_whisper_small_en.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: English distil_asr_whisper_small WhisperForCTC from distil-whisper
+author: John Snow Labs
+name: distil_asr_whisper_small
+date: 2024-02-16
+tags: [en, open_source, onnx]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.2.4
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `distil_asr_whisper_small` is an English model originally trained by distil-whisper.
+
+This model is only compatible with PySpark 3.4 and above.
+
+## Predicted Entities
+
+
+
+{:.btn-box}
+<button class="button button-orange" disabled>Live Demo</button>
+<button class="button button-orange" disabled>Open in Colab</button>
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_asr_whisper_small_en_5.2.4_3.0_1708118638184.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_asr_whisper_small_en_5.2.4_3.0_1708118638184.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
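+from sparknlp.base import AudioAssembler
+from sparknlp.annotator import WhisperForCTC
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a Spark DataFrame whose "audio_content" column
+# holds the raw audio samples as arrays of floats.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("distil_asr_whisper_small", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)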
+```
+```scala
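+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame whose "audio_content" column
+// holds the raw audio samples as arrays of floats.
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("distil_asr_whisper_small", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)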
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|distil_asr_whisper_small|
+|Compatibility:|Spark NLP 5.2.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|en|
+|Size:|748.5 MB|
+
+## References
+
+https://huggingface.co/distil-whisper/distil-small.en
diff --git a/docs/_posts/ahmedlone127/2024-02-25-distil_asr_whisper_medium_en.md b/docs/_posts/ahmedlone127/2024-02-25-distil_asr_whisper_medium_en.md
@@ -0,0 +1,89 @@
+---
+layout: model
+title: English distil_asr_whisper_medium WhisperForCTC from distil-whisper
+author: John Snow Labs
+name: distil_asr_whisper_medium
+date: 2024-02-25
+tags: [whisper, en, open_source, onnx]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.2.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `distil_asr_whisper_medium` is an English model originally trained by distil-whisper.
+
+This model is only compatible with PySpark 3.4 and above.
+
+## Predicted Entities
+
+
+
+{:.btn-box}
+<button class="button button-orange" disabled>Live Demo</button>
+<button class="button button-orange" disabled>Open in Colab</button>
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_asr_whisper_medium_en_5.2.4_3.4_1708901703317.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_asr_whisper_medium_en_5.2.4_3.4_1708901703317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
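+from sparknlp.base import AudioAssembler
+from sparknlp.annotator import WhisperForCTC
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a Spark DataFrame whose "audio_content" column
+# holds the raw audio samples as arrays of floats.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("distil_asr_whisper_medium", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)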
+```
+```scala
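+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame whose "audio_content" column
+// holds the raw audio samples as arrays of floats.
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("distil_asr_whisper_medium", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)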
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|distil_asr_whisper_medium|
+|Compatibility:|Spark NLP 5.2.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|en|
+|Size:|1.4 GB|
+
+## References
+
+https://huggingface.co/distil-whisper/distil-medium.en
diff --git a/docs/_posts/ahmedlone127/2024-02-26-distil_asr_whisper_large_v2_en.md b/docs/_posts/ahmedlone127/2024-02-26-distil_asr_whisper_large_v2_en.md
@@ -0,0 +1,88 @@
+---
+layout: model
+title: English distil_asr_whisper_large_v2 WhisperForCTC from distil-whisper
+author: John Snow Labs
+name: distil_asr_whisper_large_v2
+date: 2024-02-26
+tags: [en, open_source, onnx]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.2.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `distil_asr_whisper_large_v2` is an English model originally trained by distil-whisper.
+
+This model is only compatible with PySpark 3.4 and above.
+
+## Predicted Entities
+
+
+
+{:.btn-box}
+<button class="button button-orange" disabled>Live Demo</button>
+<button class="button button-orange" disabled>Open in Colab</button>
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_asr_whisper_large_v2_en_5.2.4_3.4_1708969018025.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_asr_whisper_large_v2_en_5.2.4_3.4_1708969018025.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
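+from sparknlp.base import AudioAssembler
+from sparknlp.annotator import WhisperForCTC
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a Spark DataFrame whose "audio_content" column
+# holds the raw audio samples as arrays of floats.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("distil_asr_whisper_large_v2", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)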
+```
+```scala
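+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame whose "audio_content" column
+// holds the raw audio samples as arrays of floats.
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("distil_asr_whisper_large_v2", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)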
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|distil_asr_whisper_large_v2|
+|Compatibility:|Spark NLP 5.2.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|en|
+|Size:|2.4 GB|
+
+## References
+
+https://huggingface.co/distil-whisper/distil-large-v2
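
The "How to use" snippets in all three model cards pass a `data` DataFrame with an `audio_content` column to the pipeline but do not show where the samples come from. The following is a minimal sketch, not part of the commit, of one way to decode audio into a list of floats using only the Python standard library. It assumes 16-bit mono PCM WAV input; the helper names `wav_bytes_to_floats` and `make_test_wav` are hypothetical and exist only for this illustration.

```python
import array
import io
import sys
import wave


def wav_bytes_to_floats(wav_bytes):
    """Decode 16-bit mono PCM WAV bytes.

    Returns (sample_rate, samples), with samples scaled to [-1.0, 1.0).
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM audio")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return rate, [s / 32768.0 for s in samples]


def make_test_wav(samples, rate=16000):
    """Build a small in-memory WAV file (hypothetical helper for the demo)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        pcm = array.array("h", samples)
        if sys.byteorder == "big":
            pcm.byteswap()
        wav.writeframes(pcm.tobytes())
    return buf.getvalue()


rate, floats = wav_bytes_to_floats(make_test_wav([0, 16384, -32768]))
print(rate, floats)
```

The resulting list could then be placed in a one-row DataFrame under the `audio_content` column before calling `pipeline.fit(data)`. Whether the column needs an explicit cast to an array of floats, and which sample rate the model expects, should be checked against the Spark NLP documentation rather than taken from this sketch.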