diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_personal_whisper_small_english_model_en.md b/docs/_posts/ahmedlone127/2023-10-19-asr_personal_whisper_small_english_model_en.md
new file mode 100644
index 00000000000000..58c915b4449e4f
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_personal_whisper_small_english_model_en.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: English asr_personal_whisper_small_english_model WhisperForCTC from fractalego
+author: John Snow Labs
+name: asr_personal_whisper_small_english_model
+date: 2023-10-19
+tags: [whisper, en, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_personal_whisper_small_english_model` is an English model originally trained by fractalego.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_personal_whisper_small_english_model_en_5.1.4_3.4_1697754481302.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_personal_whisper_small_english_model_en_5.1.4_3.4_1697754481302.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.annotator import WhisperForCTC
+from sparknlp.base import AudioAssembler
+
+# `data`: a Spark DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_personal_whisper_small_english_model", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data`: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_personal_whisper_small_english_model", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
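The snippets above assume a DataFrame `data` whose `audio_content` column holds raw audio samples as floats. Below is a minimal sketch of producing those samples with only the Python standard library. It assumes a 16-bit PCM WAV file, and the helper name `wav_to_float_samples` is hypothetical, not a Spark NLP function. The helper downmixes to mono and scales to [-1.0, 1.0). It does not resample, and Whisper models are trained on 16 kHz audio, so it returns the file's rate for you to check.

```python
import array
import sys
import wave


def wav_to_float_samples(path):
    """Read a 16-bit PCM WAV file; return (mono samples in [-1.0, 1.0), sample rate).

    Hypothetical helper: the audio is not resampled, so the caller should
    confirm the returned rate is 16000 before transcribing with Whisper.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())

    pcm = array.array("h", frames)  # signed 16-bit integers, channels interleaved
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian

    # Average the channels of each frame to get mono, then scale by 2**15.
    samples = [
        sum(pcm[i:i + channels]) / channels / 32768.0
        for i in range(0, len(pcm), channels)
    ]
    return samples, rate
```

Usage, assuming an active `SparkSession` named `spark` and a placeholder file `speech.wav`: call `samples, rate = wav_to_float_samples("speech.wav")`, then build the input with `data = spark.createDataFrame([[samples]]).toDF("audio_content")`.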
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_personal_whisper_small_english_model|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|en|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/fractalego/personal-whisper-small.en-model
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_personal_whisper_small_english_model_pipeline_en.md b/docs/_posts/ahmedlone127/2023-10-19-asr_personal_whisper_small_english_model_pipeline_en.md
new file mode 100644
index 00000000000000..c188644156cd69
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_personal_whisper_small_english_model_pipeline_en.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: English asr_personal_whisper_small_english_model_pipeline pipeline WhisperForCTC from fractalego
+author: John Snow Labs
+name: asr_personal_whisper_small_english_model_pipeline
+date: 2023-10-19
+tags: [whisper, en, open_source, pipeline]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_personal_whisper_small_english_model_pipeline` is an English pipeline built on a model originally trained by fractalego.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_personal_whisper_small_english_model_pipeline_en_5.1.4_3.4_1697754518503.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_personal_whisper_small_english_model_pipeline_en_5.1.4_3.4_1697754518503.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
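+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `audioDF`: a Spark DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+pipeline = PretrainedPipeline("asr_personal_whisper_small_english_model_pipeline", lang="en")
+annotations = pipeline.transform(audioDF)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `audioDF`: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+val pipeline = new PretrainedPipeline("asr_personal_whisper_small_english_model_pipeline", lang = "en")
+val annotations = pipeline.transform(audioDF)
+
+```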
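Whisper checkpoints are trained on 16 kHz audio, so recordings at other rates (for example 44.1 kHz or 48 kHz) are normally resampled before they go into `audioDF`. A dedicated resampler such as librosa or torchaudio is preferable in practice. As a dependency-free sketch, the hypothetical helper `resample_linear` below uses plain linear interpolation, which applies no anti-aliasing filter:

```python
def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample a list of floats from src_rate to dst_rate by linear interpolation.

    No low-pass filter is applied, so downsampling may alias; this is a
    sketch, not a substitute for a proper resampler.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)

    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # input samples advanced per output sample
    out = []
    for i in range(out_len):
        pos = i * step
        left = int(pos)
        if left >= len(samples) - 1:
            out.append(samples[-1])  # hold the last sample past the end
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out
```

The output length is the input length scaled by `dst_rate / src_rate`, so a one-second clip at any rate becomes 16,000 samples at the default target.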
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_personal_whisper_small_english_model_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/fractalego/personal-whisper-small.en-model
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_lithuanian_finetune_lt.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_lithuanian_finetune_lt.md
new file mode 100644
index 00000000000000..4d011976883339
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_lithuanian_finetune_lt.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: Lithuanian asr_whisper_lithuanian_finetune WhisperForCTC from daniel-rdt
+author: John Snow Labs
+name: asr_whisper_lithuanian_finetune
+date: 2023-10-19
+tags: [whisper, lt, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: lt
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_lithuanian_finetune` is a Lithuanian model originally trained by daniel-rdt.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_lithuanian_finetune_lt_5.1.4_3.4_1697755801160.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_lithuanian_finetune_lt_5.1.4_3.4_1697755801160.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.annotator import WhisperForCTC
+from sparknlp.base import AudioAssembler
+
+# `data`: a Spark DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_lithuanian_finetune", "lt") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data`: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_lithuanian_finetune", "lt")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
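A common way to check how well a fine-tuned checkpoint like this one transcribes your own recordings is word error rate (WER). WER is the word-level edit distance between a reference transcript and the model's `text` output, divided by the number of reference words. Character error rate (CER) applies the same formula to characters and is often preferred for scripts written without spaces between words, such as Chinese. The sketch below is a minimal version. The `error_rate` and `edit_distance` helpers are hypothetical and perform no text normalization such as lowercasing or stripping punctuation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution, free when tokens match
            ))
        prev = curr
    return prev[-1]


def error_rate(reference, hypothesis, unit="word"):
    """WER (unit="word") or CER (unit="char") of hypothesis against reference."""
    if unit == "word":
        ref, hyp = reference.split(), hypothesis.split()
    elif unit == "char":
        ref, hyp = list(reference), list(hypothesis)
    else:
        raise ValueError("unit must be 'word' or 'char'")
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)
```

In the example below, a single wrong inflection in a three-word reference gives a WER of one third: `error_rate("labas rytas visiems", "labas rytą visiems")`.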
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_lithuanian_finetune|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|lt|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/daniel-rdt/whisper-lt-finetune
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_lithuanian_finetune_pipeline_lt.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_lithuanian_finetune_pipeline_lt.md
new file mode 100644
index 00000000000000..203e29eaa84f11
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_lithuanian_finetune_pipeline_lt.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: Lithuanian asr_whisper_lithuanian_finetune_pipeline pipeline WhisperForCTC from daniel-rdt
+author: John Snow Labs
+name: asr_whisper_lithuanian_finetune_pipeline
+date: 2023-10-19
+tags: [whisper, lt, open_source, pipeline]
+task: Automatic Speech Recognition
+language: lt
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_lithuanian_finetune_pipeline` is a Lithuanian pipeline built on a model originally trained by daniel-rdt.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_lithuanian_finetune_pipeline_lt_5.1.4_3.4_1697755826126.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_lithuanian_finetune_pipeline_lt_5.1.4_3.4_1697755826126.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
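+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `audioDF`: a Spark DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+pipeline = PretrainedPipeline("asr_whisper_lithuanian_finetune_pipeline", lang="lt")
+annotations = pipeline.transform(audioDF)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `audioDF`: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+val pipeline = new PretrainedPipeline("asr_whisper_lithuanian_finetune_pipeline", lang = "lt")
+val annotations = pipeline.transform(audioDF)
+
+```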
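Whisper models read audio in 30-second windows. Whether the Spark NLP annotator splits longer inputs for you is worth confirming in the documentation for your version. If you prefer to control segmentation yourself, one simple approach is to cut each recording into fixed-length chunks and place one chunk per row of `audioDF`. The sketch below assumes 16 kHz mono float samples. `chunk_samples` is a hypothetical helper, and fixed cuts can split a word at a boundary, which the optional overlap only partly mitigates.

```python
def chunk_samples(samples, sample_rate=16000, window_seconds=30.0, overlap_seconds=0.0):
    """Split samples into windows of at most window_seconds, optionally overlapping."""
    window = int(window_seconds * sample_rate)
    hop = window - int(overlap_seconds * sample_rate)  # step between window starts
    if window <= 0 or hop <= 0:
        raise ValueError("window must be positive and longer than the overlap")
    chunks = []
    for start in range(0, len(samples), hop):
        chunks.append(samples[start:start + window])
        if start + window >= len(samples):
            break  # this window already reaches the end of the recording
    return chunks
```

Assuming an active `SparkSession` named `spark`, the chunks can then become one row each with `spark.createDataFrame([[c] for c in chunks]).toDF("audio_content")`.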
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_lithuanian_finetune_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|lt|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/daniel-rdt/whisper-lt-finetune
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_malayalam_first_model_ml.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_malayalam_first_model_ml.md
new file mode 100644
index 00000000000000..a75c8818aa3776
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_malayalam_first_model_ml.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: Malayalam asr_whisper_malayalam_first_model WhisperForCTC from kurianbenoy
+author: John Snow Labs
+name: asr_whisper_malayalam_first_model
+date: 2023-10-19
+tags: [whisper, ml, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: ml
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_malayalam_first_model` is a Malayalam model originally trained by kurianbenoy.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_malayalam_first_model_ml_5.1.4_3.4_1697755379079.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_malayalam_first_model_ml_5.1.4_3.4_1697755379079.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.annotator import WhisperForCTC
+from sparknlp.base import AudioAssembler
+
+# `data`: a Spark DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_malayalam_first_model", "ml") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data`: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_malayalam_first_model", "ml")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
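Very quiet recordings may transcribe less reliably. One optional preprocessing step before building `data` is peak normalization, which rescales the signal so that its largest absolute sample reaches a target level. The sketch below assumes float samples in [-1.0, 1.0]. `normalize_peak` is a hypothetical helper, not part of Spark NLP, and whether it improves results for this particular model is worth measuring on your own audio.

```python
def normalize_peak(samples, target_peak=0.95):
    """Scale samples so that the largest absolute value equals target_peak."""
    if not 0.0 < target_peak <= 1.0:
        raise ValueError("target_peak must be in (0, 1]")
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]
```

Because the same gain is applied to every sample, the waveform's shape is unchanged. Peak normalization does not compress dynamics and does not reduce background noise.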
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_malayalam_first_model|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|ml|
+|Size:|391.1 MB|
+
+## References
+
+https://huggingface.co/kurianbenoy/whisper-ml-first-model
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_malayalam_first_model_pipeline_ml.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_malayalam_first_model_pipeline_ml.md
new file mode 100644
index 00000000000000..b698a24a4214b6
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_malayalam_first_model_pipeline_ml.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: Malayalam asr_whisper_malayalam_first_model_pipeline pipeline WhisperForCTC from kurianbenoy
+author: John Snow Labs
+name: asr_whisper_malayalam_first_model_pipeline
+date: 2023-10-19
+tags: [whisper, ml, open_source, pipeline]
+task: Automatic Speech Recognition
+language: ml
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_malayalam_first_model_pipeline` is a Malayalam pipeline built on a model originally trained by kurianbenoy.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_malayalam_first_model_pipeline_ml_5.1.4_3.4_1697755389149.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_malayalam_first_model_pipeline_ml_5.1.4_3.4_1697755389149.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
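+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `audioDF`: a Spark DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+pipeline = PretrainedPipeline("asr_whisper_malayalam_first_model_pipeline", lang="ml")
+annotations = pipeline.transform(audioDF)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `audioDF`: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+val pipeline = new PretrainedPipeline("asr_whisper_malayalam_first_model_pipeline", lang = "ml")
+val annotations = pipeline.transform(audioDF)
+
+```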
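Long stretches of leading or trailing silence in the rows of `audioDF` add compute without adding words. The sketch below applies a simple amplitude threshold to trim them. The default threshold is an assumption that you should tune for your recordings, and a proper voice-activity detector is more robust than a per-sample check. `trim_silence` is a hypothetical helper rather than a Spark NLP function.

```python
def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples whose absolute value is below threshold.

    Quiet stretches in the middle of the recording are kept as-is.
    """
    start = 0
    while start < len(samples) and abs(samples[start]) < threshold:
        start += 1
    end = len(samples)
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]
```

An all-silent input comes back empty. Filtering out such rows before transcription avoids sending empty audio to the model.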
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_malayalam_first_model_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|ml|
+|Size:|391.1 MB|
+
+## References
+
+https://huggingface.co/kurianbenoy/whisper-ml-first-model
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_bak_en.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_bak_en.md
new file mode 100644
index 00000000000000..63a5a76517dd1b
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_bak_en.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: English asr_whisper_small_bak WhisperForCTC from AigizK
+author: John Snow Labs
+name: asr_whisper_small_bak
+date: 2023-10-19
+tags: [whisper, en, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_bak` is an English model originally trained by AigizK.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_bak_en_5.1.4_3.4_1697753335079.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_bak_en_5.1.4_3.4_1697753335079.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.annotator import WhisperForCTC
+from sparknlp.base import AudioAssembler
+
+# `data`: a Spark DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_bak", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data`: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_bak", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
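Checking the samples locally before handing them to `AudioAssembler` can save a failed or misleading Spark job. The hypothetical helper `audio_problems` below flags four conditions:

- empty input
- non-finite values
- samples outside the [-1.0, 1.0] range that float audio conventionally uses (values far outside it usually mean integer PCM was not scaled)
- recordings longer than one 30-second Whisper window

These checks reflect common conventions. They are not documented Spark NLP requirements.

```python
import math


def audio_problems(samples, sample_rate=16000, max_seconds=30.0):
    """Return a list of human-readable problems found in a list of float samples."""
    if not samples:
        return ["no samples"]
    problems = []
    if any(not math.isfinite(s) for s in samples):
        problems.append("contains NaN or infinite values")
    elif max(abs(s) for s in samples) > 1.0:
        problems.append("values outside [-1.0, 1.0]; was integer PCM scaled?")
    duration = len(samples) / sample_rate
    if duration > max_seconds:
        problems.append(f"{duration:.1f} s is longer than {max_seconds:.0f} s")
    return problems
```

An empty list means none of these checks fired. It does not guarantee that the audio will transcribe well.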
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_bak|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|en|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/AigizK/whisper-small-bak
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_bak_pipeline_en.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_bak_pipeline_en.md
new file mode 100644
index 00000000000000..16bed74424c586
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_bak_pipeline_en.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: English asr_whisper_small_bak_pipeline pipeline WhisperForCTC from AigizK
+author: John Snow Labs
+name: asr_whisper_small_bak_pipeline
+date: 2023-10-19
+tags: [whisper, en, open_source, pipeline]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_bak_pipeline` is an English pipeline built on a model originally trained by AigizK.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_bak_pipeline_en_5.1.4_3.4_1697753365157.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_bak_pipeline_en_5.1.4_3.4_1697753365157.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
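+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `audioDF`: a Spark DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+pipeline = PretrainedPipeline("asr_whisper_small_bak_pipeline", lang="en")
+annotations = pipeline.transform(audioDF)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `audioDF`: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+val pipeline = new PretrainedPipeline("asr_whisper_small_bak_pipeline", lang = "en")
+val annotations = pipeline.transform(audioDF)
+
+```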
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_bak_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/AigizK/whisper-small-bak
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_bengali_subhadeep_en.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_bengali_subhadeep_en.md
new file mode 100644
index 00000000000000..5d1382f034df6e
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_bengali_subhadeep_en.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: English asr_whisper_small_bengali_subhadeep WhisperForCTC from Subhadeep
+author: John Snow Labs
+name: asr_whisper_small_bengali_subhadeep
+date: 2023-10-19
+tags: [whisper, en, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_bengali_subhadeep` is an English model originally trained by Subhadeep.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_bengali_subhadeep_en_5.1.4_3.4_1697757153165.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_bengali_subhadeep_en_5.1.4_3.4_1697757153165.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.annotator import WhisperForCTC
+from sparknlp.base import AudioAssembler
+
+# `data`: a Spark DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_bengali_subhadeep", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data`: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_bengali_subhadeep", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
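Before transcribing a batch, a quick way to spot recordings that are much quieter than the rest is to measure their RMS level in dBFS (decibels relative to full scale, where a full-scale square wave reads 0 dBFS). The sketch below handles float samples in [-1.0, 1.0]. `rms_dbfs` is a hypothetical helper. Any cut-off you apply to its output is a judgment call, not a requirement of this model.

```python
import math


def rms_dbfs(samples):
    """Root-mean-square level of float samples in dBFS (-inf for digital silence)."""
    if not samples:
        raise ValueError("no samples")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    # 10 * log10(mean square) equals 20 * log10(RMS).
    return 10.0 * math.log10(mean_square)
```

Halving the amplitude lowers the level by about 6 dB. For example, a square wave at ±0.5 measures roughly -6.02 dBFS.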
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_bengali_subhadeep|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|en|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/Subhadeep/whisper-small-bn
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_bengali_subhadeep_pipeline_en.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_bengali_subhadeep_pipeline_en.md
new file mode 100644
index 00000000000000..adcefbfeb3e49a
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_bengali_subhadeep_pipeline_en.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: English asr_whisper_small_bengali_subhadeep_pipeline pipeline WhisperForCTC from Subhadeep
+author: John Snow Labs
+name: asr_whisper_small_bengali_subhadeep_pipeline
+date: 2023-10-19
+tags: [whisper, en, open_source, pipeline]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_bengali_subhadeep_pipeline` is an English pipeline built on a model originally trained by Subhadeep.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_bengali_subhadeep_pipeline_en_5.1.4_3.4_1697757181862.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_bengali_subhadeep_pipeline_en_5.1.4_3.4_1697757181862.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
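+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `audioDF`: a Spark DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+pipeline = PretrainedPipeline("asr_whisper_small_bengali_subhadeep_pipeline", lang="en")
+annotations = pipeline.transform(audioDF)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `audioDF`: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+val pipeline = new PretrainedPipeline("asr_whisper_small_bengali_subhadeep_pipeline", lang = "en")
+val annotations = pipeline.transform(audioDF)
+
+```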
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_bengali_subhadeep_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/Subhadeep/whisper-small-bn
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_chinese_tw_voidful_en.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_chinese_tw_voidful_en.md
new file mode 100644
index 00000000000000..44deac6deb72fb
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_chinese_tw_voidful_en.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: English asr_whisper_small_chinese_tw_voidful WhisperForCTC from voidful
+author: John Snow Labs
+name: asr_whisper_small_chinese_tw_voidful
+date: 2023-10-19
+tags: [whisper, en, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_chinese_tw_voidful` is an English model originally trained by voidful.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_chinese_tw_voidful_en_5.1.4_3.4_1697753255068.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_chinese_tw_voidful_en_5.1.4_3.4_1697753255068.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.annotator import WhisperForCTC
+from sparknlp.base import AudioAssembler
+
+# `data`: a Spark DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_chinese_tw_voidful", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data`: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_chinese_tw_voidful", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_chinese_tw_voidful|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|en|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/voidful/whisper-small-zh-TW
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_chinese_tw_voidful_pipeline_en.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_chinese_tw_voidful_pipeline_en.md
new file mode 100644
index 00000000000000..796c65d9089569
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_chinese_tw_voidful_pipeline_en.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: English asr_whisper_small_chinese_tw_voidful_pipeline pipeline WhisperForCTC from voidful
+author: John Snow Labs
+name: asr_whisper_small_chinese_tw_voidful_pipeline
+date: 2023-10-19
+tags: [whisper, en, open_source, pipeline]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_chinese_tw_voidful_pipeline` is an English pipeline built on a model originally trained by voidful.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_chinese_tw_voidful_pipeline_en_5.1.4_3.4_1697753279228.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_chinese_tw_voidful_pipeline_en_5.1.4_3.4_1697753279228.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
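+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `audioDF`: a Spark DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+pipeline = PretrainedPipeline("asr_whisper_small_chinese_tw_voidful_pipeline", lang="en")
+annotations = pipeline.transform(audioDF)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `audioDF`: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+val pipeline = new PretrainedPipeline("asr_whisper_small_chinese_tw_voidful_pipeline", lang = "en")
+val annotations = pipeline.transform(audioDF)
+
+```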
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_chinese_tw_voidful_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/voidful/whisper-small-zh-TW
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_chinesebasetw_pipeline_zh.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_chinesebasetw_pipeline_zh.md
new file mode 100644
index 00000000000000..5327aa27c093ed
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_chinesebasetw_pipeline_zh.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: Chinese asr_whisper_small_chinesebasetw_pipeline pipeline WhisperForCTC from Jingmiao
+author: John Snow Labs
+name: asr_whisper_small_chinesebasetw_pipeline
+date: 2023-10-19
+tags: [whisper, zh, open_source, pipeline]
+task: Automatic Speech Recognition
+language: zh
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_chinesebasetw_pipeline` is a Chinese pipeline built on a model originally trained by Jingmiao.
+ +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_chinesebasetw_pipeline_zh_5.1.4_3.4_1697757472081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_chinesebasetw_pipeline_zh_5.1.4_3.4_1697757472081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
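+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline('asr_whisper_small_chinesebasetw_pipeline', lang = 'zh')
+annotations = pipeline.transform(audioDF)  # audioDF: raw float samples in an "audio_content" column (assumed)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_chinesebasetw_pipeline", lang = "zh")
+val annotations = pipeline.transform(audioDF)  // audioDF: raw float samples in an "audio_content" column (assumed)
+```
+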
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_chinesebasetw_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Jingmiao/whisper-small-chineseBaseTW + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_chinesebasetw_zh.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_chinesebasetw_zh.md new file mode 100644 index 00000000000000..530f36f0286351 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_chinesebasetw_zh.md @@ -0,0 +1,92 @@ +--- +layout: model +title: Chinese asr_whisper_small_chinesebasetw WhisperForCTC from Jingmiao +author: John Snow Labs +name: asr_whisper_small_chinesebasetw +date: 2023-10-19 +tags: [whisper, zh, open_source, asr, onnx] +task: Automatic Speech Recognition +language: zh +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_chinesebasetw` is a Chinese model originally trained by Jingmiao. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_chinesebasetw_zh_5.1.4_3.4_1697757430683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_chinesebasetw_zh_5.1.4_3.4_1697757430683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.annotator import WhisperForCTC
+from sparknlp.base import AudioAssembler
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_chinesebasetw","zh") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: DataFrame with raw float audio samples in an "audio_content" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_chinesebasetw","zh")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: DataFrame with raw float audio samples in an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_chinesebasetw| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|zh| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Jingmiao/whisper-small-chineseBaseTW \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_hindi_xinhuang_hi.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_hindi_xinhuang_hi.md new file mode 100644 index 00000000000000..3d80e7aa6073eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_hindi_xinhuang_hi.md @@ -0,0 +1,92 @@ +--- +layout: model +title: Hindi asr_whisper_small_hindi_xinhuang WhisperForCTC from xinhuang +author: John Snow Labs +name: asr_whisper_small_hindi_xinhuang +date: 2023-10-19 +tags: [whisper, hi, open_source, asr, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_hindi_xinhuang` is a Hindi model originally trained by xinhuang. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_hindi_xinhuang_hi_5.1.4_3.4_1697754908782.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_hindi_xinhuang_hi_5.1.4_3.4_1697754908782.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.annotator import WhisperForCTC
+from sparknlp.base import AudioAssembler
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_hindi_xinhuang","hi") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: DataFrame with raw float audio samples in an "audio_content" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_hindi_xinhuang","hi")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: DataFrame with raw float audio samples in an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_hindi_xinhuang| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/xinhuang/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_hindi_xinhuang_pipeline_hi.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_hindi_xinhuang_pipeline_hi.md new file mode 100644 index 00000000000000..ece7023482e98c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_hindi_xinhuang_pipeline_hi.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Hindi asr_whisper_small_hindi_xinhuang_pipeline pipeline WhisperForCTC from xinhuang +author: John Snow Labs +name: asr_whisper_small_hindi_xinhuang_pipeline +date: 2023-10-19 +tags: [whisper, hi, open_source, pipeline] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_hindi_xinhuang_pipeline` is a Hindi model originally trained by xinhuang. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_hindi_xinhuang_pipeline_hi_5.1.4_3.4_1697754944761.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_hindi_xinhuang_pipeline_hi_5.1.4_3.4_1697754944761.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
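+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline('asr_whisper_small_hindi_xinhuang_pipeline', lang = 'hi')
+annotations = pipeline.transform(audioDF)  # audioDF: raw float samples in an "audio_content" column (assumed)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_hindi_xinhuang_pipeline", lang = "hi")
+val annotations = pipeline.transform(audioDF)  // audioDF: raw float samples in an "audio_content" column (assumed)
+```
+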
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_hindi_xinhuang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/xinhuang/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_lithuanian_deividasm_lt.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_lithuanian_deividasm_lt.md new file mode 100644 index 00000000000000..cc23fee11f736b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_lithuanian_deividasm_lt.md @@ -0,0 +1,92 @@ +--- +layout: model +title: Lithuanian asr_whisper_small_lithuanian_deividasm WhisperForCTC from DeividasM +author: John Snow Labs +name: asr_whisper_small_lithuanian_deividasm +date: 2023-10-19 +tags: [whisper, lt, open_source, asr, onnx] +task: Automatic Speech Recognition +language: lt +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_lithuanian_deividasm` is a Lithuanian model originally trained by DeividasM. 
+ +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_lithuanian_deividasm_lt_5.1.4_3.4_1697755670450.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_lithuanian_deividasm_lt_5.1.4_3.4_1697755670450.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.annotator import WhisperForCTC
+from sparknlp.base import AudioAssembler
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_lithuanian_deividasm","lt") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: DataFrame with raw float audio samples in an "audio_content" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_lithuanian_deividasm","lt")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: DataFrame with raw float audio samples in an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_lithuanian_deividasm| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|lt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/DeividasM/whisper-small-lt \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_lithuanian_deividasm_pipeline_lt.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_lithuanian_deividasm_pipeline_lt.md new file mode 100644 index 00000000000000..6540d1773e6067 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_lithuanian_deividasm_pipeline_lt.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Lithuanian asr_whisper_small_lithuanian_deividasm_pipeline pipeline WhisperForCTC from DeividasM +author: John Snow Labs +name: asr_whisper_small_lithuanian_deividasm_pipeline +date: 2023-10-19 +tags: [whisper, lt, open_source, pipeline] +task: Automatic Speech Recognition +language: lt +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_lithuanian_deividasm_pipeline` is a Lithuanian model originally trained by DeividasM. 
+ +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_lithuanian_deividasm_pipeline_lt_5.1.4_3.4_1697755699167.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_lithuanian_deividasm_pipeline_lt_5.1.4_3.4_1697755699167.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
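+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline('asr_whisper_small_lithuanian_deividasm_pipeline', lang = 'lt')
+annotations = pipeline.transform(audioDF)  # audioDF: raw float samples in an "audio_content" column (assumed)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_lithuanian_deividasm_pipeline", lang = "lt")
+val annotations = pipeline.transform(audioDF)  // audioDF: raw float samples in an "audio_content" column (assumed)
+```
+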
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_lithuanian_deividasm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Language:|lt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/DeividasM/whisper-small-lt + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_nepali_np_ne.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_nepali_np_ne.md new file mode 100644 index 00000000000000..0a640afe7afa6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_nepali_np_ne.md @@ -0,0 +1,92 @@ +--- +layout: model +title: Nepali (macrolanguage) asr_whisper_small_nepali_np WhisperForCTC from julie200 +author: John Snow Labs +name: asr_whisper_small_nepali_np +date: 2023-10-19 +tags: [whisper, ne, open_source, asr, onnx] +task: Automatic Speech Recognition +language: ne +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_nepali_np` is a Nepali (macrolanguage) model originally trained by julie200. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_nepali_np_ne_5.1.4_3.4_1697758238849.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_nepali_np_ne_5.1.4_3.4_1697758238849.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.annotator import WhisperForCTC
+from sparknlp.base import AudioAssembler
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_nepali_np","ne") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: DataFrame with raw float audio samples in an "audio_content" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_nepali_np","ne")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: DataFrame with raw float audio samples in an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_nepali_np| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ne| +|Size:|1.7 GB| + +## References + +https://huggingface.co/julie200/whisper-small-ne-np \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_nepali_np_pipeline_ne.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_nepali_np_pipeline_ne.md new file mode 100644 index 00000000000000..12a86f64b2e774 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_nepali_np_pipeline_ne.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Nepali (macrolanguage) asr_whisper_small_nepali_np_pipeline pipeline WhisperForCTC from julie200 +author: John Snow Labs +name: asr_whisper_small_nepali_np_pipeline +date: 2023-10-19 +tags: [whisper, ne, open_source, pipeline] +task: Automatic Speech Recognition +language: ne +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_nepali_np_pipeline` is a Nepali (macrolanguage) model originally trained by julie200. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_nepali_np_pipeline_ne_5.1.4_3.4_1697758262511.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_nepali_np_pipeline_ne_5.1.4_3.4_1697758262511.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
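+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline('asr_whisper_small_nepali_np_pipeline', lang = 'ne')
+annotations = pipeline.transform(audioDF)  # audioDF: raw float samples in an "audio_content" column (assumed)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_nepali_np_pipeline", lang = "ne")
+val annotations = pipeline.transform(audioDF)  // audioDF: raw float samples in an "audio_content" column (assumed)
+```
+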
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_nepali_np_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Language:|ne| +|Size:|1.7 GB| + +## References + +https://huggingface.co/julie200/whisper-small-ne-np + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_polish_aspik101_pipeline_pl.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_polish_aspik101_pipeline_pl.md new file mode 100644 index 00000000000000..21b32eed65c247 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_polish_aspik101_pipeline_pl.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Polish asr_whisper_small_polish_aspik101_pipeline pipeline WhisperForCTC from Aspik101 +author: John Snow Labs +name: asr_whisper_small_polish_aspik101_pipeline +date: 2023-10-19 +tags: [whisper, pl, open_source, pipeline] +task: Automatic Speech Recognition +language: pl +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_polish_aspik101_pipeline` is a Polish model originally trained by Aspik101. 
+ +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_polish_aspik101_pipeline_pl_5.1.4_3.4_1697758839664.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_polish_aspik101_pipeline_pl_5.1.4_3.4_1697758839664.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
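+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline('asr_whisper_small_polish_aspik101_pipeline', lang = 'pl')
+annotations = pipeline.transform(audioDF)  # audioDF: raw float samples in an "audio_content" column (assumed)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_polish_aspik101_pipeline", lang = "pl")
+val annotations = pipeline.transform(audioDF)  // audioDF: raw float samples in an "audio_content" column (assumed)
+```
+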
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_polish_aspik101_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Language:|pl| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Aspik101/whisper-small-pl + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_polish_aspik101_pl.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_polish_aspik101_pl.md new file mode 100644 index 00000000000000..1a7bbfa6309da5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_polish_aspik101_pl.md @@ -0,0 +1,92 @@ +--- +layout: model +title: Polish asr_whisper_small_polish_aspik101 WhisperForCTC from Aspik101 +author: John Snow Labs +name: asr_whisper_small_polish_aspik101 +date: 2023-10-19 +tags: [whisper, pl, open_source, asr, onnx] +task: Automatic Speech Recognition +language: pl +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_polish_aspik101` is a Polish model originally trained by Aspik101. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_polish_aspik101_pl_5.1.4_3.4_1697758808695.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_polish_aspik101_pl_5.1.4_3.4_1697758808695.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.annotator import WhisperForCTC
+from sparknlp.base import AudioAssembler
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_polish_aspik101","pl") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: DataFrame with raw float audio samples in an "audio_content" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_polish_aspik101","pl")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: DataFrame with raw float audio samples in an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_polish_aspik101| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|pl| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Aspik101/whisper-small-pl \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_swedish_se_afroanton_en.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_swedish_se_afroanton_en.md new file mode 100644 index 00000000000000..2b4f360e9f8d84 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_swedish_se_afroanton_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: English asr_whisper_small_swedish_se_afroanton WhisperForCTC from afroanton +author: John Snow Labs +name: asr_whisper_small_swedish_se_afroanton +date: 2023-10-19 +tags: [whisper, en, open_source, asr, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_swedish_se_afroanton` is a English model originally trained by afroanton. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_swedish_se_afroanton_en_5.1.4_3.4_1697758027499.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_swedish_se_afroanton_en_5.1.4_3.4_1697758027499.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.annotator import WhisperForCTC
+from sparknlp.base import AudioAssembler
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_swedish_se_afroanton","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: DataFrame with raw float audio samples in an "audio_content" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_swedish_se_afroanton","en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: DataFrame with raw float audio samples in an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_swedish_se_afroanton|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|en|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/afroanton/whisper-small-sv-SE
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_swedish_se_afroanton_pipeline_en.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_swedish_se_afroanton_pipeline_en.md
new file mode 100644
index 00000000000000..fd3d60a6523db4
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_swedish_se_afroanton_pipeline_en.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: English asr_whisper_small_swedish_se_afroanton_pipeline pipeline WhisperForCTC from afroanton
+author: John Snow Labs
+name: asr_whisper_small_swedish_se_afroanton_pipeline
+date: 2023-10-19
+tags: [whisper, en, open_source, pipeline]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_swedish_se_afroanton_pipeline` is an English model originally trained by afroanton.
+ +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_swedish_se_afroanton_pipeline_en_5.1.4_3.4_1697758053063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_swedish_se_afroanton_pipeline_en_5.1.4_3.4_1697758053063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
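+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline('asr_whisper_small_swedish_se_afroanton_pipeline', lang = 'en')
+annotations = pipeline.transform(audioDF)  # audioDF: raw float samples in an "audio_content" column (assumed)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_swedish_se_afroanton_pipeline", lang = "en")
+val annotations = pipeline.transform(audioDF)  // audioDF: raw float samples in an "audio_content" column (assumed)
+```
+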
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_swedish_se_afroanton_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/afroanton/whisper-small-sv-SE + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_swedish_test_3000_pipeline_sv.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_swedish_test_3000_pipeline_sv.md new file mode 100644 index 00000000000000..d6504ee55571ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_swedish_test_3000_pipeline_sv.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Swedish asr_whisper_small_swedish_test_3000_pipeline pipeline WhisperForCTC from ZinebSN +author: John Snow Labs +name: asr_whisper_small_swedish_test_3000_pipeline +date: 2023-10-19 +tags: [whisper, sv, open_source, pipeline] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_swedish_test_3000_pipeline` is a Swedish model originally trained by ZinebSN. 
+ +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_swedish_test_3000_pipeline_sv_5.1.4_3.4_1697754957297.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_swedish_test_3000_pipeline_sv_5.1.4_3.4_1697754957297.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
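+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline('asr_whisper_small_swedish_test_3000_pipeline', lang='sv')
+annotations = pipeline.transform(audioDF)  # audioDF: DataFrame holding the raw audio samples
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_swedish_test_3000_pipeline", lang = "sv")
+val annotations = pipeline.transform(audioDF) // audioDF: DataFrame holding the raw audio samples
+```
+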
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_swedish_test_3000_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|sv|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/ZinebSN/whisper-small-swedish-Test-3000
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_swedish_test_3000_sv.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_swedish_test_3000_sv.md
new file mode 100644
index 00000000000000..d0496b955ce397
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_swedish_test_3000_sv.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: Swedish asr_whisper_small_swedish_test_3000 WhisperForCTC from ZinebSN
+author: John Snow Labs
+name: asr_whisper_small_swedish_test_3000
+date: 2023-10-19
+tags: [whisper, sv, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: sv
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_swedish_test_3000` is a Swedish model originally trained by ZinebSN.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_swedish_test_3000_sv_5.1.4_3.4_1697754925389.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_swedish_test_3000_sv_5.1.4_3.4_1697754925389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.annotator import WhisperForCTC
+from sparknlp.base import AudioAssembler
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_swedish_test_3000", "sv") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: DataFrame with an "audio_content" column of raw audio samples (floats)
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_swedish_test_3000", "sv")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: DataFrame with an "audio_content" column of raw audio samples (floats)
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_swedish_test_3000|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|sv|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/ZinebSN/whisper-small-swedish-Test-3000
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic_en.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic_en.md
new file mode 100644
index 00000000000000..5fb84b16c4ff62
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic_en.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: English asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic WhisperForCTC from kpriyanshu256
+author: John Snow Labs
+name: asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic
+date: 2023-10-19
+tags: [whisper, en, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic` is an English model originally trained by kpriyanshu256.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic_en_5.1.4_3.4_1697754395649.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic_en_5.1.4_3.4_1697754395649.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.annotator import WhisperForCTC
+from sparknlp.base import AudioAssembler
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: DataFrame with an "audio_content" column of raw audio samples (floats)
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: DataFrame with an "audio_content" column of raw audio samples (floats)
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|en|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/kpriyanshu256/whisper-small-ur-1000-64-1e-05-pretrain-ar
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic_pipeline_en.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic_pipeline_en.md
new file mode 100644
index 00000000000000..7bc5bc3d8f8794
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic_pipeline_en.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: English asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic_pipeline pipeline WhisperForCTC from kpriyanshu256
+author: John Snow Labs
+name: asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic_pipeline
+date: 2023-10-19
+tags: [whisper, en, open_source, pipeline]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic_pipeline` is an English model originally trained by kpriyanshu256.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic_pipeline_en_5.1.4_3.4_1697754437394.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic_pipeline_en_5.1.4_3.4_1697754437394.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
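+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline('asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic_pipeline', lang='en')
+annotations = pipeline.transform(audioDF)  # audioDF: DataFrame holding the raw audio samples
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic_pipeline", lang = "en")
+val annotations = pipeline.transform(audioDF) // audioDF: DataFrame holding the raw audio samples
+```
+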
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_urdu_1000_64_1e_05_pretrain_arabic_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/kpriyanshu256/whisper-small-ur-1000-64-1e-05-pretrain-ar
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_uzbek_pipeline_uz.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_uzbek_pipeline_uz.md
new file mode 100644
index 00000000000000..cabf8598ecdb34
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_uzbek_pipeline_uz.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: Uzbek asr_whisper_small_uzbek_pipeline pipeline WhisperForCTC from BlueRaccoon
+author: John Snow Labs
+name: asr_whisper_small_uzbek_pipeline
+date: 2023-10-19
+tags: [whisper, uz, open_source, pipeline]
+task: Automatic Speech Recognition
+language: uz
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_uzbek_pipeline` is an Uzbek model originally trained by BlueRaccoon.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_uzbek_pipeline_uz_5.1.4_3.4_1697758115932.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_uzbek_pipeline_uz_5.1.4_3.4_1697758115932.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
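+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline('asr_whisper_small_uzbek_pipeline', lang='uz')
+annotations = pipeline.transform(audioDF)  # audioDF: DataFrame holding the raw audio samples
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_uzbek_pipeline", lang = "uz")
+val annotations = pipeline.transform(audioDF) // audioDF: DataFrame holding the raw audio samples
+```
+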
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_uzbek_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|uz|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/BlueRaccoon/whisper-small-uz
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_uzbek_uz.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_uzbek_uz.md
new file mode 100644
index 00000000000000..63146e9307e81d
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_small_uzbek_uz.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: Uzbek asr_whisper_small_uzbek WhisperForCTC from BlueRaccoon
+author: John Snow Labs
+name: asr_whisper_small_uzbek
+date: 2023-10-19
+tags: [whisper, uz, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: uz
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_uzbek` is an Uzbek model originally trained by BlueRaccoon.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_uzbek_uz_5.1.4_3.4_1697758091527.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_uzbek_uz_5.1.4_3.4_1697758091527.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.annotator import WhisperForCTC
+from sparknlp.base import AudioAssembler
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_uzbek", "uz") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: DataFrame with an "audio_content" column of raw audio samples (floats)
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_uzbek", "uz")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: DataFrame with an "audio_content" column of raw audio samples (floats)
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_uzbek|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|uz|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/BlueRaccoon/whisper-small-uz
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_tiny_polish_pipeline_pl.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_tiny_polish_pipeline_pl.md
new file mode 100644
index 00000000000000..a82842463d19c6
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_tiny_polish_pipeline_pl.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: Polish asr_whisper_tiny_polish_pipeline pipeline WhisperForCTC from Aspik101
+author: John Snow Labs
+name: asr_whisper_tiny_polish_pipeline
+date: 2023-10-19
+tags: [whisper, pl, open_source, pipeline]
+task: Automatic Speech Recognition
+language: pl
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_tiny_polish_pipeline` is a Polish model originally trained by Aspik101.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_polish_pipeline_pl_5.1.4_3.4_1697759156199.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_polish_pipeline_pl_5.1.4_3.4_1697759156199.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
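+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline('asr_whisper_tiny_polish_pipeline', lang='pl')
+annotations = pipeline.transform(audioDF)  # audioDF: DataFrame holding the raw audio samples
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("asr_whisper_tiny_polish_pipeline", lang = "pl")
+val annotations = pipeline.transform(audioDF) // audioDF: DataFrame holding the raw audio samples
+```
+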
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_tiny_polish_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|pl|
+|Size:|390.7 MB|
+
+## References
+
+https://huggingface.co/Aspik101/whisper-tiny-pl
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_tiny_polish_pl.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_tiny_polish_pl.md
new file mode 100644
index 00000000000000..20eb9a7e90fa34
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_tiny_polish_pl.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: Polish asr_whisper_tiny_polish WhisperForCTC from Aspik101
+author: John Snow Labs
+name: asr_whisper_tiny_polish
+date: 2023-10-19
+tags: [whisper, pl, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: pl
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_tiny_polish` is a Polish model originally trained by Aspik101.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_polish_pl_5.1.4_3.4_1697759148948.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_polish_pl_5.1.4_3.4_1697759148948.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.annotator import WhisperForCTC
+from sparknlp.base import AudioAssembler
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_tiny_polish", "pl") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: DataFrame with an "audio_content" column of raw audio samples (floats)
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_tiny_polish", "pl")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: DataFrame with an "audio_content" column of raw audio samples (floats)
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_tiny_polish|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|pl|
+|Size:|390.7 MB|
+
+## References
+
+https://huggingface.co/Aspik101/whisper-tiny-pl
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_tiny_tamil_example_pipeline_ta.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_tiny_tamil_example_pipeline_ta.md
new file mode 100644
index 00000000000000..24f1515ee7e316
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_tiny_tamil_example_pipeline_ta.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: Tamil asr_whisper_tiny_tamil_example_pipeline pipeline WhisperForCTC from parambharat
+author: John Snow Labs
+name: asr_whisper_tiny_tamil_example_pipeline
+date: 2023-10-19
+tags: [whisper, ta, open_source, pipeline]
+task: Automatic Speech Recognition
+language: ta
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_tiny_tamil_example_pipeline` is a Tamil model originally trained by parambharat.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_tamil_example_pipeline_ta_5.1.4_3.4_1697754897889.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_tamil_example_pipeline_ta_5.1.4_3.4_1697754897889.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
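+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline('asr_whisper_tiny_tamil_example_pipeline', lang='ta')
+annotations = pipeline.transform(audioDF)  # audioDF: DataFrame holding the raw audio samples
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("asr_whisper_tiny_tamil_example_pipeline", lang = "ta")
+val annotations = pipeline.transform(audioDF) // audioDF: DataFrame holding the raw audio samples
+```
+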
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_tiny_tamil_example_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|ta|
+|Size:|390.9 MB|
+
+## References
+
+https://huggingface.co/parambharat/whisper-tiny-ta-example
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_tiny_tamil_example_ta.md b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_tiny_tamil_example_ta.md
new file mode 100644
index 00000000000000..40505bb5ea47de
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-19-asr_whisper_tiny_tamil_example_ta.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: Tamil asr_whisper_tiny_tamil_example WhisperForCTC from parambharat
+author: John Snow Labs
+name: asr_whisper_tiny_tamil_example
+date: 2023-10-19
+tags: [whisper, ta, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: ta
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_tiny_tamil_example` is a Tamil model originally trained by parambharat.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_tamil_example_ta_5.1.4_3.4_1697754888104.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_tamil_example_ta_5.1.4_3.4_1697754888104.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.annotator import WhisperForCTC
+from sparknlp.base import AudioAssembler
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_tiny_tamil_example", "ta") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: DataFrame with an "audio_content" column of raw audio samples (floats)
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_tiny_tamil_example", "ta")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: DataFrame with an "audio_content" column of raw audio samples (floats)
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_tiny_tamil_example|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|ta|
+|Size:|390.9 MB|
+
+## References
+
+https://huggingface.co/parambharat/whisper-tiny-ta-example
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_base_swedish_en.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_base_swedish_en.md
new file mode 100644
index 00000000000000..e5374a6b02df8d
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_base_swedish_en.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: English asr_whisper_base_swedish WhisperForCTC from rscolati
+author: John Snow Labs
+name: asr_whisper_base_swedish
+date: 2023-10-20
+tags: [whisper, en, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_base_swedish` is an English model originally trained by rscolati.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_base_swedish_en_5.1.4_3.4_1697761530317.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_base_swedish_en_5.1.4_3.4_1697761530317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.annotator import WhisperForCTC
+from sparknlp.base import AudioAssembler
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_base_swedish", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: DataFrame with an "audio_content" column of raw audio samples (floats)
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_base_swedish", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: DataFrame with an "audio_content" column of raw audio samples (floats)
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_base_swedish|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|en|
+|Size:|643.4 MB|
+
+## References
+
+https://huggingface.co/rscolati/whisper-base-sv
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_base_swedish_pipeline_en.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_base_swedish_pipeline_en.md
new file mode 100644
index 00000000000000..6698bf87195e1d
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_base_swedish_pipeline_en.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: English asr_whisper_base_swedish_pipeline pipeline WhisperForCTC from rscolati
+author: John Snow Labs
+name: asr_whisper_base_swedish_pipeline
+date: 2023-10-20
+tags: [whisper, en, open_source, pipeline]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_base_swedish_pipeline` is an English model originally trained by rscolati.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_base_swedish_pipeline_en_5.1.4_3.4_1697761542899.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_base_swedish_pipeline_en_5.1.4_3.4_1697761542899.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
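+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline('asr_whisper_base_swedish_pipeline', lang='en')
+annotations = pipeline.transform(audioDF)  # audioDF: DataFrame holding the raw audio samples
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("asr_whisper_base_swedish_pipeline", lang = "en")
+val annotations = pipeline.transform(audioDF) // audioDF: DataFrame holding the raw audio samples
+```
+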
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_base_swedish_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|643.4 MB|
+
+## References
+
+https://huggingface.co/rscolati/whisper-base-sv
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_danish_small_augmented_da.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_danish_small_augmented_da.md
new file mode 100644
index 00000000000000..738c57acff5b1c
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_danish_small_augmented_da.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: Danish asr_whisper_danish_small_augmented WhisperForCTC from ALM
+author: John Snow Labs
+name: asr_whisper_danish_small_augmented
+date: 2023-10-20
+tags: [whisper, da, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: da
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_danish_small_augmented` is a Danish model originally trained by ALM.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_danish_small_augmented_da_5.1.4_3.4_1697768052248.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_danish_small_augmented_da_5.1.4_3.4_1697768052248.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.annotator import WhisperForCTC
+from sparknlp.base import AudioAssembler
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_danish_small_augmented", "da") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: DataFrame with an "audio_content" column of raw audio samples (floats)
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_danish_small_augmented", "da")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: DataFrame with an "audio_content" column of raw audio samples (floats)
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_danish_small_augmented| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|da| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ALM/whisper-da-small-augmented \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_danish_small_augmented_pipeline_da.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_danish_small_augmented_pipeline_da.md new file mode 100644 index 00000000000000..80706f147bbcd6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_danish_small_augmented_pipeline_da.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Danish asr_whisper_danish_small_augmented_pipeline pipeline WhisperForCTC from ALM +author: John Snow Labs +name: asr_whisper_danish_small_augmented_pipeline +date: 2023-10-20 +tags: [whisper, da, open_source, pipeline] +task: Automatic Speech Recognition +language: da +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_danish_small_augmented_pipeline` is a Danish model originally trained by ALM. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_danish_small_augmented_pipeline_da_5.1.4_3.4_1697768078968.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_danish_small_augmented_pipeline_da_5.1.4_3.4_1697768078968.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
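+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# `audioDF` is assumed to be a DataFrame with an "audio_content" column of float samples.
+pipeline = PretrainedPipeline("asr_whisper_danish_small_augmented_pipeline", lang="da")
+annotations = pipeline.transform(audioDF)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `audioDF` is assumed to be a DataFrame with an "audio_content" column of float samples.
+val pipeline = new PretrainedPipeline("asr_whisper_danish_small_augmented_pipeline", lang = "da")
+val annotations = pipeline.transform(audioDF)
+
+```
+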
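The Download and S3 links on these cards appear to follow a `<name>_<lang>_<Spark NLP version>_<Spark version>_<timestamp>.zip` naming pattern. The last field looks like a Unix epoch in milliseconds. This pattern is inferred from the URLs on this page, not from a documented contract. The sketch below recovers those fields under that assumption. The function `parse_artifact_name` and the `Artifact` type are hypothetical:

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Artifact(NamedTuple):
    name: str
    lang: str
    nlp_version: str
    spark_version: str
    built_at: datetime


def parse_artifact_name(url_or_filename: str) -> Artifact:
    """Split '<name>_<lang>_<nlp>_<spark>_<epoch-ms>.zip' from the right.

    Model names contain underscores themselves, so only the last four
    fields are split off; everything before them is the model name.
    """
    stem = url_or_filename.rsplit("/", 1)[-1]
    if stem.endswith(".zip"):
        stem = stem[: -len(".zip")]
    name, lang, nlp_version, spark_version, millis = stem.rsplit("_", 4)
    built_at = datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)
    return Artifact(name, lang, nlp_version, spark_version, built_at)
```

Splitting from the right is what makes this work for model names such as `asr_whisper_danish_small_augmented`, which contain any number of underscores.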
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_danish_small_augmented_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Language:|da| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ALM/whisper-da-small-augmented + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_arabic_cv11_en.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_arabic_cv11_en.md new file mode 100644 index 00000000000000..97d1cee82f35fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_arabic_cv11_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: English asr_whisper_small_arabic_cv11 WhisperForCTC from hkhdair +author: John Snow Labs +name: asr_whisper_small_arabic_cv11 +date: 2023-10-20 +tags: [whisper, en, open_source, asr, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_arabic_cv11` is a English model originally trained by hkhdair. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_arabic_cv11_en_5.1.4_3.4_1697764506687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_arabic_cv11_en_5.1.4_3.4_1697764506687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
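+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import AudioAssembler
+from sparknlp.annotator import WhisperForCTC
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame with an "audio_content" column
+# holding the raw audio samples as an array of floats.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_arabic_cv11", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of float samples.
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_arabic_cv11", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
+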
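Audio recorded at a rate other than 16 kHz has to be resampled before it reaches the model. In practice, a dedicated audio library with proper anti-aliasing filters is the better choice. The following is only an illustrative sketch using plain linear interpolation, and the function `resample_linear` is hypothetical. It applies no low-pass filter, so it can alias when downsampling:

```python
def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample a list of floats by linear interpolation (no low-pass filter)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step  # fractional position in the source signal
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Weighted average of the two neighbouring source samples; past the
        # end, left == right, so the last sample is simply repeated.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

For example, upsampling `[0.0, 1.0, 2.0, 3.0]` from 8 kHz to 16 kHz inserts midpoints between neighbouring samples, doubling the length.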
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_arabic_cv11| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/hkhdair/whisper-small-ar-cv11 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_arabic_cv11_pipeline_en.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_arabic_cv11_pipeline_en.md new file mode 100644 index 00000000000000..24ae96a1db563c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_arabic_cv11_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English asr_whisper_small_arabic_cv11_pipeline pipeline WhisperForCTC from hkhdair +author: John Snow Labs +name: asr_whisper_small_arabic_cv11_pipeline +date: 2023-10-20 +tags: [whisper, en, open_source, pipeline] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_arabic_cv11_pipeline` is a English model originally trained by hkhdair. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_arabic_cv11_pipeline_en_5.1.4_3.4_1697764533389.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_arabic_cv11_pipeline_en_5.1.4_3.4_1697764533389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
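+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# `audioDF` is assumed to be a DataFrame with an "audio_content" column of float samples.
+pipeline = PretrainedPipeline("asr_whisper_small_arabic_cv11_pipeline", lang="en")
+annotations = pipeline.transform(audioDF)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `audioDF` is assumed to be a DataFrame with an "audio_content" column of float samples.
+val pipeline = new PretrainedPipeline("asr_whisper_small_arabic_cv11_pipeline", lang = "en")
+val annotations = pipeline.transform(audioDF)
+
+```
+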
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_arabic_cv11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/hkhdair/whisper-small-ar-cv11 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_armenian_hy.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_armenian_hy.md new file mode 100644 index 00000000000000..64ca52425b8f33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_armenian_hy.md @@ -0,0 +1,92 @@ +--- +layout: model +title: Armenian asr_whisper_small_armenian WhisperForCTC from pranay-j +author: John Snow Labs +name: asr_whisper_small_armenian +date: 2023-10-20 +tags: [whisper, hy, open_source, asr, onnx] +task: Automatic Speech Recognition +language: hy +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_armenian` is a Armenian model originally trained by pranay-j. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_armenian_hy_5.1.4_3.4_1697760551180.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_armenian_hy_5.1.4_3.4_1697760551180.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
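+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import AudioAssembler
+from sparknlp.annotator import WhisperForCTC
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame with an "audio_content" column
+# holding the raw audio samples as an array of floats.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_armenian", "hy") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of float samples.
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_armenian", "hy")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
+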
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_armenian| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hy| +|Size:|1.7 GB| + +## References + +https://huggingface.co/pranay-j/whisper-small-hy \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_armenian_pipeline_hy.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_armenian_pipeline_hy.md new file mode 100644 index 00000000000000..61ccef878b4b7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_armenian_pipeline_hy.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Armenian asr_whisper_small_armenian_pipeline pipeline WhisperForCTC from pranay-j +author: John Snow Labs +name: asr_whisper_small_armenian_pipeline +date: 2023-10-20 +tags: [whisper, hy, open_source, pipeline] +task: Automatic Speech Recognition +language: hy +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_armenian_pipeline` is a Armenian model originally trained by pranay-j. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_armenian_pipeline_hy_5.1.4_3.4_1697760577067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_armenian_pipeline_hy_5.1.4_3.4_1697760577067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
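+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# `audioDF` is assumed to be a DataFrame with an "audio_content" column of float samples.
+pipeline = PretrainedPipeline("asr_whisper_small_armenian_pipeline", lang="hy")
+annotations = pipeline.transform(audioDF)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `audioDF` is assumed to be a DataFrame with an "audio_content" column of float samples.
+val pipeline = new PretrainedPipeline("asr_whisper_small_armenian_pipeline", lang = "hy")
+val annotations = pipeline.transform(audioDF)
+
+```
+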
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_armenian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Language:|hy| +|Size:|1.7 GB| + +## References + +https://huggingface.co/pranay-j/whisper-small-hy + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_dutch_nl.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_dutch_nl.md new file mode 100644 index 00000000000000..a92fa297b5c127 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_dutch_nl.md @@ -0,0 +1,92 @@ +--- +layout: model +title: Dutch, Flemish asr_whisper_small_dutch WhisperForCTC from pplantinga +author: John Snow Labs +name: asr_whisper_small_dutch +date: 2023-10-20 +tags: [whisper, nl, open_source, asr, onnx] +task: Automatic Speech Recognition +language: nl +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_dutch` is a Dutch, Flemish model originally trained by pplantinga. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_dutch_nl_5.1.4_3.4_1697760208052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_dutch_nl_5.1.4_3.4_1697760208052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
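+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import AudioAssembler
+from sparknlp.annotator import WhisperForCTC
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame with an "audio_content" column
+# holding the raw audio samples as an array of floats.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_dutch", "nl") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of float samples.
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_dutch", "nl")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
+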
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_dutch| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|nl| +|Size:|1.7 GB| + +## References + +https://huggingface.co/pplantinga/whisper-small-nl \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_dutch_pipeline_nl.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_dutch_pipeline_nl.md new file mode 100644 index 00000000000000..40d28e76f0830d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_dutch_pipeline_nl.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Dutch, Flemish asr_whisper_small_dutch_pipeline pipeline WhisperForCTC from pplantinga +author: John Snow Labs +name: asr_whisper_small_dutch_pipeline +date: 2023-10-20 +tags: [whisper, nl, open_source, pipeline] +task: Automatic Speech Recognition +language: nl +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_dutch_pipeline` is a Dutch, Flemish model originally trained by pplantinga. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_dutch_pipeline_nl_5.1.4_3.4_1697760252058.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_dutch_pipeline_nl_5.1.4_3.4_1697760252058.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
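+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# `audioDF` is assumed to be a DataFrame with an "audio_content" column of float samples.
+pipeline = PretrainedPipeline("asr_whisper_small_dutch_pipeline", lang="nl")
+annotations = pipeline.transform(audioDF)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `audioDF` is assumed to be a DataFrame with an "audio_content" column of float samples.
+val pipeline = new PretrainedPipeline("asr_whisper_small_dutch_pipeline", lang = "nl")
+val annotations = pipeline.transform(audioDF)
+
+```
+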
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_dutch_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Language:|nl| +|Size:|1.7 GB| + +## References + +https://huggingface.co/pplantinga/whisper-small-nl + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_english_blueraccoon_en.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_english_blueraccoon_en.md new file mode 100644 index 00000000000000..7dcaca2456739f --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_english_blueraccoon_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: English asr_whisper_small_english_blueraccoon WhisperForCTC from BlueRaccoon +author: John Snow Labs +name: asr_whisper_small_english_blueraccoon +date: 2023-10-20 +tags: [whisper, en, open_source, asr, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_english_blueraccoon` is a English model originally trained by BlueRaccoon. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_english_blueraccoon_en_5.1.4_3.4_1697760099807.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_english_blueraccoon_en_5.1.4_3.4_1697760099807.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
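+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import AudioAssembler
+from sparknlp.annotator import WhisperForCTC
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame with an "audio_content" column
+# holding the raw audio samples as an array of floats.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_english_blueraccoon", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of float samples.
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_english_blueraccoon", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
+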
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_english_blueraccoon|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|en|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/BlueRaccoon/whisper-small-en
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_english_blueraccoon_pipeline_en.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_english_blueraccoon_pipeline_en.md
new file mode 100644
index 00000000000000..7011d52e86a2fe
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_english_blueraccoon_pipeline_en.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: English asr_whisper_small_english_blueraccoon_pipeline pipeline WhisperForCTC from BlueRaccoon
+author: John Snow Labs
+name: asr_whisper_small_english_blueraccoon_pipeline
+date: 2023-10-20
+tags: [whisper, en, open_source, pipeline]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+    type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_english_blueraccoon_pipeline` is an English model originally trained by BlueRaccoon.
+ +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_english_blueraccoon_pipeline_en_5.1.4_3.4_1697760125385.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_english_blueraccoon_pipeline_en_5.1.4_3.4_1697760125385.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
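+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# `audioDF` is assumed to be a DataFrame with an "audio_content" column of float samples.
+pipeline = PretrainedPipeline("asr_whisper_small_english_blueraccoon_pipeline", lang="en")
+annotations = pipeline.transform(audioDF)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `audioDF` is assumed to be a DataFrame with an "audio_content" column of float samples.
+val pipeline = new PretrainedPipeline("asr_whisper_small_english_blueraccoon_pipeline", lang = "en")
+val annotations = pipeline.transform(audioDF)
+
+```
+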
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_english_blueraccoon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/BlueRaccoon/whisper-small-en + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_finnish_sgangireddy_fi.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_finnish_sgangireddy_fi.md new file mode 100644 index 00000000000000..1e4566602d64ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_finnish_sgangireddy_fi.md @@ -0,0 +1,92 @@ +--- +layout: model +title: Finnish asr_whisper_small_finnish_sgangireddy WhisperForCTC from sgangireddy +author: John Snow Labs +name: asr_whisper_small_finnish_sgangireddy +date: 2023-10-20 +tags: [whisper, fi, open_source, asr, onnx] +task: Automatic Speech Recognition +language: fi +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_finnish_sgangireddy` is a Finnish model originally trained by sgangireddy. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_finnish_sgangireddy_fi_5.1.4_3.4_1697760961669.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_finnish_sgangireddy_fi_5.1.4_3.4_1697760961669.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
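+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import AudioAssembler
+from sparknlp.annotator import WhisperForCTC
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame with an "audio_content" column
+# holding the raw audio samples as an array of floats.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_finnish_sgangireddy", "fi") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of float samples.
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_finnish_sgangireddy", "fi")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
+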
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_finnish_sgangireddy| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|fi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/sgangireddy/whisper-small-fi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_finnish_sgangireddy_pipeline_fi.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_finnish_sgangireddy_pipeline_fi.md new file mode 100644 index 00000000000000..88e37bc7b7970a --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_finnish_sgangireddy_pipeline_fi.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Finnish asr_whisper_small_finnish_sgangireddy_pipeline pipeline WhisperForCTC from sgangireddy +author: John Snow Labs +name: asr_whisper_small_finnish_sgangireddy_pipeline +date: 2023-10-20 +tags: [whisper, fi, open_source, pipeline] +task: Automatic Speech Recognition +language: fi +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_finnish_sgangireddy_pipeline` is a Finnish model originally trained by sgangireddy. 
+ +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_finnish_sgangireddy_pipeline_fi_5.1.4_3.4_1697760986850.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_finnish_sgangireddy_pipeline_fi_5.1.4_3.4_1697760986850.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
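+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# `audioDF` is assumed to be a DataFrame with an "audio_content" column of float samples.
+pipeline = PretrainedPipeline("asr_whisper_small_finnish_sgangireddy_pipeline", lang="fi")
+annotations = pipeline.transform(audioDF)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `audioDF` is assumed to be a DataFrame with an "audio_content" column of float samples.
+val pipeline = new PretrainedPipeline("asr_whisper_small_finnish_sgangireddy_pipeline", lang = "fi")
+val annotations = pipeline.transform(audioDF)
+
+```
+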
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_finnish_sgangireddy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Language:|fi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/sgangireddy/whisper-small-fi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_french_yocel1_hi.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_french_yocel1_hi.md new file mode 100644 index 00000000000000..480878cee41c94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_french_yocel1_hi.md @@ -0,0 +1,92 @@ +--- +layout: model +title: Hindi asr_whisper_small_french_yocel1 WhisperForCTC from Yocel1 +author: John Snow Labs +name: asr_whisper_small_french_yocel1 +date: 2023-10-20 +tags: [whisper, hi, open_source, asr, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_french_yocel1` is a Hindi model originally trained by Yocel1. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_french_yocel1_hi_5.1.4_3.4_1697762514859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_french_yocel1_hi_5.1.4_3.4_1697762514859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
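+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import AudioAssembler
+from sparknlp.annotator import WhisperForCTC
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame with an "audio_content" column
+# holding the raw audio samples as an array of floats.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_french_yocel1", "hi") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of float samples.
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_french_yocel1", "hi")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
+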
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_french_yocel1| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Yocel1/whisper-small-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_french_yocel1_pipeline_hi.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_french_yocel1_pipeline_hi.md new file mode 100644 index 00000000000000..33958cc8436431 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_french_yocel1_pipeline_hi.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Hindi asr_whisper_small_french_yocel1_pipeline pipeline WhisperForCTC from Yocel1 +author: John Snow Labs +name: asr_whisper_small_french_yocel1_pipeline +date: 2023-10-20 +tags: [whisper, hi, open_source, pipeline] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_whisper_small_french_yocel1_pipeline` is a Hindi model originally trained by Yocel1. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_french_yocel1_pipeline_hi_5.1.4_3.4_1697762545502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_french_yocel1_pipeline_hi_5.1.4_3.4_1697762545502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("asr_whisper_small_french_yocel1_pipeline", lang="hi")
+annotations = pipeline.transform(audioDF)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_french_yocel1_pipeline", lang = "hi")
+val annotations = pipeline.transform(audioDF)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_french_yocel1_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|hi|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/Yocel1/whisper-small-fr
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_hungarian_cv11_en.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_hungarian_cv11_en.md
new file mode 100644
index 00000000000000..5447154ab9eb55
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_hungarian_cv11_en.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: English asr_whisper_small_hungarian_cv11 WhisperForCTC from mikr
+author: John Snow Labs
+name: asr_whisper_small_hungarian_cv11
+date: 2023-10-20
+tags: [whisper, en, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_hungarian_cv11` is an English model originally trained by mikr.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_hungarian_cv11_en_5.1.4_3.4_1697762655252.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_hungarian_cv11_en_5.1.4_3.4_1697762655252.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import AudioAssembler
+from sparknlp.annotator import WhisperForCTC
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_hungarian_cv11", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_hungarian_cv11", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_hungarian_cv11|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|en|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/mikr/whisper-small-hu-cv11
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_hungarian_cv11_pipeline_en.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_hungarian_cv11_pipeline_en.md
new file mode 100644
index 00000000000000..5168ebf5cbf736
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_hungarian_cv11_pipeline_en.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: English asr_whisper_small_hungarian_cv11_pipeline pipeline WhisperForCTC from mikr
+author: John Snow Labs
+name: asr_whisper_small_hungarian_cv11_pipeline
+date: 2023-10-20
+tags: [whisper, en, open_source, pipeline]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_hungarian_cv11_pipeline` is an English model originally trained by mikr.
+
+This pipeline is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_hungarian_cv11_pipeline_en_5.1.4_3.4_1697762679177.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_hungarian_cv11_pipeline_en_5.1.4_3.4_1697762679177.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("asr_whisper_small_hungarian_cv11_pipeline", lang="en")
+annotations = pipeline.transform(audioDF)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_hungarian_cv11_pipeline", lang = "en")
+val annotations = pipeline.transform(audioDF)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_hungarian_cv11_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/mikr/whisper-small-hu-cv11
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_japanese_vumichien_ja.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_japanese_vumichien_ja.md
new file mode 100644
index 00000000000000..2fd53e69bf0775
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_japanese_vumichien_ja.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: Japanese asr_whisper_small_japanese_vumichien WhisperForCTC from vumichien
+author: John Snow Labs
+name: asr_whisper_small_japanese_vumichien
+date: 2023-10-20
+tags: [whisper, ja, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: ja
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_japanese_vumichien` is a Japanese model originally trained by vumichien.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_japanese_vumichien_ja_5.1.4_3.4_1697766130278.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_japanese_vumichien_ja_5.1.4_3.4_1697766130278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import AudioAssembler
+from sparknlp.annotator import WhisperForCTC
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_japanese_vumichien", "ja") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_japanese_vumichien", "ja")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_japanese_vumichien|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|ja|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/vumichien/whisper-small-ja
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_japanese_vumichien_pipeline_ja.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_japanese_vumichien_pipeline_ja.md
new file mode 100644
index 00000000000000..9292bb7027ebac
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_japanese_vumichien_pipeline_ja.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: Japanese asr_whisper_small_japanese_vumichien_pipeline pipeline WhisperForCTC from vumichien
+author: John Snow Labs
+name: asr_whisper_small_japanese_vumichien_pipeline
+date: 2023-10-20
+tags: [whisper, ja, open_source, pipeline]
+task: Automatic Speech Recognition
+language: ja
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_japanese_vumichien_pipeline` is a Japanese model originally trained by vumichien.
+
+This pipeline is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_japanese_vumichien_pipeline_ja_5.1.4_3.4_1697766170978.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_japanese_vumichien_pipeline_ja_5.1.4_3.4_1697766170978.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("asr_whisper_small_japanese_vumichien_pipeline", lang="ja")
+annotations = pipeline.transform(audioDF)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_japanese_vumichien_pipeline", lang = "ja")
+val annotations = pipeline.transform(audioDF)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_japanese_vumichien_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|ja|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/vumichien/whisper-small-ja
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_korean_fl_ko.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_korean_fl_ko.md
new file mode 100644
index 00000000000000..cbaf1e05b2e1d7
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_korean_fl_ko.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: Korean asr_whisper_small_korean_fl WhisperForCTC from p4b
+author: John Snow Labs
+name: asr_whisper_small_korean_fl
+date: 2023-10-20
+tags: [whisper, ko, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: ko
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_korean_fl` is a Korean model originally trained by p4b.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_korean_fl_ko_5.1.4_3.4_1697768363999.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_korean_fl_ko_5.1.4_3.4_1697768363999.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import AudioAssembler
+from sparknlp.annotator import WhisperForCTC
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_korean_fl", "ko") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_korean_fl", "ko")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_korean_fl|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|ko|
+|Size:|1.1 GB|
+
+## References
+
+https://huggingface.co/p4b/whisper-small-ko-fl
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_korean_fl_pipeline_ko.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_korean_fl_pipeline_ko.md
new file mode 100644
index 00000000000000..3dbfd83f0d2540
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_korean_fl_pipeline_ko.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: Korean asr_whisper_small_korean_fl_pipeline pipeline WhisperForCTC from p4b
+author: John Snow Labs
+name: asr_whisper_small_korean_fl_pipeline
+date: 2023-10-20
+tags: [whisper, ko, open_source, pipeline]
+task: Automatic Speech Recognition
+language: ko
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_korean_fl_pipeline` is a Korean model originally trained by p4b.
+
+This pipeline is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_korean_fl_pipeline_ko_5.1.4_3.4_1697768403117.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_korean_fl_pipeline_ko_5.1.4_3.4_1697768403117.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("asr_whisper_small_korean_fl_pipeline", lang="ko")
+annotations = pipeline.transform(audioDF)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_korean_fl_pipeline", lang = "ko")
+val annotations = pipeline.transform(audioDF)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_korean_fl_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|ko|
+|Size:|1.1 GB|
+
+## References
+
+https://huggingface.co/p4b/whisper-small-ko-fl
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_lithuanian_serbian_v2_en.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_lithuanian_serbian_v2_en.md
new file mode 100644
index 00000000000000..e9068f9f5feae0
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_lithuanian_serbian_v2_en.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: English asr_whisper_small_lithuanian_serbian_v2 WhisperForCTC from jraramhoej
+author: John Snow Labs
+name: asr_whisper_small_lithuanian_serbian_v2
+date: 2023-10-20
+tags: [whisper, en, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_lithuanian_serbian_v2` is an English model originally trained by jraramhoej.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_lithuanian_serbian_v2_en_5.1.4_3.4_1697761536370.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_lithuanian_serbian_v2_en_5.1.4_3.4_1697761536370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import AudioAssembler
+from sparknlp.annotator import WhisperForCTC
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_lithuanian_serbian_v2", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_lithuanian_serbian_v2", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_lithuanian_serbian_v2|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|en|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/jraramhoej/whisper-small-lt-sr-v2
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_lithuanian_serbian_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_lithuanian_serbian_v2_pipeline_en.md
new file mode 100644
index 00000000000000..8e4bb662490fa7
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_lithuanian_serbian_v2_pipeline_en.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: English asr_whisper_small_lithuanian_serbian_v2_pipeline pipeline WhisperForCTC from jraramhoej
+author: John Snow Labs
+name: asr_whisper_small_lithuanian_serbian_v2_pipeline
+date: 2023-10-20
+tags: [whisper, en, open_source, pipeline]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_lithuanian_serbian_v2_pipeline` is an English model originally trained by jraramhoej.
+
+This pipeline is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_lithuanian_serbian_v2_pipeline_en_5.1.4_3.4_1697761572103.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_lithuanian_serbian_v2_pipeline_en_5.1.4_3.4_1697761572103.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("asr_whisper_small_lithuanian_serbian_v2_pipeline", lang="en")
+annotations = pipeline.transform(audioDF)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_lithuanian_serbian_v2_pipeline", lang = "en")
+val annotations = pipeline.transform(audioDF)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_lithuanian_serbian_v2_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/jraramhoej/whisper-small-lt-sr-v2
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_mongolian_3_en.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_mongolian_3_en.md
new file mode 100644
index 00000000000000..5eac73456783ec
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_mongolian_3_en.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: English asr_whisper_small_mongolian_3 WhisperForCTC from bayartsogt
+author: John Snow Labs
+name: asr_whisper_small_mongolian_3
+date: 2023-10-20
+tags: [whisper, en, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_mongolian_3` is an English model originally trained by bayartsogt.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_mongolian_3_en_5.1.4_3.4_1697766978958.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_mongolian_3_en_5.1.4_3.4_1697766978958.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import AudioAssembler
+from sparknlp.annotator import WhisperForCTC
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_mongolian_3", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_mongolian_3", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_mongolian_3|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|en|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/bayartsogt/whisper-small-mn-3
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_mongolian_3_pipeline_en.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_mongolian_3_pipeline_en.md
new file mode 100644
index 00000000000000..bddb3b05420fb3
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_mongolian_3_pipeline_en.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: English asr_whisper_small_mongolian_3_pipeline pipeline WhisperForCTC from bayartsogt
+author: John Snow Labs
+name: asr_whisper_small_mongolian_3_pipeline
+date: 2023-10-20
+tags: [whisper, en, open_source, pipeline]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_mongolian_3_pipeline` is an English model originally trained by bayartsogt.
+
+This pipeline is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_mongolian_3_pipeline_en_5.1.4_3.4_1697767005814.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_mongolian_3_pipeline_en_5.1.4_3.4_1697767005814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("asr_whisper_small_mongolian_3_pipeline", lang="en")
+annotations = pipeline.transform(audioDF)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_mongolian_3_pipeline", lang = "en")
+val annotations = pipeline.transform(audioDF)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_mongolian_3_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/bayartsogt/whisper-small-mn-3
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_nob_no.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_nob_no.md
new file mode 100644
index 00000000000000..6d0612f568f918
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_nob_no.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: Norwegian asr_whisper_small_nob WhisperForCTC from NbAiLab
+author: John Snow Labs
+name: asr_whisper_small_nob
+date: 2023-10-20
+tags: [whisper, "no", open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: "no"
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_nob` is a Norwegian model originally trained by NbAiLab.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_nob_no_5.1.4_3.4_1697767866471.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_nob_no_5.1.4_3.4_1697767866471.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import AudioAssembler
+from sparknlp.annotator import WhisperForCTC
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_nob", "no") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_nob", "no")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: a DataFrame whose "audio_content" column holds raw audio samples as an array of floats
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_nob|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|no|
+|Size:|1.1 GB|
+
+## References
+
+https://huggingface.co/NbAiLab/whisper-small-nob
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_nob_pipeline_no.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_nob_pipeline_no.md
new file mode 100644
index 00000000000000..1e5e58697367e6
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_nob_pipeline_no.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: Norwegian asr_whisper_small_nob_pipeline pipeline WhisperForCTC from NbAiLab
+author: John Snow Labs
+name: asr_whisper_small_nob_pipeline
+date: 2023-10-20
+tags: [whisper, "no", open_source, pipeline]
+task: Automatic Speech Recognition
+language: "no"
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_nob_pipeline` is a Norwegian model originally trained by NbAiLab.
+
+This pipeline is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_nob_pipeline_no_5.1.4_3.4_1697767890596.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_nob_pipeline_no_5.1.4_3.4_1697767890596.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("asr_whisper_small_nob_pipeline", lang="no")
+annotations = pipeline.transform(audioDF)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_nob_pipeline", lang = "no")
+val annotations = pipeline.transform(audioDF)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_nob_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|no|
+|Size:|1.1 GB|
+
+## References
+
+https://huggingface.co/NbAiLab/whisper-small-nob
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_pashto_ihanif_pipeline_ps.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_pashto_ihanif_pipeline_ps.md
new file mode 100644
index 00000000000000..35b47d9c30cf50
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_pashto_ihanif_pipeline_ps.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: Pashto, Pushto asr_whisper_small_pashto_ihanif_pipeline pipeline WhisperForCTC from ihanif
+author: John Snow Labs
+name: asr_whisper_small_pashto_ihanif_pipeline
+date: 2023-10-20
+tags: [whisper, ps, open_source, pipeline]
+task: Automatic Speech Recognition
+language: ps
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_pashto_ihanif_pipeline` is a Pashto, Pushto model originally trained by ihanif.
+
+This pipeline is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_pashto_ihanif_pipeline_ps_5.1.4_3.4_1697764646764.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_pashto_ihanif_pipeline_ps_5.1.4_3.4_1697764646764.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline('asr_whisper_small_pashto_ihanif_pipeline', lang = 'ps')
+annotations = pipeline.transform(audioDF)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_pashto_ihanif_pipeline", lang = "ps")
+val annotations = pipeline.transform(audioDF)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_pashto_ihanif_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|ps|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/ihanif/whisper-small-pashto
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_pashto_ihanif_ps.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_pashto_ihanif_ps.md
new file mode 100644
index 00000000000000..c88d37d77d2e84
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_pashto_ihanif_ps.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: Pashto, Pushto asr_whisper_small_pashto_ihanif WhisperForCTC from ihanif
+author: John Snow Labs
+name: asr_whisper_small_pashto_ihanif
+date: 2023-10-20
+tags: [whisper, ps, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: ps
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_pashto_ihanif` is a Pashto, Pushto model originally trained by ihanif.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_pashto_ihanif_ps_5.1.4_3.4_1697764620611.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_pashto_ihanif_ps_5.1.4_3.4_1697764620611.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+# data: DataFrame with an "audio_content" column of audio samples (array of floats, 16 kHz)
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_pashto_ihanif","ps") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+// data: DataFrame with an "audio_content" column of audio samples (array of floats, 16 kHz)
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_pashto_ihanif","ps")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_pashto_ihanif|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|ps|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/ihanif/whisper-small-pashto
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_punjabi_eastern_pa.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_punjabi_eastern_pa.md
new file mode 100644
index 00000000000000..a1ec98043e0395
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_punjabi_eastern_pa.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: Panjabi, Punjabi asr_whisper_small_punjabi_eastern WhisperForCTC from anuragshas
+author: John Snow Labs
+name: asr_whisper_small_punjabi_eastern
+date: 2023-10-20
+tags: [whisper, pa, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: pa
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_punjabi_eastern` is a Panjabi, Punjabi model originally trained by anuragshas.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_punjabi_eastern_pa_5.1.4_3.4_1697796901247.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_punjabi_eastern_pa_5.1.4_3.4_1697796901247.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+# data: DataFrame with an "audio_content" column of audio samples (array of floats, 16 kHz)
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_punjabi_eastern","pa") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+// data: DataFrame with an "audio_content" column of audio samples (array of floats, 16 kHz)
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_punjabi_eastern","pa")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_punjabi_eastern|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|pa|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/anuragshas/whisper-small-pa
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_punjabi_eastern_pipeline_pa.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_punjabi_eastern_pipeline_pa.md
new file mode 100644
index 00000000000000..5a99a1a34d1836
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_punjabi_eastern_pipeline_pa.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: Panjabi, Punjabi asr_whisper_small_punjabi_eastern_pipeline pipeline WhisperForCTC from anuragshas
+author: John Snow Labs
+name: asr_whisper_small_punjabi_eastern_pipeline
+date: 2023-10-20
+tags: [whisper, pa, open_source, pipeline]
+task: Automatic Speech Recognition
+language: pa
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_punjabi_eastern_pipeline` is a Panjabi, Punjabi model originally trained by anuragshas.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_punjabi_eastern_pipeline_pa_5.1.4_3.4_1697796972727.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_punjabi_eastern_pipeline_pa_5.1.4_3.4_1697796972727.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline('asr_whisper_small_punjabi_eastern_pipeline', lang = 'pa')
+annotations = pipeline.transform(audioDF)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_punjabi_eastern_pipeline", lang = "pa")
+val annotations = pipeline.transform(audioDF)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_punjabi_eastern_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|pa|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/anuragshas/whisper-small-pa
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_spanish_1e_6_en.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_spanish_1e_6_en.md
new file mode 100644
index 00000000000000..7d4c2498c493c1
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_spanish_1e_6_en.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: English asr_whisper_small_spanish_1e_6 WhisperForCTC from sanchit-gandhi
+author: John Snow Labs
+name: asr_whisper_small_spanish_1e_6
+date: 2023-10-20
+tags: [whisper, en, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_spanish_1e_6` is an English model originally trained by sanchit-gandhi.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_spanish_1e_6_en_5.1.4_3.4_1697760169069.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_spanish_1e_6_en_5.1.4_3.4_1697760169069.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+# data: DataFrame with an "audio_content" column of audio samples (array of floats, 16 kHz)
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_spanish_1e_6","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+// data: DataFrame with an "audio_content" column of audio samples (array of floats, 16 kHz)
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_spanish_1e_6","en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_spanish_1e_6|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|en|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/sanchit-gandhi/whisper-small-es-1e-6
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_spanish_1e_6_pipeline_en.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_spanish_1e_6_pipeline_en.md
new file mode 100644
index 00000000000000..12f2593de97f96
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_spanish_1e_6_pipeline_en.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: English asr_whisper_small_spanish_1e_6_pipeline pipeline WhisperForCTC from sanchit-gandhi
+author: John Snow Labs
+name: asr_whisper_small_spanish_1e_6_pipeline
+date: 2023-10-20
+tags: [whisper, en, open_source, pipeline]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_spanish_1e_6_pipeline` is an English model originally trained by sanchit-gandhi.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_spanish_1e_6_pipeline_en_5.1.4_3.4_1697760208725.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_spanish_1e_6_pipeline_en_5.1.4_3.4_1697760208725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline('asr_whisper_small_spanish_1e_6_pipeline', lang = 'en')
+annotations = pipeline.transform(audioDF)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_spanish_1e_6_pipeline", lang = "en")
+val annotations = pipeline.transform(audioDF)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_spanish_1e_6_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/sanchit-gandhi/whisper-small-es-1e-6
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_spanish_ari_es.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_spanish_ari_es.md
new file mode 100644
index 00000000000000..11327d1825fa85
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_spanish_ari_es.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: Castilian, Spanish asr_whisper_small_spanish_ari WhisperForCTC from Ari
+author: John Snow Labs
+name: asr_whisper_small_spanish_ari
+date: 2023-10-20
+tags: [whisper, es, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: es
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_spanish_ari` is a Castilian, Spanish model originally trained by Ari.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_spanish_ari_es_5.1.4_3.4_1697768478107.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_spanish_ari_es_5.1.4_3.4_1697768478107.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+# data: DataFrame with an "audio_content" column of audio samples (array of floats, 16 kHz)
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_spanish_ari","es") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+// data: DataFrame with an "audio_content" column of audio samples (array of floats, 16 kHz)
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_spanish_ari","es")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_spanish_ari|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|es|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/Ari/whisper-small-es
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_spanish_ari_pipeline_es.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_spanish_ari_pipeline_es.md
new file mode 100644
index 00000000000000..4da02633af751f
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_spanish_ari_pipeline_es.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: Castilian, Spanish asr_whisper_small_spanish_ari_pipeline pipeline WhisperForCTC from Ari
+author: John Snow Labs
+name: asr_whisper_small_spanish_ari_pipeline
+date: 2023-10-20
+tags: [whisper, es, open_source, pipeline]
+task: Automatic Speech Recognition
+language: es
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_spanish_ari_pipeline` is a Castilian, Spanish model originally trained by Ari.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_spanish_ari_pipeline_es_5.1.4_3.4_1697768503168.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_spanish_ari_pipeline_es_5.1.4_3.4_1697768503168.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline('asr_whisper_small_spanish_ari_pipeline', lang = 'es')
+annotations = pipeline.transform(audioDF)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_spanish_ari_pipeline", lang = "es")
+val annotations = pipeline.transform(audioDF)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_spanish_ari_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|es|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/Ari/whisper-small-es
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_swe2_en.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_swe2_en.md
new file mode 100644
index 00000000000000..6f9044449d9568
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_swe2_en.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: English asr_whisper_small_swe2 WhisperForCTC from Alexao
+author: John Snow Labs
+name: asr_whisper_small_swe2
+date: 2023-10-20
+tags: [whisper, en, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_swe2` is an English model originally trained by Alexao.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_swe2_en_5.1.4_3.4_1697767054560.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_swe2_en_5.1.4_3.4_1697767054560.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+# data: DataFrame with an "audio_content" column of audio samples (array of floats, 16 kHz)
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_swe2","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+// data: DataFrame with an "audio_content" column of audio samples (array of floats, 16 kHz)
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_swe2","en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_swe2|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|en|
+|Size:|1.1 GB|
+
+## References
+
+https://huggingface.co/Alexao/whisper-small-swe2
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_swe2_pipeline_en.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_swe2_pipeline_en.md
new file mode 100644
index 00000000000000..298294a1564b3d
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_swe2_pipeline_en.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: English asr_whisper_small_swe2_pipeline pipeline WhisperForCTC from Alexao
+author: John Snow Labs
+name: asr_whisper_small_swe2_pipeline
+date: 2023-10-20
+tags: [whisper, en, open_source, pipeline]
+task: Automatic Speech Recognition
+language: en
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_swe2_pipeline` is an English model originally trained by Alexao.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_swe2_pipeline_en_5.1.4_3.4_1697767075304.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_swe2_pipeline_en_5.1.4_3.4_1697767075304.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline('asr_whisper_small_swe2_pipeline', lang = 'en')
+annotations = pipeline.transform(audioDF)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_swe2_pipeline", lang = "en")
+val annotations = pipeline.transform(audioDF)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_swe2_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|1.1 GB|
+
+## References
+
+https://huggingface.co/Alexao/whisper-small-swe2
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_swedish_english_pipeline_se.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_swedish_english_pipeline_se.md
new file mode 100644
index 00000000000000..f9feb96ec89d7b
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_swedish_english_pipeline_se.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: Northern Sami asr_whisper_small_swedish_english_pipeline pipeline WhisperForCTC from humeur
+author: John Snow Labs
+name: asr_whisper_small_swedish_english_pipeline
+date: 2023-10-20
+tags: [whisper, se, open_source, pipeline]
+task: Automatic Speech Recognition
+language: se
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_swedish_english_pipeline` is a Northern Sami model originally trained by humeur.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_swedish_english_pipeline_se_5.1.4_3.4_1697765615738.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_swedish_english_pipeline_se_5.1.4_3.4_1697765615738.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline('asr_whisper_small_swedish_english_pipeline', lang = 'se')
+annotations = pipeline.transform(audioDF)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_swedish_english_pipeline", lang = "se")
+val annotations = pipeline.transform(audioDF)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_swedish_english_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|se|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/humeur/whisper-small-sv-en
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_swedish_english_se.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_swedish_english_se.md
new file mode 100644
index 00000000000000..85ca6259dcf620
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_swedish_english_se.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: Northern Sami asr_whisper_small_swedish_english WhisperForCTC from humeur
+author: John Snow Labs
+name: asr_whisper_small_swedish_english
+date: 2023-10-20
+tags: [whisper, se, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: se
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_swedish_english` is a Northern Sami model originally trained by humeur.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_swedish_english_se_5.1.4_3.4_1697765589681.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_swedish_english_se_5.1.4_3.4_1697765589681.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+# data: DataFrame with an "audio_content" column of audio samples (array of floats, 16 kHz)
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+
+speechToText = WhisperForCTC.pretrained("asr_whisper_small_swedish_english","se") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+// data: DataFrame with an "audio_content" column of audio samples (array of floats, 16 kHz)
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_whisper_small_swedish_english","se")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_swedish_english|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[audio_assembler]|
+|Output Labels:|[text]|
+|Language:|se|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/humeur/whisper-small-sv-en
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_swedish_torileatherman_pipeline_sv.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_swedish_torileatherman_pipeline_sv.md
new file mode 100644
index 00000000000000..4689542537e5c1
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_swedish_torileatherman_pipeline_sv.md
@@ -0,0 +1,71 @@
+---
+layout: model
+title: Swedish asr_whisper_small_swedish_torileatherman_pipeline pipeline WhisperForCTC from torileatherman
+author: John Snow Labs
+name: asr_whisper_small_swedish_torileatherman_pipeline
+date: 2023-10-20
+tags: [whisper, sv, open_source, pipeline]
+task: Automatic Speech Recognition
+language: sv
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_swedish_torileatherman_pipeline` is a Swedish model originally trained by torileatherman.
+
+This model is only compatible with PySpark 3.4 and above.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_swedish_torileatherman_pipeline_sv_5.1.4_3.4_1697762905907.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_swedish_torileatherman_pipeline_sv_5.1.4_3.4_1697762905907.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline('asr_whisper_small_swedish_torileatherman_pipeline', lang = 'sv')
+annotations = pipeline.transform(audioDF)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("asr_whisper_small_swedish_torileatherman_pipeline", lang = "sv")
+val annotations = pipeline.transform(audioDF)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|asr_whisper_small_swedish_torileatherman_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.1.4+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|sv|
+|Size:|1.7 GB|
+
+## References
+
+https://huggingface.co/torileatherman/whisper_small_sv
+
+## Included Models
+
+- AudioAssembler
+- WhisperForCTC
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_swedish_torileatherman_sv.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_swedish_torileatherman_sv.md
new file mode 100644
index 00000000000000..cc86af5841f60b
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_small_swedish_torileatherman_sv.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: Swedish asr_whisper_small_swedish_torileatherman WhisperForCTC from torileatherman
+author: John Snow Labs
+name: asr_whisper_small_swedish_torileatherman
+date: 2023-10-20
+tags: [whisper, sv, open_source, asr, onnx]
+task: Automatic Speech Recognition
+language: sv
+edition: Spark NLP 5.1.4
+spark_version: 3.4
+supported: true
+engine: onnx
+annotator: WhisperForCTC
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_small_swedish_torileatherman` is a Swedish model originally trained by torileatherman.
+ +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_small_swedish_torileatherman_sv_5.1.4_3.4_1697762874959.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_small_swedish_torileatherman_sv_5.1.4_3.4_1697762874959.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + + +speechToText = WhisperForCTC.pretrained("asr_whisper_small_swedish_torileatherman","sv") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new AudioAssembler() + .setInputCol("audio_content") + .setOutputCol("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("asr_whisper_small_swedish_torileatherman","sv") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_small_swedish_torileatherman| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/torileatherman/whisper_small_sv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_testrun1_en.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_testrun1_en.md new file mode 100644 index 00000000000000..e361db61cd8204 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_testrun1_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: English asr_whisper_testrun1 WhisperForCTC from pere +author: John Snow Labs +name: asr_whisper_testrun1 +date: 2023-10-20 +tags: [whisper, en, open_source, asr, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_testrun1` is an English model originally trained by pere. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_testrun1_en_5.1.4_3.4_1697768297225.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_testrun1_en_5.1.4_3.4_1697768297225.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + + +speechToText = WhisperForCTC.pretrained("asr_whisper_testrun1","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new AudioAssembler() + .setInputCol("audio_content") + .setOutputCol("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("asr_whisper_testrun1","en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_testrun1| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/pere/whisper-testrun1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_testrun1_pipeline_en.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_testrun1_pipeline_en.md new file mode 100644 index 00000000000000..3cfac75c55b379 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_testrun1_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English asr_whisper_testrun1_pipeline pipeline WhisperForCTC from pere +author: John Snow Labs +name: asr_whisper_testrun1_pipeline +date: 2023-10-20 +tags: [whisper, en, open_source, pipeline] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_testrun1_pipeline` is an English model originally trained by pere. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_testrun1_pipeline_en_5.1.4_3.4_1697768349850.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_testrun1_pipeline_en_5.1.4_3.4_1697768349850.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline('asr_whisper_testrun1_pipeline', lang = 'en') +annotations = pipeline.transform(audioDF) + +``` +```scala + +val pipeline = new PretrainedPipeline("asr_whisper_testrun1_pipeline", lang = "en") +val annotations = pipeline.transform(audioDF) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_testrun1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/pere/whisper-testrun1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_tiny_italian_2_it.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_tiny_italian_2_it.md new file mode 100644 index 00000000000000..175a97d994bb03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_tiny_italian_2_it.md @@ -0,0 +1,92 @@ +--- +layout: model +title: Italian asr_whisper_tiny_italian_2 WhisperForCTC from GIanlucaRub +author: John Snow Labs +name: asr_whisper_tiny_italian_2 +date: 2023-10-20 +tags: [whisper, it, open_source, asr, onnx] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_tiny_italian_2` is an Italian model originally trained by GIanlucaRub. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_italian_2_it_5.1.4_3.4_1697767694767.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_italian_2_it_5.1.4_3.4_1697767694767.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + + +speechToText = WhisperForCTC.pretrained("asr_whisper_tiny_italian_2","it") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new AudioAssembler() + .setInputCol("audio_content") + .setOutputCol("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("asr_whisper_tiny_italian_2","it") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_tiny_italian_2| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|it| +|Size:|390.8 MB| + +## References + +https://huggingface.co/GIanlucaRub/whisper-tiny-it-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_tiny_italian_2_pipeline_it.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_tiny_italian_2_pipeline_it.md new file mode 100644 index 00000000000000..f04f7ee3de55b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_tiny_italian_2_pipeline_it.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Italian asr_whisper_tiny_italian_2_pipeline pipeline WhisperForCTC from GIanlucaRub +author: John Snow Labs +name: asr_whisper_tiny_italian_2_pipeline +date: 2023-10-20 +tags: [whisper, it, open_source, pipeline] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_tiny_italian_2_pipeline` is an Italian model originally trained by GIanlucaRub. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_italian_2_pipeline_it_5.1.4_3.4_1697767702468.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_italian_2_pipeline_it_5.1.4_3.4_1697767702468.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline('asr_whisper_tiny_italian_2_pipeline', lang = 'it') +annotations = pipeline.transform(audioDF) + +``` +```scala + +val pipeline = new PretrainedPipeline("asr_whisper_tiny_italian_2_pipeline", lang = "it") +val annotations = pipeline.transform(audioDF) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_tiny_italian_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|390.8 MB| + +## References + +https://huggingface.co/GIanlucaRub/whisper-tiny-it-2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_tiny_italian_local_en.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_tiny_italian_local_en.md new file mode 100644 index 00000000000000..796e6b46ed44cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_tiny_italian_local_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: English asr_whisper_tiny_italian_local WhisperForCTC from GIanlucaRub +author: John Snow Labs +name: asr_whisper_tiny_italian_local +date: 2023-10-20 +tags: [whisper, en, open_source, asr, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_tiny_italian_local` is an English model originally trained by GIanlucaRub. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_italian_local_en_5.1.4_3.4_1697763349106.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_italian_local_en_5.1.4_3.4_1697763349106.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + + +speechToText = WhisperForCTC.pretrained("asr_whisper_tiny_italian_local","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new AudioAssembler() + .setInputCol("audio_content") + .setOutputCol("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("asr_whisper_tiny_italian_local","en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_tiny_italian_local| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.7 MB| + +## References + +https://huggingface.co/GIanlucaRub/whisper-tiny-it-local \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_tiny_italian_local_pipeline_en.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_tiny_italian_local_pipeline_en.md new file mode 100644 index 00000000000000..2e056b75cb4d5c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_tiny_italian_local_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English asr_whisper_tiny_italian_local_pipeline pipeline WhisperForCTC from GIanlucaRub +author: John Snow Labs +name: asr_whisper_tiny_italian_local_pipeline +date: 2023-10-20 +tags: [whisper, en, open_source, pipeline] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_tiny_italian_local_pipeline` is an English model originally trained by GIanlucaRub. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_italian_local_pipeline_en_5.1.4_3.4_1697763359286.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_italian_local_pipeline_en_5.1.4_3.4_1697763359286.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline('asr_whisper_tiny_italian_local_pipeline', lang = 'en') +annotations = pipeline.transform(audioDF) + +``` +```scala + +val pipeline = new PretrainedPipeline("asr_whisper_tiny_italian_local_pipeline", lang = "en") +val annotations = pipeline.transform(audioDF) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_tiny_italian_local_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.7 MB| + +## References + +https://huggingface.co/GIanlucaRub/whisper-tiny-it-local + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_tiny_spanish_arpagon_es.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_tiny_spanish_arpagon_es.md new file mode 100644 index 00000000000000..798ecbeeca1171 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_tiny_spanish_arpagon_es.md @@ -0,0 +1,92 @@ +--- +layout: model +title: Castilian, Spanish asr_whisper_tiny_spanish_arpagon WhisperForCTC from arpagon +author: John Snow Labs +name: asr_whisper_tiny_spanish_arpagon +date: 2023-10-20 +tags: [whisper, es, open_source, asr, onnx] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_tiny_spanish_arpagon` is a Castilian, Spanish model originally trained by arpagon. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_spanish_arpagon_es_5.1.4_3.4_1697761762014.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_spanish_arpagon_es_5.1.4_3.4_1697761762014.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + + +speechToText = WhisperForCTC.pretrained("asr_whisper_tiny_spanish_arpagon","es") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new AudioAssembler() + .setInputCol("audio_content") + .setOutputCol("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("asr_whisper_tiny_spanish_arpagon","es") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_tiny_spanish_arpagon| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|es| +|Size:|390.7 MB| + +## References + +https://huggingface.co/arpagon/whisper-tiny-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_tiny_spanish_arpagon_pipeline_es.md b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_tiny_spanish_arpagon_pipeline_es.md new file mode 100644 index 00000000000000..93c188485a8dd3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-10-20-asr_whisper_tiny_spanish_arpagon_pipeline_es.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Castilian, Spanish asr_whisper_tiny_spanish_arpagon_pipeline pipeline WhisperForCTC from arpagon +author: John Snow Labs +name: asr_whisper_tiny_spanish_arpagon_pipeline +date: 2023-10-20 +tags: [whisper, es, open_source, pipeline] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.1.4 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `asr_whisper_tiny_spanish_arpagon_pipeline` is a Castilian, Spanish model originally trained by arpagon. + +This model is only compatible with PySpark 3.4 and above + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_spanish_arpagon_pipeline_es_5.1.4_3.4_1697761769281.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_spanish_arpagon_pipeline_es_5.1.4_3.4_1697761769281.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline('asr_whisper_tiny_spanish_arpagon_pipeline', lang = 'es') +annotations = pipeline.transform(audioDF) + +``` +```scala + +val pipeline = new PretrainedPipeline("asr_whisper_tiny_spanish_arpagon_pipeline", lang = "es") +val annotations = pipeline.transform(audioDF) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_whisper_tiny_spanish_arpagon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.1.4+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|390.7 MB| + +## References + +https://huggingface.co/arpagon/whisper-tiny-es + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file
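Every snippet in these model cards assumes an input DataFrame (`data` or `audioDF`) with an `audio_content` column of raw audio samples, but none shows how that column is produced. The helper below is a minimal sketch under stated assumptions: the input is an uncompressed 16-bit PCM, mono WAV file that has already been resampled to 16 kHz (the rate Whisper checkpoints are generally trained on), and `wav_to_float_samples` is a hypothetical name, not part of Spark NLP. It uses only the Python standard library and scales each int16 sample into the range [-1.0, 1.0).

```python
import array
import sys
import wave

# Assumption: Whisper-based models expect mono audio sampled at 16 kHz.
EXPECTED_RATE = 16_000


def wav_to_float_samples(source, expected_rate=EXPECTED_RATE):
    """Hypothetical helper: read a 16-bit PCM mono WAV file into a list of floats.

    `source` can be a file path or a binary file-like object, i.e. anything
    accepted by `wave.open`. Samples are scaled from int16 to [-1.0, 1.0).
    """
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio; downmix before converting")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        rate = wav.getframerate()
        if rate != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz audio, got {rate} Hz; resample first")
        raw = wav.readframes(wav.getnframes())

    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is stored little-endian
    return [sample / 32768.0 for sample in pcm]
```

With PySpark available, the resulting list can be wrapped in a one-row DataFrame in the same way the Spark NLP audio examples do, for instance `spark.createDataFrame([[samples]]).toDF("audio_content")`, and used as `data` or `audioDF` in the snippets above. Files at other sample rates or with several channels are rejected rather than silently converted, since resampling and downmixing are better left to a dedicated audio library.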