2024-11-01-distilbart_xsum_12_6_en (#14447)
* Add model 2024-11-01-distilbart_xsum_12_6_en
* Add model 2024-11-03-gpt2_en
* Add model 2024-11-08-hubert_ukrainian_uk
* Add model 2024-11-08-hubert_ukrainian_pipeline_uk
* Add model 2024-11-08-unitku_hubert_japanese_asr_ja
* Add model 2024-11-08-unitku_hubert_japanese_asr_pipeline_ja
* Add model 2024-11-08-hubert_large_japanese_asr_ja
* Add model 2024-11-08-hubert_large_japanese_asr_pipeline_ja

---------

Co-authored-by: ahmedlone127 <ahmedlone127@gmail.com>
1 parent d8d4736, commit 5a556ba
Showing 8 changed files with 626 additions and 0 deletions.
74 changes: 74 additions & 0 deletions
docs/_posts/ahmedlone127/2024-11-01-distilbart_xsum_12_6_en.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
--- | ||
layout: model | ||
title: Abstractive Summarization by BART - DistilBART XSUM | ||
author: John Snow Labs | ||
name: distilbart_xsum_12_6 | ||
date: 2024-11-01 | ||
tags: [en, summarization, text_to_text, distil, open_source, openvino] | ||
task: Summarization | ||
language: en | ||
edition: Spark NLP 5.5.0 | ||
spark_version: 3.0 | ||
supported: true | ||
engine: openvino | ||
annotator: BartTransformer | ||
article_header: | ||
type: cover | ||
use_language_switcher: "Python-Scala-Java" | ||
--- | ||
|
||
## Description | ||
|
||
BART (Bidirectional and Auto-Regressive Transformer) is a language generation model introduced by Facebook AI in 2019 in the paper “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension”. It is based on the transformer architecture and is designed to handle a wide range of natural language processing tasks such as text generation, summarization, and machine translation. | ||
|
||
This pre-trained model is DistilBART fine-tuned on the Extreme Summarization (XSum) Dataset. | ||
|
||
## Predicted Entities | ||
|
||
|
||
|
||
{:.btn-box} | ||
<button class="button button-orange" disabled>Live Demo</button> | ||
<button class="button button-orange" disabled>Open in Colab</button> | ||
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbart_xsum_12_6_en_5.5.0_3.0_1730492024334.zip){:.button.button-orange.button-orange-trans.arr.button-icon} | ||
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbart_xsum_12_6_en_5.5.0_3.0_1730492024334.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} | ||
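The download links above appear to follow the pattern `<name>_<language>_<Spark NLP version>_<Spark version>_<timestamp>.zip`, which matches the `name`, `language`, `edition`, and `spark_version` fields in the front matter. The sketch below splits an artifact name along that pattern. The helper `parse_artifact` and the `ModelArtifact` type are hypothetical names introduced for illustration, and reading the trailing number as an epoch timestamp in milliseconds is an assumption, not something the model card states.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ModelArtifact(NamedTuple):
    """Fields recovered from an artifact file name (hypothetical type)."""
    name: str
    language: str
    spark_nlp_version: str
    spark_version: str
    uploaded_at: datetime


def parse_artifact(uri: str) -> ModelArtifact:
    """Split '<name>_<lang>_<nlp version>_<spark version>_<millis>.zip'."""
    stem = uri.rsplit("/", 1)[-1]
    if stem.endswith(".zip"):
        stem = stem[: -len(".zip")]
    # Model names contain underscores themselves, so split from the right.
    name, language, nlp_version, spark_version, millis = stem.rsplit("_", 4)
    # Assumption: the last field is a Unix timestamp in milliseconds.
    uploaded_at = datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)
    return ModelArtifact(name, language, nlp_version, spark_version, uploaded_at)


artifact = parse_artifact(
    "s3://auxdata.johnsnowlabs.com/public/models/"
    "distilbart_xsum_12_6_en_5.5.0_3.0_1730492024334.zip"
)
print(artifact)
```

Splitting from the right keeps the model name intact even though it contains underscores of its own.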
|
||
## How to use | ||
|
||
|
||
|
||
<div class="tabs-box" markdown="1"> | ||
{% include programmingLanguageSelectScalaPythonNLU.html %} | ||
```python | ||
bart = BartTransformer.pretrained("distilbart_xsum_12_6") \ | ||
.setTask("summarize:") \ | ||
.setMaxOutputLength(200) \ | ||
.setInputCols(["documents"]) \ | ||
.setOutputCol("summaries") | ||
``` | ||
```scala | ||
val bart = BartTransformer.pretrained("distilbart_xsum_12_6") | ||
.setTask("summarize:") | ||
.setMaxOutputLength(200) | ||
.setInputCols("documents") | ||
.setOutputCol("summaries") | ||
``` | ||
</div> | ||
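The snippets above configure only the `BartTransformer` stage and read from a `documents` column that nothing in them produces. A minimal end-to-end sketch, assuming a running Spark NLP session is passed in as `spark` and that the input is a list of plain strings, might look like the following. The function name `summarize` and its parameters are hypothetical; the imports are placed inside the function so the definition can be loaded without Spark NLP installed.

```python
def summarize(spark, texts, max_output_length=200):
    """Return a DataFrame holding one generated summary per input text.

    `spark` is assumed to be a session created with `sparknlp.start()`.
    """
    # Lazy imports: only needed when the function is actually called.
    from pyspark.ml import Pipeline
    from sparknlp.annotator import BartTransformer
    from sparknlp.base import DocumentAssembler

    # Produces the "documents" column that the BART stage consumes.
    document_assembler = (
        DocumentAssembler()
        .setInputCol("text")
        .setOutputCol("documents")
    )

    # Same configuration as the model card snippet.
    bart = (
        BartTransformer.pretrained("distilbart_xsum_12_6")
        .setTask("summarize:")
        .setMaxOutputLength(max_output_length)
        .setInputCols(["documents"])
        .setOutputCol("summaries")
    )

    pipeline = Pipeline().setStages([document_assembler, bart])
    data = spark.createDataFrame([[t] for t in texts]).toDF("text")
    return pipeline.fit(data).transform(data).select("summaries.result")
```

Calling `summarize(spark, ["..."]).show(truncate=False)` would download the pretrained model on first use, so the first call can take a while.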
|
||
{:.model-param} | ||
## Model Information | ||
|
||
{:.table-model} | ||
|---|---| | ||
|Model Name:|distilbart_xsum_12_6| | ||
|Compatibility:|Spark NLP 5.5.0+| | ||
|License:|Open Source| | ||
|Edition:|Official| | ||
|Input Labels:|[documents]| | ||
|Output Labels:|[generation]| | ||
|Language:|en| | ||
|Size:|853.7 MB| | ||
|
||
## References | ||
|
||
https://huggingface.co/sshleifer/distilbart-xsum-12-6 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,93 @@ | ||
--- | ||
layout: model | ||
title: GPT2 text-to-text model (Base) | ||
author: John Snow Labs | ||
name: gpt2 | ||
date: 2024-11-03 | ||
tags: [gpt2, en, open_source, onnx, openvino] | ||
task: Text Generation | ||
language: en | ||
edition: Spark NLP 5.5.0 | ||
spark_version: 3.0 | ||
supported: true | ||
engine: openvino | ||
annotator: GPT2Transformer | ||
article_header: | ||
type: cover | ||
use_language_switcher: "Python-Scala-Java" | ||
--- | ||
|
||
## Description | ||
|
||
“GPT-2 displays a broad set of capabilities, including the ability to generate conditional synthetic text samples of unprecedented quality, where the model is primed with an input and it generates a lengthy continuation.” | ||
|
||
## Predicted Entities | ||
|
||
|
||
|
||
{:.btn-box} | ||
<button class="button button-orange" disabled>Live Demo</button> | ||
<button class="button button-orange" disabled>Open in Colab</button> | ||
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gpt2_en_5.5.0_3.0_1730653115205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} | ||
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gpt2_en_5.5.0_3.0_1730653115205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} | ||
|
||
## How to use | ||
|
||
|
||
|
||
<div class="tabs-box" markdown="1"> | ||
{% include programmingLanguageSelectScalaPythonNLU.html %} | ||
```python | ||
documentAssembler = DocumentAssembler() \ | ||
.setInputCol("text") \ | ||
.setOutputCol("documents") | ||
|
||
gpt2 = GPT2Transformer.pretrained("gpt2") \ | ||
.setInputCols(["documents"]) \ | ||
.setMaxOutputLength(50) \ | ||
.setOutputCol("generation") | ||
|
||
pipeline = Pipeline().setStages([documentAssembler, gpt2]) | ||
data = spark.createDataFrame([["My name is Leonardo."]]).toDF("text") | ||
result = pipeline.fit(data).transform(data) | ||
result.select("generation.result").show(truncate=False) | ||
``` | ||
```scala | ||
val documentAssembler = new DocumentAssembler() | ||
.setInputCol("text") | ||
.setOutputCol("documents") | ||
|
||
val gpt2 = GPT2Transformer.pretrained("gpt2") | ||
.setInputCols(Array("documents")) | ||
.setMinOutputLength(10) | ||
.setMaxOutputLength(50) | ||
.setDoSample(false) | ||
.setTopK(50) | ||
.setNoRepeatNgramSize(3) | ||
.setOutputCol("generation") | ||
|
||
val pipeline = new Pipeline().setStages(Array(documentAssembler, gpt2)) | ||
|
||
val data = Seq("My name is Leonardo.").toDF("text") | ||
val result = pipeline.fit(data).transform(data) | ||
result.select("generation.result").show(truncate = false) | ||
``` | ||
</div> | ||
|
||
{:.model-param} | ||
## Model Information | ||
|
||
{:.table-model} | ||
|---|---| | ||
|Model Name:|gpt2| | ||
|Compatibility:|Spark NLP 5.5.0+| | ||
|License:|Open Source| | ||
|Edition:|Official| | ||
|Input Labels:|[documents]| | ||
|Output Labels:|[generation]| | ||
|Language:|en| | ||
|Size:|467.4 MB| | ||
|
||
## References | ||
|
||
https://huggingface.co/openai-community/gpt2 |
84 changes: 84 additions & 0 deletions
docs/_posts/ahmedlone127/2024-11-08-hubert_large_japanese_asr_ja.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
--- | ||
layout: model | ||
title: Japanese hubert_large_japanese_asr HubertForCTC from TKU410410103 | ||
author: John Snow Labs | ||
name: hubert_large_japanese_asr | ||
date: 2024-11-08 | ||
tags: [ja, open_source, onnx, asr, hubert] | ||
task: Automatic Speech Recognition | ||
language: ja | ||
edition: Spark NLP 5.5.1 | ||
spark_version: 3.0 | ||
supported: true | ||
engine: onnx | ||
annotator: HubertForCTC | ||
article_header: | ||
type: cover | ||
use_language_switcher: "Python-Scala-Java" | ||
--- | ||
|
||
## Description | ||
|
||
Pretrained HubertForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `hubert_large_japanese_asr` is a Japanese model originally trained by TKU410410103. | ||
|
||
{:.btn-box} | ||
<button class="button button-orange" disabled>Live Demo</button> | ||
<button class="button button-orange" disabled>Open in Colab</button> | ||
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hubert_large_japanese_asr_ja_5.5.1_3.0_1731106819898.zip){:.button.button-orange.button-orange-trans.arr.button-icon} | ||
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hubert_large_japanese_asr_ja_5.5.1_3.0_1731106819898.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} | ||
|
||
## How to use | ||
|
||
|
||
|
||
<div class="tabs-box" markdown="1"> | ||
{% include programmingLanguageSelectScalaPythonNLU.html %} | ||
```python | ||
|
||
audioAssembler = AudioAssembler() \ | ||
.setInputCol("audio_content") \ | ||
.setOutputCol("audio_assembler") | ||
|
||
speechToText = HubertForCTC.pretrained("hubert_large_japanese_asr","ja") \ | ||
.setInputCols(["audio_assembler"]) \ | ||
.setOutputCol("text") | ||
|
||
pipeline = Pipeline().setStages([audioAssembler, speechToText]) | ||
pipelineModel = pipeline.fit(data) | ||
pipelineDF = pipelineModel.transform(data) | ||
|
||
``` | ||
```scala | ||
|
||
val audioAssembler = new AudioAssembler() | ||
.setInputCol("audio_content") | ||
.setOutputCol("audio_assembler") | ||
|
||
val speechToText = HubertForCTC.pretrained("hubert_large_japanese_asr", "ja") | ||
.setInputCols(Array("audio_assembler")) | ||
.setOutputCol("text") | ||
|
||
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText)) | ||
val pipelineModel = pipeline.fit(data) | ||
val pipelineDF = pipelineModel.transform(data) | ||
|
||
``` | ||
</div> | ||
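Both snippets above use a `data` DataFrame that is never defined. `AudioAssembler` reads a column of float arrays (here `audio_content`), and HuBERT checkpoints are generally trained on 16 kHz mono audio. The sketch below shows one way to turn 16-bit mono PCM WAV bytes into such a list of floats using only the Python standard library. The helpers `wav_to_floats` and `synth_wav` are hypothetical names introduced for illustration, and the synthetic tone stands in for a real recording.

```python
import io
import math
import struct
import wave
from typing import List, Tuple


def wav_to_floats(wav_bytes: bytes) -> Tuple[List[float], int]:
    """Decode 16-bit mono PCM WAV bytes into floats in [-1.0, 1.0)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM audio")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    ints = struct.unpack("<%dh" % (len(frames) // 2), frames)
    return [s / 32768.0 for s in ints], rate


def synth_wav(millis: int = 100, rate: int = 16000, freq: float = 440.0) -> bytes:
    """Build a short half-amplitude sine tone as WAV bytes (test input only)."""
    n = rate * millis // 1000
    pcm = struct.pack(
        "<%dh" % n,
        *(int(16383 * math.sin(2 * math.pi * freq * i / rate)) for i in range(n)),
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(pcm)
    return buf.getvalue()


samples, rate = wav_to_floats(synth_wav())
print(len(samples), rate)
```

With a Spark session available, a list like `samples` could then be placed in a single-row DataFrame whose column is named `audio_content`. Depending on how Spark infers the element type, a cast to an array of floats may be needed before the `AudioAssembler` stage; treat that as an assumption to verify against the Spark NLP documentation.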
|
||
{:.model-param} | ||
## Model Information | ||
|
||
{:.table-model} | ||
|---|---| | ||
|Model Name:|hubert_large_japanese_asr| | ||
|Compatibility:|Spark NLP 5.5.1+| | ||
|License:|Open Source| | ||
|Edition:|Official| | ||
|Input Labels:|[audio_assembler]| | ||
|Output Labels:|[text]| | ||
|Language:|ja| | ||
|Size:|2.4 GB| | ||
|
||
## References | ||
|
||
https://huggingface.co/TKU410410103/hubert-large-japanese-asr |
69 changes: 69 additions & 0 deletions
docs/_posts/ahmedlone127/2024-11-08-hubert_large_japanese_asr_pipeline_ja.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
--- | ||
layout: model | ||
title: Japanese hubert_large_japanese_asr_pipeline pipeline HubertForCTC from TKU410410103 | ||
author: John Snow Labs | ||
name: hubert_large_japanese_asr_pipeline | ||
date: 2024-11-08 | ||
tags: [ja, open_source, pipeline, onnx] | ||
task: Automatic Speech Recognition | ||
language: ja | ||
edition: Spark NLP 5.5.1 | ||
spark_version: 3.0 | ||
supported: true | ||
annotator: PipelineModel | ||
article_header: | ||
type: cover | ||
use_language_switcher: "Python-Scala-Java" | ||
--- | ||
|
||
## Description | ||
|
||
Pretrained HubertForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `hubert_large_japanese_asr_pipeline` is a Japanese model originally trained by TKU410410103. | ||
|
||
{:.btn-box} | ||
<button class="button button-orange" disabled>Live Demo</button> | ||
<button class="button button-orange" disabled>Open in Colab</button> | ||
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hubert_large_japanese_asr_pipeline_ja_5.5.1_3.0_1731106937966.zip){:.button.button-orange.button-orange-trans.arr.button-icon} | ||
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hubert_large_japanese_asr_pipeline_ja_5.5.1_3.0_1731106937966.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} | ||
|
||
## How to use | ||
|
||
|
||
|
||
<div class="tabs-box" markdown="1"> | ||
{% include programmingLanguageSelectScalaPythonNLU.html %} | ||
```python | ||
|
||
pipeline = PretrainedPipeline("hubert_large_japanese_asr_pipeline", lang = "ja") | ||
annotations = pipeline.transform(df) | ||
|
||
``` | ||
```scala | ||
|
||
val pipeline = new PretrainedPipeline("hubert_large_japanese_asr_pipeline", lang = "ja") | ||
val annotations = pipeline.transform(df) | ||
|
||
``` | ||
</div> | ||
|
||
{:.model-param} | ||
## Model Information | ||
|
||
{:.table-model} | ||
|---|---| | ||
|Model Name:|hubert_large_japanese_asr_pipeline| | ||
|Type:|pipeline| | ||
|Compatibility:|Spark NLP 5.5.1+| | ||
|License:|Open Source| | ||
|Edition:|Official| | ||
|Language:|ja| | ||
|Size:|2.4 GB| | ||
|
||
## References | ||
|
||
https://huggingface.co/TKU410410103/hubert-large-japanese-asr | ||
|
||
## Included Models | ||
|
||
- AudioAssembler | ||
- HubertForCTC |
69 changes: 69 additions & 0 deletions
docs/_posts/ahmedlone127/2024-11-08-hubert_ukrainian_pipeline_uk.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
--- | ||
layout: model | ||
title: Ukrainian hubert_ukrainian_pipeline pipeline HubertForCTC from Yehor | ||
author: John Snow Labs | ||
name: hubert_ukrainian_pipeline | ||
date: 2024-11-08 | ||
tags: [uk, open_source, pipeline, onnx] | ||
task: Automatic Speech Recognition | ||
language: uk | ||
edition: Spark NLP 5.5.1 | ||
spark_version: 3.0 | ||
supported: true | ||
annotator: PipelineModel | ||
article_header: | ||
type: cover | ||
use_language_switcher: "Python-Scala-Java" | ||
--- | ||
|
||
## Description | ||
|
||
Pretrained HubertForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `hubert_ukrainian_pipeline` is a Ukrainian model originally trained by Yehor. | ||
|
||
{:.btn-box} | ||
<button class="button button-orange" disabled>Live Demo</button> | ||
<button class="button button-orange" disabled>Open in Colab</button> | ||
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hubert_ukrainian_pipeline_uk_5.5.1_3.0_1731106461400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} | ||
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hubert_ukrainian_pipeline_uk_5.5.1_3.0_1731106461400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} | ||
|
||
## How to use | ||
|
||
|
||
|
||
<div class="tabs-box" markdown="1"> | ||
{% include programmingLanguageSelectScalaPythonNLU.html %} | ||
```python | ||
|
||
pipeline = PretrainedPipeline("hubert_ukrainian_pipeline", lang = "uk") | ||
annotations = pipeline.transform(df) | ||
|
||
``` | ||
```scala | ||
|
||
val pipeline = new PretrainedPipeline("hubert_ukrainian_pipeline", lang = "uk") | ||
val annotations = pipeline.transform(df) | ||
|
||
``` | ||
</div> | ||
|
||
{:.model-param} | ||
## Model Information | ||
|
||
{:.table-model} | ||
|---|---| | ||
|Model Name:|hubert_ukrainian_pipeline| | ||
|Type:|pipeline| | ||
|Compatibility:|Spark NLP 5.5.1+| | ||
|License:|Open Source| | ||
|Edition:|Official| | ||
|Language:|uk| | ||
|Size:|708.6 MB| | ||
|
||
## References | ||
|
||
https://huggingface.co/Yehor/hubert-uk | ||
|
||
## Included Models | ||
|
||
- AudioAssembler | ||
- HubertForCTC |