Models hub #14006

Merged: 61 commits merged on Sep 25, 2023
57d855e
Merge branch 'master' into models_hub
maziyarpanahi Nov 21, 2022
41cda2d
Merge branch 'models_hub' of https://github.com/JohnSnowLabs/spark-nl…
maziyarpanahi Nov 25, 2022
6c39602
Merge branch 'master' into models_hub
maziyarpanahi Dec 15, 2022
bed4adb
Merge branch 'master' into models_hub
maziyarpanahi Dec 21, 2022
cf0b08f
Merge branch 'master' into models_hub
maziyarpanahi Feb 7, 2023
93d6753
Merge branch 'master' into models_hub
maziyarpanahi Mar 14, 2023
afb700e
Add model 2023-04-13-CyberbullyingDetection_ClassifierDL_tfhub_en (#1…
jsl-models Apr 13, 2023
bb9a155
2023-04-20-distilbert_base_uncased_mnli_en (#13761)
jsl-models Apr 20, 2023
ea0ba05
2023-04-20-distilbert_base_zero_shot_classifier_turkish_cased_multinl…
jsl-models Apr 21, 2023
9afffb1
2023-05-04-roberta_base_zero_shot_classifier_nli_en (#13781)
jsl-models May 4, 2023
f4356e5
2023-05-09-distilbart_xsum_6_6_en (#13788)
jsl-models May 10, 2023
04149fb
Merge branch 'master' into models_hub
maziyarpanahi May 10, 2023
de3e19e
2023-05-11-distilbart_cnn_12_6_en (#13795)
jsl-models May 11, 2023
71de0f7
2023-05-19-match_pattern_en (#13805)
jsl-models May 21, 2023
f28ea8e
2023-05-22-explain_document_md_fr (#13811)
jsl-models May 23, 2023
4049881
2023-05-24-explain_document_md_fr (#13821)
jsl-models May 25, 2023
e4e465e
Add model 2023-05-25-explain_document_md_fr (#13827)
jsl-models May 25, 2023
e8e01a5
2023-05-25-dependency_parse_en (#13828)
jsl-models May 26, 2023
9c0a24e
Merge branch 'master' into models_hub
maziyarpanahi May 26, 2023
2fd64c3
2023-05-25-distilcamembert_french_legal_fr (#13826)
jsl-models May 26, 2023
795ebf8
Update title for 2023-05-25-distilcamembert_french_legal_fr.md (#13831)
Mary-Sci May 26, 2023
c04ca51
2023-05-27-explain_document_md_fr (#13836)
jsl-models May 27, 2023
4d64d1b
2023-05-28-longformer_base_english_legal_en (#13838)
jsl-models May 28, 2023
02a9afb
2023-05-28-xlm_longformer_base_english_legal_en (#13839)
jsl-models May 29, 2023
d054074
2023-06-21-bert_embeddings_distil_clinical_en (#13861)
jsl-models Jun 21, 2023
43ab794
2023-06-26-distilbert_embeddings_finetuned_sarcasm_classification_en …
jsl-models Jun 26, 2023
7cde44f
2023-06-27-roberta_embeddings_robertinh_gl (#13868)
jsl-models Jun 27, 2023
ced98b6
Add model 2023-06-29-xlmroberta_embeddings_paraphrase_mpnet_base_v2_x…
jsl-models Jun 30, 2023
dfaabd4
2023-06-08-instructor_base_en (#13850)
jsl-models Jul 1, 2023
59113cd
2023-06-28-roberta_base_en (#13871)
jsl-models Jul 1, 2023
740f4fb
Merge branch 'master' into models_hub
maziyarpanahi Jul 3, 2023
c999bd6
Merge branch 'master' into models_hub
maziyarpanahi Jul 4, 2023
27840ed
Add model 2023-07-05-image_classifier_convnext_tiny_224_local_en (#13…
jsl-models Jul 5, 2023
566b6ee
Add model 2023-07-06-quora_distilbert_multilingual_en (#13882)
jsl-models Jul 18, 2023
d246455
removed duplicated sections (#13885)
ahmedlone127 Jul 18, 2023
182bc05
Add model 2023-07-20-xlm_roberta_large_zero_shot_classifier_xnli_anli…
jsl-models Jul 21, 2023
9a1bea5
Add model 2023-07-28-twitter_xlm_roberta_base_sentiment_en (#13905)
jsl-models Jul 28, 2023
cc00383
2023-07-30-albert_embeddings_ALR_BERT_ro (#13910)
jsl-models Aug 2, 2023
b6d3cf1
2023-07-28-twitter_xlm_roberta_base_sentiment_en (#13906)
jsl-models Aug 2, 2023
0504fb7
2023-08-07-bart_large_zero_shot_classifier_mnli_en (#13917)
jsl-models Aug 7, 2023
1a0f376
2023-08-15-gte_base_en (#13922)
jsl-models Aug 15, 2023
0e2bb83
2023-08-15-bge_small_en (#13923)
jsl-models Aug 15, 2023
a11908a
2023-08-18-mpnet_embedding_mpnet_snli_en (#13929)
jsl-models Aug 24, 2023
06f07da
2023-08-22-asr_whisper_tiny_opt_xx (#13931)
jsl-models Aug 24, 2023
b1b99f5
2023-08-25-e5_small_en (#13939)
jsl-models Aug 25, 2023
b891455
2023-08-25-e5_large_v2_opt_en (#13941)
jsl-models Aug 25, 2023
2f27b9a
2023-08-29-mpnet_embedding_tiny_random_mpnet_by_hf_internal_testing_e…
jsl-models Aug 29, 2023
da67ab7
Merge branch 'master' into models_hub
maziyarpanahi Sep 6, 2023
ae1e24f
2023-08-28-asr_whisper_tiny_opt_xx (#13944)
jsl-models Sep 7, 2023
b1da33e
2023-09-07-java_pointer_classifier_en (#13968)
jsl-models Sep 8, 2023
16c83c2
2023-09-09-medium_mlm_imdb_en (#13970)
jsl-models Sep 11, 2023
f3c878e
2023-09-12-tiny_mlm_glue_rte_en (#13975)
jsl-models Sep 13, 2023
6ec2297
2023-09-13-tlm_ag_small_scale_en (#13980)
jsl-models Sep 13, 2023
d6f3fe5
2023-09-13-bert_base_uncased_issues_128_juandeun_en (#13981)
jsl-models Sep 14, 2023
0f86237
2023-09-14-bert_base_cased_finetuned_wallisian_manual_9ep_en (#13982)
jsl-models Sep 15, 2023
d9fdbf0
2023-09-15-distilbert_base_german_cased_de (#13984)
jsl-models Sep 16, 2023
6ee008b
2023-09-18-m3_experiment_albert_base_v2_tweet_eval_hate_word_swapping…
jsl-models Sep 20, 2023
577ecb6
Add model 2023-09-18-AtgxRobertaBaseSquad2_en (#13988)
jsl-models Sep 25, 2023
d4bd550
2023-09-20-image_captioning_vit_gpt2_en (#13999)
jsl-models Sep 25, 2023
00e0a8d
2023-09-21-multilingual_e5_base_xx (#14002)
jsl-models Sep 25, 2023
2fd633a
2023-09-22-bert_embeddings_frpile_gpl_en (#14003)
jsl-models Sep 25, 2023
The diff you're trying to view is too large. We only load the first 3000 changed files.
119 changes: 119 additions & 0 deletions docs/_posts/DevinTDHa/2023-08-28-asr_whisper_tiny_opt_xx.md
@@ -0,0 +1,119 @@
---
layout: model
title: Official whisper-tiny Optimized
author: John Snow Labs
name: asr_whisper_tiny_opt
date: 2023-08-28
tags: [whisper, audio, open_source, asr, onnx, xx]
task: Automatic Speech Recognition
language: xx
edition: Spark NLP 5.1.1
spark_version: 3.0
supported: true
engine: onnx
annotator: WhisperForCTC
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Official pretrained Whisper model, adapted from the Hugging Face Transformers implementation and curated for scalability and production use with Spark NLP.

This is a multilingual model and supports the following languages:

Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.

## Predicted Entities



{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_opt_xx_5.1.1_3.0_1693213918398.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_opt_xx_5.1.1_3.0_1693213918398.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("asr_whisper_tiny_opt", "xx") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# Read the audio samples: one float per line (first comma-separated value),
# as in the Scala example
with open("src/test/resources/audio/txt/librispeech_asr_0.txt") as f:
    rawFloats = [float(line.split(",")[0].strip()) for line in f if line.strip()]

processedAudioFloats = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
result = pipeline.fit(processedAudioFloats).transform(processedAudioFloats)
result.select("text.result").show(truncate=False)
```
```scala
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotators._
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler: AudioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText: WhisperForCTC = WhisperForCTC
  .pretrained("asr_whisper_tiny_opt", "xx")
  .setInputCols("audio_assembler")
  .setOutputCol("text")

val pipeline: Pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// Read the audio samples: one float per line (first comma-separated value)
val bufferedSource =
  scala.io.Source.fromFile("src/test/resources/audio/txt/librispeech_asr_0.txt")

val rawFloats = bufferedSource
  .getLines()
  .map(_.split(",").head.trim.toFloat)
  .toArray
bufferedSource.close()

val processedAudioFloats = Seq(rawFloats).toDF("audio_content")

val result = pipeline.fit(processedAudioFloats).transform(processedAudioFloats)
result.select("text.result").show(truncate = false)
```
</div>
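The pipeline expects the audio as a flat array of float samples in the `audio_content` column. A minimal sketch for producing such an array from a WAV file with only the Python standard library, assuming the file is mono, 16-bit PCM and already sampled at 16 kHz (the rate Whisper models are trained on). `load_wav_as_floats` and `pcm16_bytes_to_floats` are hypothetical helper names, not Spark NLP APIs, and no resampling or channel mixing is done.

```python
import array
import sys
import wave
from typing import List


def pcm16_bytes_to_floats(frames: bytes) -> List[float]:
    """Convert little-endian signed 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is little-endian
    return [s / 32768.0 for s in samples]


def load_wav_as_floats(path: str, expected_rate: int = 16000) -> List[float]:
    """Read a mono 16-bit PCM WAV file into a list of floats (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        if wav.getframerate() != expected_rate:
            raise ValueError(
                f"expected {expected_rate} Hz, got {wav.getframerate()} Hz; resample first"
            )
        return pcm16_bytes_to_floats(wav.readframes(wav.getnframes()))
```

For example, `rawFloats = load_wav_as_floats("sample.wav")` (with a hypothetical `sample.wav`) yields a list that can be passed to `spark.createDataFrame([[rawFloats]]).toDF("audio_content")`.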

## Results

```bash
+------------------------------------------------------------------------------------------------------------------------------------------------+
|document |
+------------------------------------------------------------------------------------------------------------------------------------------------+
|[{document, 0, 87, Mr. Quilter is the apostle of the middle classes and we are glad to welcome his gospel., {length -> 93680, audio -> 0}, []}]|
+------------------------------------------------------------------------------------------------------------------------------------------------+
```

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|asr_whisper_tiny_opt|
|Compatibility:|Spark NLP 5.1.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[audio_assembler]|
|Output Labels:|[document]|
|Language:|xx|
|Size:|239.3 MB|
125 changes: 125 additions & 0 deletions docs/_posts/DevinTDHa/2023-09-20-image_captioning_vit_gpt2_en.md
@@ -0,0 +1,125 @@
---
layout: model
title: Image Captioning with VisionEncoderDecoder ViT GPT2
author: John Snow Labs
name: image_captioning_vit_gpt2
date: 2023-09-20
tags: [en, vit, gpt2, image, captioning, open_source, tensorflow]
task: Image Captioning
language: en
edition: Spark NLP 5.1.2
spark_version: 3.0
supported: true
engine: tensorflow
annotator: VisionEncoderDecoderForImageCaptioning
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This is an image captioning model that uses ViT to encode images and GPT-2 to generate captions. The original model is available at https://huggingface.co/nlpconnect/vit-gpt2-image-captioning.

## Predicted Entities



{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/image_captioning_vit_gpt2_en_5.1.2_3.0_1695215721202.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/image_captioning_vit_gpt2_en_5.1.2_3.0_1695215721202.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

# Read a folder of images, skipping files that cannot be decoded
imageDF = spark.read \
    .format("image") \
    .option("dropInvalid", True) \
    .load("src/test/resources/image/")

imageAssembler = ImageAssembler() \
    .setInputCol("image") \
    .setOutputCol("image_assembler")

imageCaptioning = VisionEncoderDecoderForImageCaptioning \
    .pretrained("image_captioning_vit_gpt2", "en") \
    .setBeamSize(2) \
    .setDoSample(False) \
    .setInputCols(["image_assembler"]) \
    .setOutputCol("caption")

pipeline = Pipeline().setStages([imageAssembler, imageCaptioning])
pipelineDF = pipeline.fit(imageDF).transform(imageDF)

# Show each file name (last segment of image.origin) next to its caption
pipelineDF \
    .selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "caption.result") \
    .show(truncate=False)
```
```scala
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.ImageAssembler
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.DataFrame

val imageDF: DataFrame = spark.read
  .format("image")
  .option("dropInvalid", value = true)
  .load("src/test/resources/image/")

val imageAssembler = new ImageAssembler()
  .setInputCol("image")
  .setOutputCol("image_assembler")

val imageCaptioning = VisionEncoderDecoderForImageCaptioning
  .pretrained("image_captioning_vit_gpt2", "en")
  .setBeamSize(2)
  .setDoSample(false)
  .setInputCols("image_assembler")
  .setOutputCol("caption")

val pipeline = new Pipeline().setStages(Array(imageAssembler, imageCaptioning))
val pipelineDF = pipeline.fit(imageDF).transform(imageDF)

pipelineDF
  .selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "caption.result")
  .show(truncate = false)
```
</div>
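The captioning stage above is configured with `setBeamSize(2)` and `setDoSample(False)`, so captions are decoded deterministically with beam search rather than by sampling: at each step the two highest-scoring partial captions are kept and extended. The sketch illustrates that idea on a hypothetical toy next-token table; `TOY_MODEL`, `beam_search` and `toy_next_log_probs` are illustrative names, not Spark NLP APIs, and the scoring simply sums log-probabilities without any length penalty, so it is not the annotator's actual implementation.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[Tuple[str, ...], float]  # (tokens, summed log-probability)


def beam_search(
    next_log_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_size: int,
    max_len: int,
    eos: str = "</s>",
) -> List[Hypothesis]:
    """Keep the `beam_size` best unfinished prefixes at every decoding step."""
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Extend every live prefix by every possible next token.
        candidates = [
            (tokens + (tok,), score + lp)
            for tokens, score in beams
            for tok, lp in next_log_probs(tokens).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates:
            if tokens[-1] == eos:
                finished.append((tokens, score))  # completed caption
            else:
                beams.append((tokens, score))
            if len(beams) == beam_size:
                break
        if not beams:
            break
    finished.extend(beams)  # prefixes still open when max_len is reached
    return sorted(finished, key=lambda c: c[1], reverse=True)


# Hypothetical toy "decoder": next-token probabilities given the prefix so far.
TOY_MODEL = {
    (): {"a": 0.6, "the": 0.4},
    ("a",): {"cat": 0.5, "dog": 0.5},
    ("the",): {"cat": 0.9, "dog": 0.1},
}


def toy_next_log_probs(prefix: Tuple[str, ...]) -> Dict[str, float]:
    probs = TOY_MODEL.get(prefix, {"</s>": 1.0})  # any other prefix ends the caption
    return {tok: math.log(p) for tok, p in probs.items()}


greedy = beam_search(toy_next_log_probs, beam_size=1, max_len=3)[0]
beam2 = beam_search(toy_next_log_probs, beam_size=2, max_len=3)[0]
```

With a beam of 1 (greedy decoding) the search commits to "a" and ends with "a cat" (probability 0.3), while a beam of 2 keeps "the" alive and finds "the cat" (probability 0.36).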

## Results

```bash
+-----------------+---------------------------------------------------------+
|image_name |result |
+-----------------+---------------------------------------------------------+
|palace.JPEG |[a large room filled with furniture and a large window] |
|egyptian_cat.jpeg|[a cat laying on a couch next to another cat] |
|hippopotamus.JPEG|[a brown bear in a body of water] |
|hen.JPEG |[a flock of chickens standing next to each other] |
|ostrich.JPEG |[a large bird standing on top of a lush green field] |
|junco.JPEG |[a small bird standing on a wet ground] |
|bluetick.jpg |[a small dog standing on a wooden floor] |
|chihuahua.jpg |[a small brown dog wearing a blue sweater] |
|tractor.JPEG |[a man is standing in a field with a tractor] |
|ox.JPEG |[a large brown cow standing on top of a lush green field]|
+-----------------+---------------------------------------------------------+
```

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|image_captioning_vit_gpt2|
|Compatibility:|Spark NLP 5.1.2+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[image_assembler]|
|Output Labels:|[caption]|
|Language:|en|
|Size:|890.3 MB|
61 changes: 61 additions & 0 deletions docs/_posts/LIN-Yu-Ting/2023-09-18-AtgxRobertaBaseSquad2_en.md
@@ -0,0 +1,61 @@
---
layout: model
title: Atgenomix Testing QA Model
author: LIN-Yu-Ting
name: AtgxRobertaBaseSquad2
date: 2023-09-18
tags: [en, open_source, tensorflow]
task: Question Answering
language: en
edition: Spark NLP 4.4.3
spark_version: 3.4
supported: false
engine: tensorflow
annotator: RoBertaForQuestionAnswering
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

A test question-answering model intended for use at Atgenomix.

## Predicted Entities



{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/community.johnsnowlabs.com/LIN-Yu-Ting/AtgxRobertaBaseSquad2_en_4.4.3_3.4_1695000774804.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://community.johnsnowlabs.com/LIN-Yu-Ting/AtgxRobertaBaseSquad2_en_4.4.3_3.4_1695000774804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
spark = sparknlp.start()
```

</div>
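The model reads two input columns, `document_question` and `document_context`, and writes one output column, `answer`. `RoBertaForQuestionAnswering` is an extractive question-answering annotator: the model scores every context token as a possible answer start and answer end, and the answer is a span of the context. A minimal sketch of that span-selection step, assuming the per-token start and end logits are already available; `best_span`, the `max_answer_len` cap, and the toy tokens and logits are hypothetical, and this is not Spark NLP's internal code.

```python
from typing import List, Tuple


def best_span(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,
) -> Tuple[int, int, float]:
    """Return (start, end, score) of the best span with start <= end (inclusive indices)."""
    if len(start_logits) != len(end_logits):
        raise ValueError("start_logits and end_logits must have the same length")
    best = (0, 0, float("-inf"))
    for i, start_score in enumerate(start_logits):
        # Only ends at or after the start, and at most max_answer_len tokens long.
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = start_score + end_logits[j]
            if score > best[2]:
                best = (i, j, score)
    return best


# Hypothetical context tokens and logits, for illustration only.
tokens = ["Spark", "NLP", "runs", "on", "Apache", "Spark"]
start_logits = [0.1, 0.2, 0.0, 0.1, 3.0, 0.5]
end_logits = [0.0, 0.1, 0.2, 0.0, 1.0, 2.5]

i, j, score = best_span(start_logits, end_logits)
answer = " ".join(tokens[i : j + 1])
```

Restricting the search to `start <= end` matters: taking the independent maxima of the two logit lists could produce an end before the start, which is not a valid span.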

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|AtgxRobertaBaseSquad2|
|Compatibility:|Spark NLP 4.4.3+|
|License:|Open Source|
|Edition:|Community|
|Input Labels:|[document_question, document_context]|
|Output Labels:|[answer]|
|Language:|en|
|Size:|460.0 MB|
|Case sensitive:|true|
|Max sentence length:|512|