2023-06-26-distilbert_embeddings_finetuned_sarcasm_classification_en #13867

Merged
21 commits
d669e44
Add model 2023-06-26-distilbert_embeddings_finetuned_sarcasm_classifi…
ahmedlone127 Jun 26, 2023
202e653
Add model 2023-06-26-distilbert_embeddings_distilbert_base_indonesian_id
ahmedlone127 Jun 26, 2023
8531980
Add model 2023-06-26-distilbert_embeddings_BERTino_it
ahmedlone127 Jun 26, 2023
354e451
Add model 2023-06-26-distilbert_embeddings_distilbert_base_uncased_sp…
ahmedlone127 Jun 26, 2023
008bce5
Add model 2023-06-26-distilbert_embeddings_malaysian_distilbert_small_ms
ahmedlone127 Jun 26, 2023
0fa8945
Add model 2023-06-26-distilbert_embeddings_distilbert_fa_zwnj_base_fa
ahmedlone127 Jun 26, 2023
17b7ee6
Add model 2023-06-26-distilbert_embeddings_javanese_distilbert_small_jv
ahmedlone127 Jun 26, 2023
5635532
Add model 2023-06-26-distilbert_embeddings_javanese_distilbert_small_…
ahmedlone127 Jun 26, 2023
b56dc67
Add model 2023-06-26-distilbert_embeddings_indic_transformers_hi_dist…
ahmedlone127 Jun 26, 2023
538a888
Add model 2023-06-26-distilbert_embeddings_marathi_distilbert_mr
ahmedlone127 Jun 26, 2023
f10faa4
Add model 2023-06-26-distilbert_embeddings_indic_transformers_bn_dist…
ahmedlone127 Jun 26, 2023
f0aa8f0
Add model 2023-06-26-distilbert_embeddings_distilbert_base_uncased_sp…
ahmedlone127 Jun 26, 2023
ffab352
Add model 2023-06-26-deberta_embeddings_xsmall_dapt_scientific_papers…
ahmedlone127 Jun 26, 2023
88c3df0
Add model 2023-06-26-deberta_embeddings_spm_vie_vie
ahmedlone127 Jun 26, 2023
6382de2
Add model 2023-06-26-deberta_embeddings_vie_small_vie
ahmedlone127 Jun 26, 2023
77addbe
Add model 2023-06-26-deberta_embeddings_tapt_nbme_v3_base_en
ahmedlone127 Jun 26, 2023
d35a166
Add model 2023-06-26-deberta_embeddings_erlangshen_v2_chinese_sentenc…
ahmedlone127 Jun 26, 2023
e6ed418
Add model 2023-06-26-deberta_v3_xsmall_en
ahmedlone127 Jun 26, 2023
7392fca
Add model 2023-06-26-deberta_embeddings_mlm_test_en
ahmedlone127 Jun 26, 2023
bb11af2
Add model 2023-06-26-deberta_v3_small_en
ahmedlone127 Jun 26, 2023
b9e8f46
Add model 2023-06-26-roberta_base_swiss_legal_gsw
ahmedlone127 Jun 26, 2023
@@ -0,0 +1,140 @@
---
layout: model
title: Chinese Deberta Embeddings Cased model (from IDEA-CCNL)
author: John Snow Labs
name: deberta_embeddings_erlangshen_v2_chinese_sentencepiece
date: 2023-06-26
tags: [open_source, deberta, deberta_embeddings, debertav2formaskedlm, zh, onnx]
task: Embeddings
language: zh
edition: Spark NLP 5.0.0
spark_version: 3.0
supported: true
engine: onnx
annotator: DeBertaEmbeddings
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained DebertaV2ForMaskedLM model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `Erlangshen-DeBERTa-v2-186M-Chinese-SentencePiece` is a Chinese model originally trained by `IDEA-CCNL`.

## Predicted Entities



{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_embeddings_erlangshen_v2_chinese_sentencepiece_zh_5.0.0_3.0_1687781761029.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_embeddings_erlangshen_v2_chinese_sentencepiece_zh_5.0.0_3.0_1687781761029.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
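
The archive name in the links above seems to encode the same metadata as the front matter (model name, language, Spark NLP edition, Spark version) followed by a 13-digit number. The following is a minimal sketch that splits such a name into its parts. The pattern is only inferred from the file names in this PR, reading the trailing number as a Unix timestamp in milliseconds is an assumption, and `parse_archive_name` is a hypothetical helper, not part of Spark NLP.

```python
import re
from datetime import datetime, timezone

# Assumed pattern, inferred from the archive names in this PR:
#   <name>_<lang>_<spark_nlp_version>_<spark_version>_<epoch_millis>.zip
ARCHIVE_RE = re.compile(
    r"^(?P<name>.+)_(?P<lang>[a-z]{2,3})"
    r"_(?P<nlp>\d+\.\d+\.\d+)_(?P<spark>\d+\.\d+)_(?P<ts>\d{13})\.zip$"
)


def parse_archive_name(filename):
    """Split a model archive name into its (assumed) components."""
    match = ARCHIVE_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized archive name: {filename}")
    # Assumption: the trailing number is epoch milliseconds; drop the
    # millisecond part with integer division before converting.
    uploaded = datetime.fromtimestamp(int(match["ts"]) // 1000, tz=timezone.utc)
    return {
        "name": match["name"],
        "language": match["lang"],
        "spark_nlp_version": match["nlp"],
        "spark_version": match["spark"],
        "uploaded_utc": uploaded.isoformat(),
    }


info = parse_archive_name(
    "deberta_embeddings_erlangshen_v2_chinese_sentencepiece_zh_5.0.0_3.0_1687781761029.zip"
)
print(info)
```

A check like this can flag a card whose `name`, `language`, or `edition` fields disagree with the archive it links to.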

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DeBertaEmbeddings
from pyspark.ml import Pipeline

# Start (or reuse) a Spark session with Spark NLP available
spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = DeBertaEmbeddings.pretrained("deberta_embeddings_erlangshen_v2_chinese_sentencepiece","zh") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings") \
    .setCaseSensitive(True)

pipeline = Pipeline(stages=[documentAssembler, tokenizer, embeddings])

data = spark.createDataFrame([["I love Spark-NLP"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.embeddings.DeBertaEmbeddings
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for .toDS; assumes an active SparkSession named `spark`

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val embeddings = DeBertaEmbeddings.pretrained("deberta_embeddings_erlangshen_v2_chinese_sentencepiece","zh")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))

val data = Seq("I love Spark-NLP").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```
</div>


{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|deberta_embeddings_erlangshen_v2_chinese_sentencepiece|
|Compatibility:|Spark NLP 5.0.0+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[sentence, token]|
|Output Labels:|[embeddings]|
|Language:|zh|
|Size:|443.8 MB|
|Case sensitive:|false|
140 changes: 140 additions & 0 deletions docs/_posts/ahmedlone127/2023-06-26-deberta_embeddings_mlm_test_en.md
@@ -0,0 +1,140 @@
---
layout: model
title: English Deberta Embeddings model (from domenicrosati)
author: John Snow Labs
name: deberta_embeddings_mlm_test
date: 2023-06-26
tags: [deberta, open_source, deberta_embeddings, debertav2formaskedlm, en, onnx]
task: Embeddings
language: en
edition: Spark NLP 5.0.0
spark_version: 3.0
supported: true
engine: onnx
annotator: DeBertaEmbeddings
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained DebertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `deberta-mlm-test` is an English model originally trained by `domenicrosati`.

## Predicted Entities



{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_embeddings_mlm_test_en_5.0.0_3.0_1687782209221.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_embeddings_mlm_test_en_5.0.0_3.0_1687782209221.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
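
The two links above point at the same object: the HTTPS link is a path-style S3 URL whose first path segment is the bucket, and the S3 URI names the same bucket and key. The following is a minimal sketch of that mapping using only the standard library. `https_to_s3_uri` is a hypothetical helper, and it assumes the path-style `s3.amazonaws.com` host seen in this card.

```python
from urllib.parse import urlparse


def https_to_s3_uri(url):
    """Convert a path-style S3 HTTPS URL into an s3:// URI."""
    parsed = urlparse(url)
    # Assumption: only the path-style host used in this card is handled.
    if parsed.netloc != "s3.amazonaws.com":
        raise ValueError(f"not a path-style S3 URL: {url}")
    # The first path segment is the bucket, the remainder is the object key.
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    if not bucket or not key:
        raise ValueError(f"missing bucket or key in: {url}")
    return f"s3://{bucket}/{key}"


print(https_to_s3_uri(
    "https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/"
    "deberta_embeddings_mlm_test_en_5.0.0_3.0_1687782209221.zip"
))
```

The resulting URI can be passed to tools that accept `s3://` locations when the model has to be fetched without going through `pretrained()`.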

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DeBertaEmbeddings
from pyspark.ml import Pipeline

# Start (or reuse) a Spark session with Spark NLP available
spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = DeBertaEmbeddings.pretrained("deberta_embeddings_mlm_test","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings") \
    .setCaseSensitive(True)

pipeline = Pipeline(stages=[documentAssembler, tokenizer, embeddings])

data = spark.createDataFrame([["I love Spark NLP"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
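```scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.embeddings.DeBertaEmbeddings
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for .toDS; assumes an active SparkSession named `spark`

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val embeddings = DeBertaEmbeddings.pretrained("deberta_embeddings_mlm_test","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))

val data = Seq("I love Spark NLP").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```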
</div>


{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|deberta_embeddings_mlm_test|
|Compatibility:|Spark NLP 5.0.0+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[sentence, token]|
|Output Labels:|[embeddings]|
|Language:|en|
|Size:|265.4 MB|
|Case sensitive:|false|