2023-06-21-bert_embeddings_distil_clinical_en #13861

Merged
149 commits
f7e848e
Add model 2023-06-21-bert_embeddings_distil_clinical_en
ahmedlone127 Jun 21, 2023
c5cfc17
Add model 2023-06-21-bert_embeddings_carlbert_webex_mlm_spatial_en
ahmedlone127 Jun 21, 2023
2231806
Add model 2023-06-21-bert_embeddings_chemical_uncased_finetuned_cust_…
ahmedlone127 Jun 21, 2023
2e406fe
Add model 2023-06-21-bert_embeddings_lsg16k_Italian_Legal_it
ahmedlone127 Jun 21, 2023
a2183c9
Add model 2023-06-21-bert_embeddings_chemical_uncased_finetuned_cust_…
ahmedlone127 Jun 21, 2023
ee25899
Add model 2023-06-21-bert_embeddings_legalbert_adept_en
ahmedlone127 Jun 21, 2023
8336ae4
Add model 2023-06-21-bert_embeddings_base_uncased_issues_128_en
ahmedlone127 Jun 21, 2023
9c59610
Add model 2023-06-21-bert_embeddings_pretrain_ko
ahmedlone127 Jun 21, 2023
159c906
Add model 2023-06-21-bert_embeddings_olm_base_uncased_oct_2022_en
ahmedlone127 Jun 21, 2023
5bf420d
Add model 2023-06-21-legalectra_small_es
ahmedlone127 Jun 21, 2023
55f9f3f
Add model 2023-06-21-biobert_pubmed_base_cased_v1.2_en
ahmedlone127 Jun 21, 2023
22d5aa6
Add model 2023-06-21-bert_embeddings_jobbert_base_cased_en
ahmedlone127 Jun 21, 2023
cb177d6
Add model 2023-06-21-electra_embeddings_electra_base_gc4_64k_700000_c…
ahmedlone127 Jun 21, 2023
649b171
Add model 2023-06-21-electra_embeddings_electra_base_gc4_64k_800000_c…
ahmedlone127 Jun 21, 2023
bb4310b
Add model 2023-06-21-legalectra_base_es
ahmedlone127 Jun 21, 2023
eda21c7
Add model 2023-06-21-electra_embeddings_electra_base_gc4_64k_900000_c…
ahmedlone127 Jun 21, 2023
2ca91e5
Add model 2023-06-21-bert_embeddings_scibert_scivocab_finetuned_cord1…
ahmedlone127 Jun 21, 2023
83bde35
Add model 2023-06-21-bert_embeddings_InLegalBERT_en
ahmedlone127 Jun 21, 2023
dfd25ab
Add model 2023-06-21-bert_embeddings_InCaseLawBERT_en
ahmedlone127 Jun 21, 2023
c39d6e6
Add model 2023-06-21-bert_base_uncased_contracts_en
ahmedlone127 Jun 21, 2023
e965624
Add model 2023-06-21-electra_embeddings_electra_base_turkish_mc4_unca…
ahmedlone127 Jun 21, 2023
7ae0044
Add model 2023-06-21-electra_embeddings_electra_base_gc4_64k_500000_c…
ahmedlone127 Jun 21, 2023
1b3351a
Add model 2023-06-21-electra_embeddings_electra_base_generator_en
ahmedlone127 Jun 21, 2023
77847d2
Add model 2023-06-21-electra_embeddings_electra_base_gc4_64k_200000_c…
ahmedlone127 Jun 21, 2023
f58dde9
Add model 2023-06-21-electra_embeddings_electra_base_italian_xxl_case…
ahmedlone127 Jun 21, 2023
0182101
Add model 2023-06-21-bert_embeddings_bioclinicalbert_finetuned_covid_…
ahmedlone127 Jun 21, 2023
88663c7
Add model 2023-06-21-electra_embeddings_electra_base_gc4_64k_1000000_…
ahmedlone127 Jun 21, 2023
0534c58
Add model 2023-06-21-electra_embeddings_electra_base_gc4_64k_600000_c…
ahmedlone127 Jun 21, 2023
3c13d84
Add model 2023-06-21-electra_embeddings_electra_base_gc4_64k_400000_c…
ahmedlone127 Jun 21, 2023
662939a
Add model 2023-06-21-electra_embeddings_finance_koelectra_base_genera…
ahmedlone127 Jun 21, 2023
795450e
Add model 2023-06-21-electra_embeddings_koelectra_base_v2_generator_ko
ahmedlone127 Jun 21, 2023
0658533
Add model 2023-06-21-electra_embeddings_electra_base_gc4_64k_300000_c…
ahmedlone127 Jun 21, 2023
4ee0f36
Add model 2023-06-21-electra_embeddings_electra_base_turkish_mc4_case…
ahmedlone127 Jun 21, 2023
bcdb2d7
Add model 2023-06-21-electra_embeddings_electra_base_gc4_64k_0_cased_…
ahmedlone127 Jun 21, 2023
c28e4e4
Add model 2023-06-21-electra_embeddings_electra_small_generator_en
ahmedlone127 Jun 21, 2023
5f109a9
Add model 2023-06-21-electra_embeddings_electra_large_generator_en
ahmedlone127 Jun 21, 2023
01654d8
Add model 2023-06-21-electra_embeddings_electricidad_base_generator_es
ahmedlone127 Jun 21, 2023
9304071
Add model 2023-06-21-electra_embeddings_gelectra_large_generator_de
ahmedlone127 Jun 21, 2023
6f7acf7
Add model 2023-06-21-electra_embeddings_koelectra_base_generator_ko
ahmedlone127 Jun 21, 2023
2f82152
Add model 2023-06-21-electra_embeddings_koelectra_base_v3_generator_ko
ahmedlone127 Jun 21, 2023
c57578a
Add model 2023-06-21-electra_embeddings_electra_base_gc4_64k_0_cased_…
ahmedlone127 Jun 21, 2023
285a2e1
Add model 2023-06-21-electra_embeddings_electra_base_gc4_64k_100000_c…
ahmedlone127 Jun 21, 2023
5527e0a
Add model 2023-06-21-electra_embeddings_electra_base_gc4_64k_400000_c…
ahmedlone127 Jun 21, 2023
438e6ee
Add model 2023-06-21-electra_embeddings_electra_base_gc4_64k_600000_c…
ahmedlone127 Jun 21, 2023
2b4c024
Add model 2023-06-21-electra_embeddings_electra_tagalog_small_cased_g…
ahmedlone127 Jun 21, 2023
160797d
Add model 2023-06-21-electra_embeddings_gelectra_base_generator_de
ahmedlone127 Jun 21, 2023
2247fb4
Add model 2023-06-21-electra_embeddings_electra_tagalog_base_cased_ge…
ahmedlone127 Jun 21, 2023
ad9e9a2
Add model 2023-06-21-bert_sentence_embeddings_financial_de
ahmedlone127 Jun 21, 2023
1ab946a
Add model 2023-06-21-electra_embeddings_electra_small_japanese_genera…
ahmedlone127 Jun 21, 2023
784513b
Add model 2023-06-21-electra_embeddings_electra_tagalog_base_uncased_…
ahmedlone127 Jun 21, 2023
52d8073
Add model 2023-06-21-electra_embeddings_koelectra_small_generator_ko
ahmedlone127 Jun 21, 2023
8f7245c
Add model 2023-06-21-electra_embeddings_finance_koelectra_small_gener…
ahmedlone127 Jun 21, 2023
86cb08b
Add model 2023-06-21-bert_embeddings_sec_bert_base_en
ahmedlone127 Jun 21, 2023
054823d
Add model 2023-06-21-electra_embeddings_kr_electra_generator_ko
ahmedlone127 Jun 21, 2023
a9d7101
Add model 2023-06-21-bert_embeddings_sec_bert_sh_en
ahmedlone127 Jun 21, 2023
027d7b9
Add model 2023-06-21-bert_embeddings_german_financial_statements_bert_de
ahmedlone127 Jun 21, 2023
5ee0ec9
Add model 2023-06-21-electra_embeddings_electra_tagalog_small_uncased…
ahmedlone127 Jun 21, 2023
b35ee1e
Add model 2023-06-21-bert_embeddings_javanese_bert_small_jv
ahmedlone127 Jun 21, 2023
3c8f957
Add model 2023-06-21-bert_embeddings_finest_bert_en
ahmedlone127 Jun 21, 2023
6f4b142
Add model 2023-06-21-bert_embeddings_indic_transformers_te_bert_te
ahmedlone127 Jun 21, 2023
85e7b30
Add model 2023-06-21-bert_embeddings_gbert_base_de
ahmedlone127 Jun 21, 2023
39aaf32
Add model 2023-06-21-bert_embeddings_indic_transformers_hi_bert_hi
ahmedlone127 Jun 21, 2023
d99bad4
Add model 2023-06-21-bert_embeddings_hateBERT_en
ahmedlone127 Jun 21, 2023
9860532
Add model 2023-06-21-bert_embeddings_false_positives_scancode_bert_ba…
ahmedlone127 Jun 21, 2023
e5229fd
Add model 2023-06-21-bert_embeddings_finbert_pretrain_yiyanghkust_en
ahmedlone127 Jun 21, 2023
75a9d50
Add model 2023-06-21-bert_embeddings_indic_transformers_te_bert_te
ahmedlone127 Jun 21, 2023
02c3097
Add model 2023-06-21-bert_embeddings_hseBert_it_cased_it
ahmedlone127 Jun 21, 2023
85b50bf
Add model 2023-06-21-bert_embeddings_finbert_pretrain_yiyanghkust_en
ahmedlone127 Jun 21, 2023
95b2663
Add model 2023-06-21-bert_embeddings_dpr_spanish_question_encoder_all…
ahmedlone127 Jun 21, 2023
7d73b10
Add model 2023-06-21-bert_embeddings_dziribert_ar
ahmedlone127 Jun 21, 2023
0b7f030
Add model 2023-06-21-bert_embeddings_deberta_base_uncased_en
ahmedlone127 Jun 21, 2023
c448cc6
Add model 2023-06-21-bert_embeddings_dbert_ko
ahmedlone127 Jun 21, 2023
5cc53b2
Add model 2023-06-21-bert_embeddings_javanese_bert_small_imdb_jv
ahmedlone127 Jun 21, 2023
90b986a
Add model 2023-06-21-bert_embeddings_dpr_spanish_passage_encoder_squa…
ahmedlone127 Jun 21, 2023
395fd62
Add model 2023-06-21-bert_embeddings_dpr_spanish_question_encoder_squ…
ahmedlone127 Jun 21, 2023
eb26168
Add model 2023-06-21-bert_embeddings_crosloengual_bert_en
ahmedlone127 Jun 21, 2023
4642311
Add model 2023-06-21-bert_embeddings_clinical_pubmed_bert_base_512_en
ahmedlone127 Jun 21, 2023
e8ac13e
Add model 2023-06-21-bert_embeddings_dpr_spanish_passage_encoder_allq…
ahmedlone127 Jun 21, 2023
a146658
Add model 2023-06-21-bert_embeddings_legal_bert_base_uncased_en
ahmedlone127 Jun 21, 2023
3fbe05d
Add model 2023-06-21-biobert_embeddings_all_pt
ahmedlone127 Jun 21, 2023
a45a28a
Add model 2023-06-21-bert_embeddings_wineberto_italian_cased_it
ahmedlone127 Jun 21, 2023
d46cf99
Add model 2023-06-21-bert_embeddings_clinical_pubmed_bert_base_128_en
ahmedlone127 Jun 21, 2023
648a8fb
Add model 2023-06-21-biobert_embeddings_clinical_pt
ahmedlone127 Jun 21, 2023
20df8e4
Add model 2023-06-21-bert_embeddings_telugu_bertu_te
ahmedlone127 Jun 21, 2023
23a74d2
Add model 2023-06-21-bert_embeddings_wobert_chinese_plus_zh
ahmedlone127 Jun 21, 2023
7e4706d
Add model 2023-06-21-bert_embeddings_wineberto_italian_cased_it
ahmedlone127 Jun 21, 2023
4784a7d
Add model 2023-06-21-bert_embeddings_sikuroberta_zh
ahmedlone127 Jun 21, 2023
279be87
Add model 2023-06-21-biobert_embeddings_biomedical_pt
ahmedlone127 Jun 21, 2023
ea76aeb
Add model 2023-06-21-bert_embeddings_sikubert_zh
ahmedlone127 Jun 21, 2023
03c18af
Add model 2023-06-21-bert_embeddings_psych_search_en
ahmedlone127 Jun 21, 2023
23ac4c0
Add model 2023-06-21-bert_embeddings_marathi_bert_mr
ahmedlone127 Jun 21, 2023
a84c7de
Add model 2023-06-21-bert_embeddings_netbert_en
ahmedlone127 Jun 21, 2023
9ff2565
Add model 2023-06-21-bert_embeddings_mbert_ar_c19_ar
ahmedlone127 Jun 21, 2023
56f121f
Add model 2023-06-21-bert_embeddings_multi_dialect_bert_base_arabic_ar
ahmedlone127 Jun 21, 2023
b8f3ba8
Add model 2023-06-21-bert_embeddings_lic_class_scancode_bert_base_cas…
ahmedlone127 Jun 21, 2023
dfa197c
Add model 2023-06-21-bert_embeddings_MARBERTv2_ar
ahmedlone127 Jun 21, 2023
526f65d
Add model 2023-06-21-bert_embeddings_bert_base_cased_pt_lenerbr_pt
ahmedlone127 Jun 21, 2023
294eb23
Add model 2023-06-21-bert_embeddings_bert_base_arabic_camelbert_msa_h…
ahmedlone127 Jun 21, 2023
fc1b3a2
Add model 2023-06-21-bert_embeddings_bert_base_german_cased_oldvocab_de
ahmedlone127 Jun 21, 2023
868256f
Add model 2023-06-21-bert_embeddings_bert_base_arabic_camelbert_msa_ar
ahmedlone127 Jun 21, 2023
774b7a1
Add model 2023-06-21-bert_embeddings_bert_base_arabic_camelbert_msa_e…
ahmedlone127 Jun 21, 2023
a224310
Add model 2023-06-21-bert_embeddings_bert_base_german_uncased_de
ahmedlone127 Jun 21, 2023
74b0c9b
Add model 2023-06-21-bert_embeddings_bert_base_arabic_camelbert_msa_q…
ahmedlone127 Jun 21, 2023
a25308c
Add model 2023-06-21-bert_embeddings_bert_base_historical_german_rw_c…
ahmedlone127 Jun 21, 2023
c19e5fe
Add model 2023-06-21-bert_embeddings_bert_base_italian_xxl_uncased_it
ahmedlone127 Jun 21, 2023
26869e6
Add model 2023-06-21-bert_embeddings_bert_base_arabertv2_ar
ahmedlone127 Jun 21, 2023
540fbb8
Add model 2023-06-21-bert_embeddings_bert_base_arabic_camelbert_msa_s…
ahmedlone127 Jun 21, 2023
746964d
Add model 2023-06-21-bert_embeddings_bert_base_arabic_camelbert_mix_ar
ahmedlone127 Jun 21, 2023
513512e
Add model 2023-06-21-bert_embeddings_bert_base_italian_xxl_cased_it
ahmedlone127 Jun 21, 2023
198cc3b
Add model 2023-06-21-bert_embeddings_bert_base_gl_cased_pt
ahmedlone127 Jun 21, 2023
010ed3f
Add model 2023-06-21-bert_embeddings_MARBERT_ar
ahmedlone127 Jun 21, 2023
f207f4c
Add model 2023-06-21-bert_embeddings_AraBertMo_base_V1_ar
ahmedlone127 Jun 21, 2023
53669a4
Add model 2023-06-21-bert_embeddings_bert_base_arabic_ar
ahmedlone127 Jun 21, 2023
c2bcf61
Add model 2023-06-21-bert_embeddings_DarijaBERT_ar
ahmedlone127 Jun 21, 2023
9cb8874
Add model 2023-06-21-bert_embeddings_Ara_DialectBERT_ar
ahmedlone127 Jun 21, 2023
4b2d52f
Add model 2023-06-21-bert_embeddings_German_MedBERT_de
ahmedlone127 Jun 21, 2023
9daf814
Add model 2023-06-21-bert_embeddings_bert_base_arabertv02_twitter_ar
ahmedlone127 Jun 21, 2023
e2e7218
Add model 2023-06-21-bert_embeddings_FinancialBERT_en
ahmedlone127 Jun 21, 2023
1f3ae39
Add model 2023-06-21-bert_embeddings_ARBERT_ar
ahmedlone127 Jun 21, 2023
07fcedd
Add model 2023-06-21-bert_embeddings_COVID_SciBERT_en
ahmedlone127 Jun 21, 2023
6fd0bbe
Add model 2023-06-21-bert_embeddings_alberti_bert_base_multilingual_c…
ahmedlone127 Jun 21, 2023
064ed90
Add model 2023-06-21-bert_embeddings_agriculture_bert_uncased_en
ahmedlone127 Jun 21, 2023
fd3c776
Add model 2023-06-21-bert_embeddings_bangla_bert_bn
ahmedlone127 Jun 21, 2023
f5bb895
Add model 2023-06-21-bert_embeddings_bert_kor_base_ko
ahmedlone127 Jun 21, 2023
750d737
Add model 2023-06-21-bert_embeddings_bert_base_arabertv02_ar
ahmedlone127 Jun 21, 2023
5ec17af
Add model 2023-06-21-bert_embeddings_arabert_c19_ar
ahmedlone127 Jun 21, 2023
eb41544
Add model 2023-06-21-bert_embeddings_bert_base_5lang_cased_es
ahmedlone127 Jun 21, 2023
18bcd06
Add model 2023-06-21-bert_embeddings_bert_base_arabertv01_ar
ahmedlone127 Jun 21, 2023
d3f289d
Add model 2023-06-21-bert_embeddings_bangla_bert_base_bn
ahmedlone127 Jun 21, 2023
bb90766
Add model 2023-06-21-bert_embeddings_bert_medium_arabic_ar
ahmedlone127 Jun 21, 2023
45a9ed2
Add model 2023-06-21-bert_embeddings_bert_political_election2020_twit…
ahmedlone127 Jun 21, 2023
08517c5
Add model 2023-06-21-bert_embeddings_bert_mini_arabic_ar
ahmedlone127 Jun 21, 2023
20eb1ed
Add model 2023-06-21-bert_embeddings_bert_base_arabert_ar
ahmedlone127 Jun 21, 2023
b7e0344
Add model 2023-06-21-bert_embeddings_beto_gn_base_cased_es
ahmedlone127 Jun 21, 2023
3662115
Add model 2023-06-21-bert_embeddings_chemical_bert_uncased_en
ahmedlone127 Jun 21, 2023
11fe048
Add model 2023-06-21-bert_embeddings_bert_base_ko
ahmedlone127 Jun 21, 2023
cabc59a
Add model 2023-06-21-bert_embeddings_chefberto_italian_cased_it
ahmedlone127 Jun 21, 2023
97dcd82
Add model 2023-06-21-bert_embeddings_childes_bert_en
ahmedlone127 Jun 21, 2023
ea9b48b
Add model 2023-06-21-bert_embeddings_bert_base_portuguese_cased_finet…
ahmedlone127 Jun 21, 2023
c4a442a
Add model 2023-06-21-bert_embeddings_bert_base_portuguese_cased_finet…
ahmedlone127 Jun 21, 2023
c4d1765
Add model 2023-06-21-bert_embeddings_bert_base_portuguese_cased_pt
ahmedlone127 Jun 21, 2023
b1dd662
Add model 2023-06-21-bert_embeddings_bert_base_qarib60_1790k_ar
ahmedlone127 Jun 21, 2023
12b020a
Add model 2023-06-21-bert_embeddings_bert_base_uncased_dstc9_en
ahmedlone127 Jun 21, 2023
c519aad
Add model 2023-06-21-bert_embeddings_bert_base_uncased_mnli_sparse_70…
ahmedlone127 Jun 21, 2023
37bac21
Add model 2023-06-21-bert_embeddings_bert_base_qarib_ar
ahmedlone127 Jun 21, 2023
07240b1
Add model 2023-06-21-bert_embeddings_bert_base_uncased_sparse_70_unst…
ahmedlone127 Jun 21, 2023
fcff5e7
Add model 2023-06-21-ms_bluebert_base_uncased_en
ahmedlone127 Jun 21, 2023
46b21ee
Add model 2023-06-21-bert_embeddings_bert_base_qarib60_860k_ar
ahmedlone127 Jun 21, 2023
7f09b85
fixing wrong spark version and removing tensorflow
ahmedlone127 Jun 21, 2023
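
The last commit above fixes a wrong Spark version and removes TensorFlow from the added model cards. A pre-merge check along the following lines could flag that kind of front-matter slip. This is only a minimal sketch: `parse_front_matter` and `check_front_matter` are hypothetical helpers, not part of this PR or of Spark NLP's tooling, the expected values (`3.0`, `onnx`) are taken from the cards shown in this diff, and the assumption that "removing tensorflow" refers to a leftover `tensorflow` tag is a guess, since the commit's content is not shown here.

```python
# Hypothetical pre-merge check for model-card front matter (not part of this PR).
# Assumes flat "key: value" lines between the two '---' delimiters; nesting is ignored.

def parse_front_matter(text):
    """Return the key/value pairs found in a Jekyll front-matter block."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("front matter must start with '---'")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_front_matter(fields, expected_spark="3.0", expected_engine="onnx"):
    """Return a list of problems; an empty list means the card looks consistent."""
    problems = []
    if fields.get("spark_version") != expected_spark:
        problems.append(
            f"spark_version is {fields.get('spark_version')!r}, expected {expected_spark!r}"
        )
    if fields.get("engine") != expected_engine:
        problems.append(
            f"engine is {fields.get('engine')!r}, expected {expected_engine!r}"
        )
    # Assumption: a stale 'tensorflow' tag is what the fix-up commit removed.
    tags = [t.strip() for t in fields.get("tags", "").strip("[]").split(",")]
    if "tensorflow" in tags:
        problems.append("tags still contain 'tensorflow'")
    return problems


# Front matter abridged from one of the cards added in this PR.
card = """---
layout: model
name: bert_base_uncased_contracts
tags: [open_source, bert, embeddings, finance, contracts, en, onnx]
edition: Spark NLP 5.0.0
spark_version: 3.0
engine: onnx
---"""

print(check_front_matter(parse_front_matter(card)))  # prints []
```

Run over every file under `docs/_posts/`, a check like this would report any card whose `spark_version` or `engine` differs from the rest of the batch.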
149 changes: 149 additions & 0 deletions docs/_posts/ahmedlone127/2023-06-21-bert_base_uncased_contracts_en.md
@@ -0,0 +1,149 @@
---
layout: model
title: English Legal Contracts BertEmbeddings model (Base, Uncased)
author: John Snow Labs
name: bert_base_uncased_contracts
date: 2023-06-21
tags: [open_source, bert, embeddings, finance, contracts, en, onnx]
task: Embeddings
language: en
edition: Spark NLP 5.0.0
spark_version: 3.0
supported: true
engine: onnx
annotator: BertEmbeddings
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained Word Embeddings model, trained on legal contracts, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-base-uncased-contracts` is an English model originally trained by `nlpaueb`.

## Predicted Entities



{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_contracts_en_5.0.0_3.0_1687337099443.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_contracts_en_5.0.0_3.0_1687337099443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
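
The archive name in the download links appears to encode the same fields as the front matter: model name, language, Spark NLP version, Spark version, and a trailing number that looks like an epoch timestamp in milliseconds. The sketch below splits a link into those parts. The layout is an assumption inferred from this card rather than a documented convention, and `parse_model_archive` is a hypothetical helper.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from urllib.parse import urlparse


def parse_model_archive(url):
    """Split a model archive URL into its apparent components.

    Assumed layout (inferred from this model card, not documented here):
    <name>_<lang>_<spark_nlp_version>_<spark_version>_<epoch_millis>.zip
    """
    # Works for both https:// and s3:// links, since only the path is used.
    stem = PurePosixPath(urlparse(url).path).stem
    # Split from the right so underscores inside the model name are preserved.
    name, lang, nlp_version, spark_version, millis = stem.rsplit("_", 4)
    return {
        "name": name,
        "language": lang,
        "spark_nlp_version": nlp_version,
        "spark_version": spark_version,
        "timestamp": datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc),
    }


info = parse_model_archive(
    "s3://auxdata.johnsnowlabs.com/public/models/"
    "bert_base_uncased_contracts_en_5.0.0_3.0_1687337099443.zip"
)
print(info["name"], info["language"], info["timestamp"].date())
```

For this card the parsed timestamp falls on 2023-06-21 (UTC), which agrees with the `date` field in the front matter.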

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
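```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertEmbeddings
from pyspark.ml import Pipeline

# Start (or reuse) a Spark session with Spark NLP available
spark = sparknlp.start()

# Wrap the raw text column in a document annotation
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Load the pretrained embeddings described by this model card
embeddings = BertEmbeddings.pretrained("bert_base_uncased_contracts", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline(stages=[documentAssembler, tokenizer, embeddings])

data = spark.createDataFrame([["I love Spark NLP."]]).toDF("text")

result = pipeline.fit(data).transform(data)
```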
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
// Assumes a SparkSession named `spark` (as in spark-shell); needed for .toDF
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("bert_base_uncased_contracts", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))

val data = Seq("I love Spark NLP.").toDF("text")

val result = pipeline.fit(data).transform(data)
```


{:.nlu-block}
```python
import nlu
nlu.load("en.embed.bert.contracts.uncased_base").predict("""I love Spark NLP.""")
```

</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|bert_base_uncased_contracts|
|Compatibility:|Spark NLP 5.0.0+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[sentence, token]|
|Output Labels:|[bert]|
|Language:|en|
|Size:|407.1 MB|
|Case sensitive:|true|
149 changes: 149 additions & 0 deletions docs/_posts/ahmedlone127/2023-06-21-bert_embeddings_ARBERT_ar.md
@@ -0,0 +1,149 @@
---
layout: model
title: Arabic Bert Embeddings (ARBERT model)
author: John Snow Labs
name: bert_embeddings_ARBERT
date: 2023-06-21
tags: [bert, embeddings, ar, open_source, onnx]
task: Embeddings
language: ar
edition: Spark NLP 5.0.0
spark_version: 3.0
supported: true
engine: onnx
annotator: BertEmbeddings
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained Bert Embeddings model, uploaded to Hugging Face, adapted and imported into Spark NLP. `ARBERT` is an Arabic model originally trained by `UBC-NLP`.

## Predicted Entities



{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_embeddings_ARBERT_ar_5.0.0_3.0_1687368387135.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_embeddings_ARBERT_ar_5.0.0_3.0_1687368387135.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
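```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertEmbeddings
from pyspark.ml import Pipeline

# Start (or reuse) a Spark session with Spark NLP available
spark = sparknlp.start()

# Wrap the raw text column in a document annotation
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Load the pretrained embeddings described by this model card
embeddings = BertEmbeddings.pretrained("bert_embeddings_ARBERT", "ar") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline(stages=[documentAssembler, tokenizer, embeddings])

# Arabic sample sentence (roughly "I love Spark NLP")
data = spark.createDataFrame([["أنا أحب شرارة NLP"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```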
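```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
// Assumes a SparkSession named `spark` (as in spark-shell); needed for .toDF
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("bert_embeddings_ARBERT", "ar")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))

// Arabic sample sentence (roughly "I love Spark NLP")
val data = Seq("أنا أحب شرارة NLP").toDF("text")

val result = pipeline.fit(data).transform(data)
```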


{:.nlu-block}
```python
import nlu
nlu.load("ar.embed.arbert").predict("""أنا أحب شرارة NLP""")
```

</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|bert_embeddings_ARBERT|
|Compatibility:|Spark NLP 5.0.0+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[sentence, token]|
|Output Labels:|[bert]|
|Language:|ar|
|Size:|605.3 MB|
|Case sensitive:|true|
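
Because every card in this PR ends with the same two-column "Model Information" table, its rows can be read into a dictionary for quick comparison across cards, for example to compare the listed input labels with the columns used in the usage snippet. The following is a rough sketch that assumes each row has the form `|Key:|Value|`; `parse_model_table` is a hypothetical helper and not part of the Spark NLP docs tooling.

```python
def parse_model_table(markdown):
    """Read '|Key:|Value|' rows of a model-information table into a dict."""
    info = {}
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip attribute lines, blank lines, and the '|---|---|' separator row.
        if len(cells) != 2 or set(cells[0]) <= {"-"}:
            continue
        info[cells[0].rstrip(":")] = cells[1]
    return info


# Table copied from the ARBERT card in this diff.
table = """{:.table-model}
|---|---|
|Model Name:|bert_embeddings_ARBERT|
|Compatibility:|Spark NLP 5.0.0+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[sentence, token]|
|Output Labels:|[bert]|
|Language:|ar|
|Size:|605.3 MB|
|Case sensitive:|true|"""

info = parse_model_table(table)
print(info["Model Name"], info["Language"], info["Size"])
```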