2023-09-13-tlm_ag_small_scale_en (#13980)
* Add model 2023-09-13-lb_mbert_en

* Add model 2023-09-13-bert_telugu_23_en

* Add model 2023-09-13-bert_web_bulgarian_bg

* Add model 2023-09-13-bert_base_italian_uncased_dbmdz_it

* Add model 2023-09-12-cord19_bert_en

* Add model 2023-09-13-bert_base_spanish_wwm_uncased_finetuned_imdb_spanish_en

* Add model 2023-09-13-bangla_bert_base_finetuned_tweets_en

* Add model 2023-09-13-bert_embding_finetuned_spmlm_en

* Add model 2023-09-12-bert_base_xsum_en

* Add model 2023-09-13-nepal_bhasa_model_en

* Add model 2023-09-13-multi_dialect_bert_base_arabic_ar

* Add model 2023-09-13-morrbert_en

* Add model 2023-09-13-bert_base_italian_xxl_cased_it

* Add model 2023-09-13-me_bert_mixed_v2_mr

* Add model 2023-09-13-datafinder_scibert_dutch_queries_en

* Add model 2023-09-13-bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_40_1_en

* Add model 2023-09-13-wisdomify_en

* Add model 2023-09-12-original_topic_sports_kcbert_en

* Add model 2023-09-13-rubert_tiny_ru

* Add model 2023-09-13-lexbert_turkish_uncased_en

* Add model 2023-09-13-bert_base_dutch_cased_mlm_visio_en

* Add model 2023-09-13-first_try_rubert_200_16_16_25ep_en

* Add model 2023-09-13-helloworld_en

* Add model 2023-09-13-parlbert_german_v2_de

* Add model 2023-09-13-bertjewdialdataqa20k_en

* Add model 2023-09-13-kinyabert_small_finetuned_kintweetsb_en

* Add model 2023-09-13-bert_medium_pretrained_on_squad_en

* Add model 2023-09-13-bert_review_en

* Add model 2023-09-13-distilbertu_base_cased_1.0_en

* Add model 2023-09-13-bert_double_en

* Add model 2023-09-13-bert_base_german_cased_domain_adaptation_accelerate_en

* Add model 2023-09-13-bert_base_uncased_copy_en

* Add model 2023-09-13-kinyabert_large_finetuned_kintweetsc_en

* Add model 2023-09-13-arabert_quran_large_en

* Add model 2023-09-13-batteryscibert_uncased_en

* Add model 2023-09-13-bert_mini_historic_multilingual_cased_xx

* Add model 2023-09-13-tlm_hyp_small_scale_en

* Add model 2023-09-13-bert_base_multilingual_cased_finetuned_am_shb_xx

* Add model 2023-09-13-bert_finetuning_test_zqf03118_en

* Add model 2023-09-13-bert_small_finetuned_legal_contracts_larger20_5_1_en

* Add model 2023-09-13-bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr50_8_fast_15_en

* Add model 2023-09-13-bert_medium_mlsm_en

* Add model 2023-09-13-hindi_random_twt_1m_hi

* Add model 2023-09-13-tfhbert_en

* Add model 2023-09-13-bert_base_uncased_byeongal_en

* Add model 2023-09-13-kinyabert_small_finetuned_kintweetsc_en

* Add model 2023-09-13-bert_base_greek_uncased_v1_finetuned_imdb_en

* Add model 2023-09-13-bert_large_cased_sigir_support_refute_evidence_norwegian_label_40_en

* Add model 2023-09-13-defsent_bert_base_uncased_mean_en

* Add model 2023-09-13-bert_large_cased_sigir_lr10_1_prepend_20_en

* Add model 2023-09-13-danish_legal_bert_base_da

* Add model 2023-09-13-bert_base_greek_uncased_v1_finetuned_polylex_en

* Add model 2023-09-12-mbert_tlm_sent_english_italian_en

* Add model 2023-09-13-sanaybert_model_v1_en

* Add model 2023-09-13-finbert_pretrain_tianzhou_en

* Add model 2023-09-13-bert_base_greek_uncased_v2_finetuned_polylex_en

* Add model 2023-09-13-first_try_rubert_200_16_16_10ep_en

* Add model 2023-09-13-batteryonlybert_cased_en

* Add model 2023-09-13-distilbertu_base_cased_anneal_en

* Add model 2023-09-13-bert_base_uncased_new_data_bert1_en

* Add model 2023-09-13-indo1_en

* Add model 2023-09-13-bertimbau_legal_pt

* Add model 2023-09-13-batteryonlybert_uncased_en

* Add model 2023-09-13-bert_large_cased_sigir_support_norwegian_label_40_sigir_tune2nd_lr10_labelled_40_en

* Add model 2023-09-13-bert_semeval_env_en

* Add model 2023-09-13-bert_base_german_cased_oldvocab_de

* Add model 2023-09-12-dummy_model_bigtimecodersean_en

* Add model 2023-09-13-berel_dicta_il_he

* Add model 2023-09-13-bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_22_en

* Add model 2023-09-13-bert_medium_historic_multilingual_cased_xx

* Add model 2023-09-12-bertjewdialdataallqonly04_en

* Add model 2023-09-13-quant_bert_en

* Add model 2023-09-13-netbert_en

* Add model 2023-09-13-tlm_chemprot_large_scale_en

* Add model 2023-09-13-covid_bert_base_en

* Add model 2023-09-13-cdgp_chilean_sign_language_scibert_dgen_en

* Add model 2023-09-13-mbertv2.0_en

* Add model 2023-09-13-kinyabert_large_finetuned_kintweetsd_en

* Add model 2023-09-13-bert_small_finetuned_eurlex_en

* Add model 2023-09-13-dal_bert_finetuned_medical_v3_en

* Add model 2023-09-13-cms_rx_language_correction_en

* Add model 2023-09-13-spa_en

* Add model 2023-09-13-bert_base_uncased_2022_nvidia_test_2_en

* Add model 2023-09-13-bert_base_uncased_2022_nvidia_test_3_en

* Add model 2023-09-13-further_train_2_domain_10_en

* Add model 2023-09-13-custom_legalbert_en

* Add model 2023-09-13-melayubert_ms

* Add model 2023-09-13-agriculture_bert_uncased_en

* Add model 2023-09-13-tlm_rct_20k_small_scale_en

* Add model 2023-09-13-wiki_bert_en

* Add model 2023-09-13-bert_large_cased_sigir_lr100_0_prepend_40_en

* Add model 2023-09-13-detox_kcbert_base_en

* Add model 2023-09-13-bert_base_multilingual_uncased_finetuned_xx

* Add model 2023-09-13-kinyabert_small_finetuned_kintweetsd_en

* Add model 2023-09-12-bert_base_english_turkish_cased_en

* Add model 2023-09-13-bert_large_cased_sigir_lr100_1_cased_40_en

* Add model 2023-09-12-tiny_mlm_glue_qqp_from_scratch_custom_tokenizer_expand_vocab_en

* Add model 2023-09-13-bibert_20_epochs_en

* Add model 2023-09-13-bert_finetuning_test_0925_en

* Add model 2023-09-13-tlm_chemprot_small_scale_en

* Add model 2023-09-13-carlbert_webex_en

* Add model 2023-09-13-bert_based_restaurant_review_en

* Add model 2023-09-13-bert_large_nli_en

* Add model 2023-09-13-dummy_model_jammm1412_en

* Add model 2023-09-13-bert_large_arabic_ar

* Add model 2023-09-13-finbert_lm_finetuned_news_en

* Add model 2023-09-12-topic_it_science_bert_en

* Add model 2023-09-13-defsent_bert_large_uncased_cls_en

* Add model 2023-09-13-hing_bert_hi

* Add model 2023-09-13-bert_base_uncased_finetuned_kintweetse_en

* Add model 2023-09-13-bert_base_historic_multilingual_cased_xx

* Add model 2023-09-13-marathi_bert_smaller_mr

* Add model 2023-09-13-tlm_citation_intent_medium_scale_en

* Add model 2023-09-13-bert_base_indonesian_1.5g_id

* Add model 2023-09-12-hatebert_en

* Add model 2023-09-13-tiny_biobert_en

* Add model 2023-09-13-gujibert_jian_fan_en

* Add model 2023-09-13-danish_bert_botxo_da

* Add model 2023-09-13-bert_small_finetuned_parsed_longer100_en

* Add model 2023-09-12-bert_base_cased_portuguese_ccorpus_en

* Add model 2023-09-13-biblitbert_en

* Add model 2023-09-13-bert_medium_arabic_ar

* Add model 2023-09-13-bertjewdialdataallqonly09_en

* Add model 2023-09-13-legal_bertimbau_base_pt

* Add model 2023-09-13-bert_base_uncased_issues_128_roscoyoon_en

* Add model 2023-09-13-indojave_codemixed_bert_base_id

* Add model 2023-09-13-eng_en

* Add model 2023-09-13-arabertmo_base_v10_en

* Add model 2023-09-12-legal_bert_base_uncased_en

* Add model 2023-09-13-dlub_2022_mlm_full_muug_en

* Add model 2023-09-12-bertjewdialdataall_en

* Add model 2023-09-12-mlm_20230503_indobert_base_p2_002_en

* Add model 2023-09-12-sec_bert_base_en

* Add model 2023-09-13-german_financial_statements_bert_de

* Add model 2023-09-13-carlbert_webex_mlm_spatial_en

* Add model 2023-09-13-bert_base_indonesian_522m_id

* Add model 2023-09-13-dbert_ko

* Add model 2023-09-13-hing_mbert_hi

* Add model 2023-09-13-greek_media_bert_base_uncased_el

* Add model 2023-09-13-mlm_20230403_002_1_en

* Add model 2023-09-13-tsonga_test2_en

* Add model 2023-09-13-bert_base_italian_xxl_uncased_it

* Add model 2023-09-13-distilbertu_base_cased_0.5_en

* Add model 2023-09-13-spanish_bert_base_spanish_wwm_cased_en

* Add model 2023-09-13-bert_dp_4_en

* Add model 2023-09-12-mlm_20230513_indobert_large_p1_002_pt1_en

* Add model 2023-09-13-bert_small_finetuned_legal_definitions_en

* Add model 2023-09-12-tiny_mlm_glue_mrpc_custom_tokenizer_expand_vocab_en

* Add model 2023-09-13-sci_summary_2_en

* Add model 2023-09-13-nepnewsbert_en

* Add model 2023-09-13-bert_small_finetuned_legal_definitions_longer_en

* Add model 2023-09-13-bert_base_greek_uncased_v3_finetuned_polylex_en

* Add model 2023-09-13-beto_chile_politico_1990_2019_es

* Add model 2023-09-13-retromae_msmarco_finetune_en

* Add model 2023-09-13-bert_base_arabertv2_ar

* Add model 2023-09-13-tlm_sciie_medium_scale_en

* Add model 2023-09-13-further_train_domain_10_en

* Add model 2023-09-13-mbert_rom_arabic_en

* Add model 2023-09-12-biomednlp_pubmedbert_base_uncased_abstract_fulltext_en

* Add model 2023-09-13-distil_clinicalbert_en

* Add model 2023-09-13-kinyabert_small_finetuned_kintweetsa_en

* Add model 2023-09-13-lernnavibert_en

* Add model 2023-09-13-absa_mlm_2_en

* Add model 2023-09-13-tlm_sciie_small_scale_en

* Add model 2023-09-13-javanese_bert_small_jv

* Add model 2023-09-13-incaselawbert_en

* Add model 2023-09-13-mymodel001_en

* Add model 2023-09-12-akeylegalbert6_en

* Add model 2023-09-13-bert_base_low_resource_wellness_en

* Add model 2023-09-12-bert_base_english_russian_cased_en

* Add model 2023-09-13-carlbert_webex_mlm_aditeyabaral_en

* Add model 2023-09-13-distilbertu_base_cased_en

* Add model 2023-09-13-mlm_20230403_001_1_en

* Add model 2023-09-13-test_itamarl_en

* Add model 2023-09-13-cms_ext_bio_clinicalbert_en

* Add model 2023-09-13-knowbias_bert_base_uncased_race_en

* Add model 2023-09-13-film98991bert_base_uncased_en

* Add model 2023-09-12-romanian_bert_tweet_large_ro

* Add model 2023-09-13-bioformer_8l_en

* Add model 2023-09-12-efficient_splade_vi_bt_large_query_en

* Add model 2023-09-13-bert_small_historic_multilingual_cased_xx

* Add model 2023-09-13-bert_dk_rest_en

* Add model 2023-09-13-all_minilm_l6_v2_finetuned_wikitext2_en

* Add model 2023-09-13-gujiroberta_jian_fan_en

* Add model 2023-09-13-bert_hs_idpt_en

* Add model 2023-09-13-itcast_nlp_base_en

* Add model 2023-09-13-bert_base_swedish_europeana_cased_en

* Add model 2023-09-13-nlpload_en

* Add model 2023-09-13-bert_base_uncased_issues_128_shenghao1993_en

* Add model 2023-09-12-minilmv2_l6_h768_distilled_from_bert_large_en

* Add model 2023-09-13-newmodel_en

* Add model 2023-09-13-cocodr_base_en

* Add model 2023-09-13-marathi_bert_small_mr

* Add model 2023-09-13-slimr_pp_msmarco_passage_en

* Add model 2023-09-13-bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr50_8_fast_8_en

* Add model 2023-09-13-bert_base_uncased_issues_128_synpjh_en

* Add model 2023-09-13-twteval_pretrained_en

* Add model 2023-09-13-mlm_20230404_001_2_en

* Add model 2023-09-13-sanay_bert_en

* Add model 2023-09-13-kinyabert_large_finetuned_kintweets_en

* Add model 2023-09-13-bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr50_8_fast_11_en

* Add model 2023-09-12-muril_adapted_local_xx

* Add model 2023-09-13-bert_base_uncased_ifedrigo_en

* Add model 2023-09-13-bert_base_uncased_issues_128_haesun_en

* Add model 2023-09-13-bert_base_spanish_wwm_cased_dccuchile_es

* Add model 2023-09-13-kinyabert_small_finetuned_kintweets_en

* Add model 2023-09-13-retromae_en

* Add model 2023-09-13-indo2_en

* Add model 2023-09-13-bert_base_turkish_128k_cased_offensive_mlm_tr

* Add model 2023-09-13-tod_bert_jnt_v1_en

* Add model 2023-09-12-prop_marco_en

* Add model 2023-09-13-bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_40_en

* Add model 2023-09-13-biovocabbert_en

* Add model 2023-09-13-tlm_citation_intent_small_scale_en

* Add model 2023-09-13-bert_base_uncased_finetuned_lexglue_en

* Add model 2023-09-13-cocodr_large_en

* Add model 2023-09-13-bert_base_german_cased_archaeo_de

* Add model 2023-09-13-bert_large_cased_finetuned_low100_0_cased_da_20_en

* Add model 2023-09-13-bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_2_en

* Add model 2023-09-13-tulio_chilean_spanish_bert_es

* Add model 2023-09-12-bertjewdialdataall04_en

* Add model 2023-09-13-protaugment_lm_hwu64_en

* Add model 2023-09-13-tlm_rct_20k_medium_scale_en

* Add model 2023-09-13-drug_combinations_lm_pubmedbert_en

* Add model 2023-09-13-btfhbert_en

* Add model 2023-09-13-mrf_bert_en

* Add model 2023-09-13-bert_base_uncased_issues_128_xxr_en

* Add model 2023-09-13-bert_base_turkish_uncased_offensive_mlm_tr

* Add model 2023-09-12-mbert_xdm_english_chinese_en

* Add model 2023-09-12-legal_bertimbau_large_pt

* Add model 2023-09-13-bert_base_spanish_wwm_cased_plai_edp_test_es

* Add model 2023-09-13-bert_base_multilingual_cased_urgency_xx

* Add model 2023-09-13-bert_large_retrained_4_epochs_en

* Add model 2023-09-13-bert_base_historic_dutch_cased_en

* Add model 2023-09-12-bert_c1_english_only_en

* Add model 2023-09-13-bert_base_portuguese_cased_finetuned_tcu_acordaos_pt

* Add model 2023-09-13-bert_base_spanish_wwm_uncased_es

* Add model 2023-09-13-hindi_tweets_bert_v2_hi

* Add model 2023-09-12-mlm_20230513_indobert_large_p1_002_pt2_en

* Add model 2023-09-13-inlegalbert_cbp_lkg_finetuned_en

* Add model 2023-09-13-bert_base_cased_portuguese_lenerbr_alynneoya_en

* Add model 2023-09-13-sci_summary_3_en

* Add model 2023-09-13-protaugment_lm_clinic150_en

* Add model 2023-09-13-aave_bert_en

* Add model 2023-09-13-tlm_amazon_small_scale_en

* Add model 2023-09-13-ai12_en

* Add model 2023-09-13-kinyabert_large_finetuned_kintweetsa_en

* Add model 2023-09-13-bert_base_turkish_128k_uncased_offensive_mlm_tr

* Add model 2023-09-13-small_mlm_glue_qqp_custom_tokenizer_expand_vocab_en

* Add model 2023-09-13-nl2_en

---------

Co-authored-by: ahmedlone127 <ahmedlone127@gmail.com>
jsl-models and ahmedlone127 authored Sep 13, 2023
1 parent f3c878e commit 6ec2297
Showing 391 changed files with 36,363 additions and 0 deletions.
93 changes: 93 additions & 0 deletions docs/_posts/ahmedlone127/2023-09-12-akeylegalbert6_en.md
@@ -0,0 +1,93 @@
---
layout: model
title: English akeylegalbert6 BertEmbeddings from hatemestinbejaia
author: John Snow Labs
name: akeylegalbert6
date: 2023-09-12
tags: [bert, en, open_source, fill_mask, onnx]
task: Embeddings
language: en
edition: Spark NLP 5.1.1
spark_version: 3.0
supported: true
engine: onnx
annotator: BertEmbeddings
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `akeylegalbert6` is an English model originally trained by hatemestinbejaia.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/akeylegalbert6_en_5.1.1_3.0_1694557526003.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/akeylegalbert6_en_5.1.1_3.0_1694557526003.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
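
The archive name in the links above appears to encode the model name, language, Spark NLP version, Spark version, and a build timestamp. The following is a minimal sketch for splitting such a name into its parts. The field layout and the reading of the trailing number as a Unix timestamp in milliseconds are assumptions inferred from the URLs in this card, and `parse_artifact` is a hypothetical helper, not part of Spark NLP.

```python
import re
from datetime import datetime, timezone

# Assumed layout, inferred from the download links in this model card:
#   <name>_<lang>_<spark-nlp version>_<spark version>_<timestamp in ms>.zip
ARTIFACT_RE = re.compile(
    r"^(?P<name>.+)_(?P<lang>[a-z]{2,3})"
    r"_(?P<nlp>\d+\.\d+\.\d+)_(?P<spark>\d+\.\d+)_(?P<ts>\d+)\.zip$"
)


def parse_artifact(url):
    """Split a model archive URL (or bare file name) into its assumed fields."""
    filename = url.rsplit("/", 1)[-1]
    match = ARTIFACT_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected archive name: {filename}")
    return {
        "name": match["name"],
        "language": match["lang"],
        "spark_nlp_version": match["nlp"],
        "spark_version": match["spark"],
        # Integer division drops the millisecond part before conversion.
        "uploaded": datetime.fromtimestamp(int(match["ts"]) // 1000, tz=timezone.utc),
    }


info = parse_artifact(
    "s3://auxdata.johnsnowlabs.com/public/models/"
    "akeylegalbert6_en_5.1.1_3.0_1694557526003.zip"
)
print(info["name"], info["language"], info["spark_nlp_version"])
print(info["uploaded"].date())  # 2023-09-12, matching the `date` field of this card
```

Under these assumptions, the parsed name and language are the two arguments the usage example passes to `BertEmbeddings.pretrained`.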

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
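```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertEmbeddings
from pyspark.ml import Pipeline

# `data` is assumed to be a Spark DataFrame with a string column named "text".

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

# BertEmbeddings reads a "token" column, so a Tokenizer has to run before it.
tokenizer = Tokenizer() \
    .setInputCols(["documents"]) \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("akeylegalbert6", "en") \
    .setInputCols(["documents", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([document_assembler, tokenizer, embeddings])

pipelineModel = pipeline.fit(data)

pipelineDF = pipelineModel.transform(data)

```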
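```scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.embeddings.BertEmbeddings
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with a string column named "text".

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("documents") // must match the input column of the stages below

// BertEmbeddings reads a "token" column, so a Tokenizer has to run before it.
val tokenizer = new Tokenizer()
    .setInputCols(Array("documents"))
    .setOutputCol("token")

val embeddings = BertEmbeddings
    .pretrained("akeylegalbert6", "en")
    .setInputCols(Array("documents", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, embeddings))

val pipelineModel = pipeline.fit(data)

val pipelineDF = pipelineModel.transform(data)

```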
</div>
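
The embeddings stage reads the `documents` and `token` columns, so each of them has to be produced by an earlier stage (for example a `Tokenizer` for `token`). The sketch below is plain Python, not Spark NLP code: it checks such column wiring on hypothetical `(stage name, input columns, output column)` tuples, and `missing_inputs` is an illustrative helper introduced here.

```python
def missing_inputs(stages, source_columns):
    """Return (stage name, column) pairs for inputs that no earlier stage produces."""
    available = set(source_columns)
    problems = []
    for name, inputs, output in stages:
        for column in inputs:
            if column not in available:
                problems.append((name, column))
        # The stage's output becomes available to every later stage.
        available.add(output)
    return problems


# Hypothetical descriptions of the stages used with this model.
assembler = ("DocumentAssembler", ["text"], "documents")
tokenizer = ("Tokenizer", ["documents"], "token")
embeddings = ("BertEmbeddings", ["documents", "token"], "embeddings")

print(missing_inputs([assembler, tokenizer, embeddings], ["text"]))  # []
print(missing_inputs([assembler, embeddings], ["text"]))  # [('BertEmbeddings', 'token')]
```

The second call shows what goes wrong when the tokenizer is left out: the embeddings stage has no `token` column to read.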

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|akeylegalbert6|
|Compatibility:|Spark NLP 5.1.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents, token]|
|Output Labels:|[embeddings]|
|Language:|en|
|Size:|407.1 MB|

## References

https://huggingface.co/hatemestinbejaia/AkeyLegalBert6
@@ -0,0 +1,93 @@
---
layout: model
title: English bert_base_cased_portuguese_ccorpus BertEmbeddings from rosimeirecosta
author: John Snow Labs
name: bert_base_cased_portuguese_ccorpus
date: 2023-09-12
tags: [bert, en, open_source, fill_mask, onnx]
task: Embeddings
language: en
edition: Spark NLP 5.1.1
spark_version: 3.0
supported: true
engine: onnx
annotator: BertEmbeddings
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_cased_portuguese_ccorpus` is an English model originally trained by rosimeirecosta.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_portuguese_ccorpus_en_5.1.1_3.0_1694553390707.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_portuguese_ccorpus_en_5.1.1_3.0_1694553390707.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
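```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertEmbeddings
from pyspark.ml import Pipeline

# `data` is assumed to be a Spark DataFrame with a string column named "text".

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

# BertEmbeddings reads a "token" column, so a Tokenizer has to run before it.
tokenizer = Tokenizer() \
    .setInputCols(["documents"]) \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("bert_base_cased_portuguese_ccorpus", "en") \
    .setInputCols(["documents", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([document_assembler, tokenizer, embeddings])

pipelineModel = pipeline.fit(data)

pipelineDF = pipelineModel.transform(data)

```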
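```scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.embeddings.BertEmbeddings
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with a string column named "text".

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("documents") // must match the input column of the stages below

// BertEmbeddings reads a "token" column, so a Tokenizer has to run before it.
val tokenizer = new Tokenizer()
    .setInputCols(Array("documents"))
    .setOutputCol("token")

val embeddings = BertEmbeddings
    .pretrained("bert_base_cased_portuguese_ccorpus", "en")
    .setInputCols(Array("documents", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, embeddings))

val pipelineModel = pipeline.fit(data)

val pipelineDF = pipelineModel.transform(data)

```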
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|bert_base_cased_portuguese_ccorpus|
|Compatibility:|Spark NLP 5.1.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents, token]|
|Output Labels:|[embeddings]|
|Language:|en|
|Size:|405.9 MB|

## References

https://huggingface.co/rosimeirecosta/bert-base-cased-pt-ccorpus
@@ -0,0 +1,93 @@
---
layout: model
title: English bert_base_english_dutch_cased BertEmbeddings from Geotrend
author: John Snow Labs
name: bert_base_english_dutch_cased
date: 2023-09-12
tags: [bert, en, open_source, fill_mask, onnx]
task: Embeddings
language: en
edition: Spark NLP 5.1.1
spark_version: 3.0
supported: true
engine: onnx
annotator: BertEmbeddings
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_english_dutch_cased` is an English model originally trained by Geotrend.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_english_dutch_cased_en_5.1.1_3.0_1694554396036.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_english_dutch_cased_en_5.1.1_3.0_1694554396036.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
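```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertEmbeddings
from pyspark.ml import Pipeline

# `data` is assumed to be a Spark DataFrame with a string column named "text".

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

# BertEmbeddings reads a "token" column, so a Tokenizer has to run before it.
tokenizer = Tokenizer() \
    .setInputCols(["documents"]) \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("bert_base_english_dutch_cased", "en") \
    .setInputCols(["documents", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([document_assembler, tokenizer, embeddings])

pipelineModel = pipeline.fit(data)

pipelineDF = pipelineModel.transform(data)

```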
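```scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.embeddings.BertEmbeddings
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with a string column named "text".

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("documents") // must match the input column of the stages below

// BertEmbeddings reads a "token" column, so a Tokenizer has to run before it.
val tokenizer = new Tokenizer()
    .setInputCols(Array("documents"))
    .setOutputCol("token")

val embeddings = BertEmbeddings
    .pretrained("bert_base_english_dutch_cased", "en")
    .setInputCols(Array("documents", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, embeddings))

val pipelineModel = pipeline.fit(data)

val pipelineDF = pipelineModel.transform(data)

```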
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|bert_base_english_dutch_cased|
|Compatibility:|Spark NLP 5.1.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents, token]|
|Output Labels:|[embeddings]|
|Language:|en|
|Size:|415.9 MB|

## References

https://huggingface.co/Geotrend/bert-base-en-nl-cased
@@ -0,0 +1,93 @@
---
layout: model
title: English bert_base_english_russian_cased BertEmbeddings from Geotrend
author: John Snow Labs
name: bert_base_english_russian_cased
date: 2023-09-12
tags: [bert, en, open_source, fill_mask, onnx]
task: Embeddings
language: en
edition: Spark NLP 5.1.1
spark_version: 3.0
supported: true
engine: onnx
annotator: BertEmbeddings
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_english_russian_cased` is an English model originally trained by Geotrend.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_english_russian_cased_en_5.1.1_3.0_1694555197411.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_english_russian_cased_en_5.1.1_3.0_1694555197411.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
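```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertEmbeddings
from pyspark.ml import Pipeline

# `data` is assumed to be a Spark DataFrame with a string column named "text".

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

# BertEmbeddings reads a "token" column, so a Tokenizer has to run before it.
tokenizer = Tokenizer() \
    .setInputCols(["documents"]) \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("bert_base_english_russian_cased", "en") \
    .setInputCols(["documents", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([document_assembler, tokenizer, embeddings])

pipelineModel = pipeline.fit(data)

pipelineDF = pipelineModel.transform(data)

```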
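```scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.embeddings.BertEmbeddings
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with a string column named "text".

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("documents") // must match the input column of the stages below

// BertEmbeddings reads a "token" column, so a Tokenizer has to run before it.
val tokenizer = new Tokenizer()
    .setInputCols(Array("documents"))
    .setOutputCol("token")

val embeddings = BertEmbeddings
    .pretrained("bert_base_english_russian_cased", "en")
    .setInputCols(Array("documents", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, embeddings))

val pipelineModel = pipeline.fit(data)

val pipelineDF = pipelineModel.transform(data)

```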
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|bert_base_english_russian_cased|
|Compatibility:|Spark NLP 5.1.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents, token]|
|Output Labels:|[embeddings]|
|Language:|en|
|Size:|428.3 MB|

## References

https://huggingface.co/Geotrend/bert-base-en-ru-cased