2024-11-10-rubert_address_elements_ru (#14452)
* Add model 2024-11-11-sent_bowdpr_wiki_en

* Add model 2024-11-11-cc_uffs_ppc_ft_test_multiqa_pipeline_en

* Add model 2024-11-11-unified_skill_ner_echo_en

* Add model 2024-11-11-mountain_ner_model_en

* Add model 2024-11-11-mountain_ner_model_pipeline_en

* Add model 2024-11-11-msu_wiki_ner_ru

* Add model 2024-11-11-bert_xomlac_ner_pipeline_zh

* Add model 2024-11-11-bert_base_cased_finetuned_ner_pipeline_en

* Add model 2024-11-11-bert_base_cased_finetuned_ner_en

* Add model 2024-11-11-ner_tokenclassification_persian_pipeline_en

* Add model 2024-11-11-persian_text_ner_bert_v1_fa

* Add model 2024-11-11-sent_flang_spanbert_pipeline_en

* Add model 2024-11-11-sent_gww_pipeline_en

* Add model 2024-11-11-software_ner_prod_en

* Add model 2024-11-11-quote_model_bertm_v1_pipeline_en

* Add model 2024-11-11-classify_bluesky_1000_v2_pipeline_en

* Add model 2024-11-11-msu_wiki_ner_pipeline_ru

* Add model 2024-11-11-hardware_ner_prod_en

* Add model 2024-11-11-auto_adver_pipeline_en

* Add model 2024-11-11-bert_finetuned_ner_viktoryes_pipeline_en

* Add model 2024-11-11-bert_finetuned_ner_viktoryes_en

* Add model 2024-11-11-quote_model_bertm_v1_en

* Add model 2024-11-11-software_ner_prod_pipeline_en

* Add model 2024-11-11-sent_tiny_mlm_glue_qnli_en

* Add model 2024-11-11-sent_cocodr_large_pipeline_en

* Add model 2024-11-11-ner_tokenclassification_persian_en

* Add model 2024-11-11-hardware_ner_prod_pipeline_en

* Add model 2024-11-11-embedded_e5_base_50_pipeline_en

* Add model 2024-11-11-bert_finetuned_tmvar_corpus_pipeline_en

* Add model 2024-11-11-e5_base_pipeline_en

* Add model 2024-11-11-e5_large_en

* Add model 2024-11-11-rupunct_small_ru

* Add model 2024-11-11-spanish_medical_ner_pipeline_es

* Add model 2024-11-11-nepal_bhasa_biored_model_pipeline_en

* Add model 2024-11-11-unified_skill_ner_echo_pipeline_en

* Add model 2024-11-11-e5_large_pipeline_en

* Add model 2024-11-11-e5_small_en

* Add model 2024-11-11-cleaned_e5_base_unsupervised_pipeline_en

* Add model 2024-11-11-keybert_bulgarian_pipeline_bg

* Add model 2024-11-11-bert_xomlac_ner_zh

* Add model 2024-11-11-bert_finetuned_tmvar_corpus_en

* Add model 2024-11-11-cleaned_e5_large_unsupervised_en

* Add model 2024-11-11-sent_tiny_mlm_snli_en

* Add model 2024-11-11-embedded_e5_base_50_en

* Add model 2024-11-11-cleaned_e5_base_unsupervised_en

* Add model 2024-11-11-results_pipeline_en

* Add model 2024-11-11-xlm_cebinary_vmo2_large_3_en

* Add model 2024-11-11-xlm_cebinary_vmo2_large_3_pipeline_en

* Add model 2024-11-11-southern_sotho_mpnet_base_normal_en

* Add model 2024-11-11-persian_text_ner_bert_v1_pipeline_fa

* Add model 2024-11-11-results_en

* Add model 2024-11-11-autotrain_nzog3_ca819_pipeline_en

* Add model 2024-11-11-sentence_similarity_finetuned_mpnet_adrta_pipeline_en

* Add model 2024-11-11-sentence_similarity_finetuned_mpnet_adrta_en

* Add model 2024-11-11-sentencetransformer_mpnet_base_on_chemical_dataset_en

* Add model 2024-11-11-keybert_bulgarian_bg

* Add model 2024-11-11-southern_sotho_mpnet_base10_en

* Add model 2024-11-11-sentencetransformer_mpnet_base_on_chemical_dataset_pipeline_en

* Add model 2024-11-11-e5_base_en

* Add model 2024-11-11-southern_sotho_mpnet_base_normal_pipeline_en

* Add model 2024-11-11-finetuned_sentence_similarity_en

* Add model 2024-11-11-nepal_bhasa_biored_model_en

* Add model 2024-11-11-whisper_tiny_amharic_en

* Add model 2024-11-11-cleaned_e5_large_unsupervised_pipeline_en

* Add model 2024-11-11-sent_bert_base_english_french_arabic_cased_pipeline_en

* Add model 2024-11-11-fund_embedder_en

* Add model 2024-11-11-whisper_tiny_v2_2_romanian_pipeline_en

* Add model 2024-11-11-southern_sotho_mpnet_base20_pipeline_en

* Add model 2024-11-11-auto_adver_en

* Add model 2024-11-11-whisper_small_arabic_augmentation_en

* Add model 2024-11-11-linshoufanfork_whisper_small_nan_twi_pinyin_pipeline_en

* Add model 2024-11-11-whisper_small_arabic_augmentation_pipeline_en

* Add model 2024-11-11-whisper_tiny_amharic_pipeline_en

* Add model 2024-11-11-whisper_tiny_arabic_pipeline_ar

* Add model 2024-11-11-linshoufanfork_whisper_small_nan_twi_pinyin_en

* Add model 2024-11-11-checkpoints_almino_pipeline_en

* Add model 2024-11-11-whisper_tiny_v2_2_romanian_en

* Add model 2024-11-11-autotrain_nzog3_ca819_en

* Add model 2024-11-11-whisper_omg_hi

* Add model 2024-11-11-whisper_omg_pipeline_hi

* Add model 2024-11-11-checkpoints_almino_en

* Add model 2024-11-11-whisper_small_western_frisian_dutch_transfer_from_english_fy

* Add model 2024-11-11-whisper_tiny_nob_en

* Add model 2024-11-11-whisper_tiny_nob_pipeline_en

* Add model 2024-11-11-whisper_small_western_frisian_dutch_transfer_from_english_pipeline_fy

* Add model 2024-11-11-whisper_tiny_arabic_ar

* Add model 2024-11-11-e5_small_pipeline_en

* Add model 2024-11-11-whisper_small_english_crossdelenna_en

* Add model 2024-11-11-finetuned_sentence_similarity_pipeline_en

* Add model 2024-11-11-whisper_small_malay_pipeline_my

* Add model 2024-11-11-whisper_small_malay_my

* Add model 2024-11-11-rupunct_small_pipeline_ru

* Add model 2024-11-11-southern_sotho_mpnet_base20_en

* Add model 2024-11-11-whisper_small_english_crossdelenna_pipeline_en

* Add model 2024-11-11-whisper_small_russian_f_ru

* Add model 2024-11-11-whisper_small_yt_en

* Add model 2024-11-11-whisper_small_russian_f_pipeline_ru

* Add model 2024-11-11-whisper_small_yt_pipeline_en

* Add model 2024-11-11-whisper_base_common_voice_arabic11_0_en

* Add model 2024-11-11-southern_sotho_mpnet_base10_pipeline_en

* Add model 2024-11-11-spanish_medical_ner_es

* Add model 2024-11-11-whisper_base_common_voice_arabic11_0_pipeline_en

* Add model 2024-11-11-whisper_base_hungarian_v1_hu

* Add model 2024-11-11-whisper_base_hungarian_v1_pipeline_hu

* Add model 2024-11-11-whisper_finetuned_atcosim_en

* Add model 2024-11-11-whisper_finetuned_atcosim_pipeline_en

* Add model 2024-11-11-whisper_medium_latvian_ver2_lv

* Add model 2024-11-11-whisper_medium_latvian_ver2_pipeline_lv

* Add model 2024-11-11-whisper_small_french_uncased_fr

* Add model 2024-11-11-whisper_small_french_uncased_pipeline_fr

* Add model 2024-11-11-whisper_tiny_chinese_antares28_en

* Add model 2024-11-11-whisper_tiny_chinese_antares28_pipeline_en

* Add model 2024-11-11-malaysian_whisper_tiny_ms

* Add model 2024-11-11-malaysian_whisper_tiny_pipeline_ms

* Add model 2024-11-11-whisper_medium_luluw_en

* Add model 2024-11-11-whisper_small_dutch_en

* Add model 2024-11-11-whisper_small_greek_modern_finetune_el

* Add model 2024-11-11-whisper_small_dutch_pipeline_en

* Add model 2024-11-11-whisper_small_greek_modern_finetune_pipeline_el

* Add model 2024-11-11-deberta_v3_large_lemon_spell_5k_en

* Add model 2024-11-11-deberta_v3_large_lemon_spell_5k_pipeline_en

* Add model 2024-11-11-bert_finetuned_squad_dokyoungkim_en

* Add model 2024-11-11-bert_finetuned_squad_dokyoungkim_pipeline_en

* Add model 2024-11-11-bert_large_uncased_whole_word_masking_finetuned_squad_dev_i_en

* Add model 2024-11-11-bert_large_uncased_whole_word_masking_finetuned_squad_dev_i_pipeline_en

* Add model 2024-11-11-banglabert_qa_en

* Add model 2024-11-11-mi_chatbotv3_en

* Add model 2024-11-11-mi_chatbotv3_pipeline_en

* Add model 2024-11-11-bert_sliding_window_epoch_3_en

* Add model 2024-11-11-hebert_finetuned_precedents_he

* Add model 2024-11-11-bert_sliding_window_epoch_3_pipeline_en

* Add model 2024-11-11-bert_base_uncased_finetuned_triviaqa_en

* Add model 2024-11-11-mbert_finetuned_mlqa_dev_spanish_chinese_hindi_en

* Add model 2024-11-11-bert_base_uncased_figurative_language_en

* Add model 2024-11-11-bert_base_uncased_finetuned_triviaqa_pipeline_en

* Add model 2024-11-11-bert_finetuned_squad_accelerate_3_en

* Add model 2024-11-11-banglabert_qa_pipeline_en

* Add model 2024-11-11-bert_base_uncased_figurative_language_pipeline_en

* Add model 2024-11-11-mbert_finetuned_mlqa_dev_spanish_chinese_hindi_pipeline_en

* Add model 2024-11-11-hebert_finetuned_precedents_pipeline_he

* Add model 2024-11-11-bert_finetuned_squad_accelerate_3_pipeline_en

* Add model 2024-11-11-beto_sentiment_analysis_finetuned_en

* Add model 2024-11-11-beto_sentiment_analysis_finetuned_pipeline_en

* Add model 2024-11-11-personalinfoclassifier_en

* Add model 2024-11-11-fine_tuned_metaphor_detection_en

* Add model 2024-11-11-personalinfoclassifier_pipeline_en

* Add model 2024-11-11-hs_arabic_translate_syn_4class_for_tool_en

* Add model 2024-11-11-fine_tuned_metaphor_detection_pipeline_en

* Add model 2024-11-11-clinical_trial_termination_en

* Add model 2024-11-11-factuality_model_pipeline_en

* Add model 2024-11-11-factuality_model_en

* Add model 2024-11-11-bert_classifier_spanish_news_classification_headlines_pipeline_es

* Add model 2024-11-11-kaggle_detect_generated_text_pipeline_en

* Add model 2024-11-11-bert_base_uncased_sba_clf_pipeline_en

* Add model 2024-11-11-e5_small_lora_ai_generated_detector_en

* Add model 2024-11-11-bert_340m_ft_first_1000_pref_en

* Add model 2024-11-11-kaggle_detect_generated_text_en

* Add model 2024-11-11-bert_news_class_en

* Add model 2024-11-11-politeness_model_pipeline_en

* Add model 2024-11-11-politeness_model_en

* Add model 2024-11-11-biomednlp_pubmedbert_base_uncased_abstract_fulltext_finetuned_pubmedqa_pipeline_en

* Add model 2024-11-11-scenario_nepal_bhasa_pipeline_en

* Add model 2024-11-11-bio_clinicalbert_medical_en

* Add model 2024-11-11-bert_classifier_spanish_news_classification_headlines_es

* Add model 2024-11-11-bert_base_cased_mnli_en

* Add model 2024-11-11-bert_large_finetuned_phishing_junginkim_en

* Add model 2024-11-11-popbert_pipeline_de

* Add model 2024-11-11-aspect_based_sentiment_analyzer_using_bert_en

* Add model 2024-11-11-bert_base_cased_mnli_pipeline_en

* Add model 2024-11-11-workprocess_24_10_01_en

* Add model 2024-11-11-bert_model_news_aggregator_pipeline_en

* Add model 2024-11-11-bert_base_uncased_emotion_prikshit7766_en

* Add model 2024-11-11-clinical_trial_termination_pipeline_en

* Add model 2024-11-11-nasa_smd_ibm_v0_1_uat_labeler_en

* Add model 2024-11-11-hs_arabic_translate_syn_4class_for_tool_pipeline_en

* Add model 2024-11-11-flash_italian_ns_classifier_fpt_en

* Add model 2024-11-11-bert_large_finetuned_phishing_junginkim_pipeline_en

* Add model 2024-11-11-e5_small_lora_ai_generated_detector_pipeline_en

* Add model 2024-11-11-biomednlp_pubmedbert_base_uncased_abstract_fulltext_finetuned_pubmedqa_en

* Add model 2024-11-11-climateattention_ctw_pipeline_en

* Add model 2024-11-11-climateattention_ctw_en

* Add model 2024-11-11-bio_clinicalbert_medical_pipeline_en

* Add model 2024-11-11-bert_340m_ft_first_1000_pref_pipeline_en

* Add model 2024-11-11-sst2_benign_bert_uncased_pipeline_en

* Add model 2024-11-11-roberta_base_finetuned_ner_cadec_pipeline_en

* Add model 2024-11-11-roberta_combined_generated_v1_1_epoch_7_en

* Add model 2024-11-11-roberta_base_ainu_sayula_popoluca_en

* Add model 2024-11-11-roberta_large_lemon_spell_5k_pipeline_en

* Add model 2024-11-11-roberta_test_training_pipeline_en

* Add model 2024-11-11-roberta_test_training_en

* Add model 2024-11-11-securebert_finetuned_ner_pipeline_en

* Add model 2024-11-11-bert_base_uncased_sba_clf_en

* Add model 2024-11-11-sst2_benign_bert_uncased_en

* Add model 2024-11-11-biomed_roberta_all_deep_en

* Add model 2024-11-11-bert_model_news_aggregator_en

* Add model 2024-11-11-indonesian_roberta_base_nerp_tagger_pipeline_en

* Add model 2024-11-11-indonesian_roberta_base_nerp_tagger_en

* Add model 2024-11-11-flash_italian_ns_classifier_fpt_pipeline_en

* Add model 2024-11-11-popbert_de

* Add model 2024-11-11-roberta_base_ainu_sayula_popoluca_pipeline_en

* Add model 2024-11-11-roberta_base_finetuned_ner_cadec_en

* Add model 2024-11-11-nasa_smd_ibm_v0_1_uat_labeler_pipeline_en

* Add model 2024-11-11-scenario_nepal_bhasa_en

* Add model 2024-11-11-affilgood_ner_en

* Add model 2024-11-11-bge_large_zhtw_v1_5_en

* Add model 2024-11-11-bge_small_english_v1_5_ft_orc_0930_dates_en

* Add model 2024-11-11-bge_base_legal_matryoshka_v1_pipeline_en

* Add model 2024-11-11-bsc_bio_ehr_spanish_distemist_es

* Add model 2024-11-11-finetuned_baai_bge_base_english_pipeline_en

* Add model 2024-11-11-bge_micro_smiles_pipeline_en

* Add model 2024-11-11-bge_micro_smiles_en

* Add model 2024-11-11-securebert_finetuned_ner_en

* Add model 2024-11-11-bsc_bio_ehr_spanish_distemist_pipeline_es

* Add model 2024-11-11-bge_tuned_en

* Add model 2024-11-11-bge_base_english_v1_5_course_recommender_v2_en

* Add model 2024-11-11-bge_base_legal_matryoshka_v1_en

* Add model 2024-11-11-roberta_combined_generated_v1_1_epoch_8_en

* Add model 2024-11-11-bge_small_english_v1_5_ft_orc_0930_dates_pipeline_en

* Add model 2024-11-11-roberta_base_bne_capitel_ner_bsc_lt_pipeline_es

* Add model 2024-11-11-fine_tuned_bge_large_en

* Add model 2024-11-11-bge_99gpt_v1_en

* Add model 2024-11-11-affilgood_ner_pipeline_en

* Add model 2024-11-11-roberta_large_finetuned_abbr_filtered_plod_en

* Add model 2024-11-11-roberta_base_bne_capitel_ner_plantl_gob_es_pipeline_es

* Add model 2024-11-11-bge_tuned_pipeline_en

* Add model 2024-11-11-roberta_base_absa_ate_sentiment_en

* Add model 2024-11-11-bsc_bio_ehr_spanish_medprocner_pipeline_es

* Add model 2024-11-11-lettuce_sayula_popoluca_dutch_mono_en

* Add model 2024-11-11-ruroberta_large_ner_pipeline_en

* Add model 2024-11-11-bge_base_english_v1_5_course_recommender_v2_pipeline_en

* Add model 2024-11-11-roberta_combined_generated_epoch_7_pipeline_en

* Add model 2024-11-11-roberta_combined_generated_epoch_7_en

* Add model 2024-11-11-bge_small_english_v1_5_rirag_obliqa_en

* Add model 2024-11-11-bge_99gpt_v1_pipeline_en

* Add model 2024-11-11-bert_base_uncased_emotion_prikshit7766_pipeline_en

* Add model 2024-11-11-roberta_large_finetuned_ner_finetuned_ner_en

* Add model 2024-11-11-lettuce_sayula_popoluca_dutch_mono_pipeline_en

* Add model 2024-11-11-roberta_large_finetuned_ner_finetuned_ner_pipeline_en

* Add model 2024-11-11-bge_base_english_v1_5_finetuned_osllmai_v1_pipeline_en

* Add model 2024-11-11-bert_finetuned_semantic_augmentation_ner_en

* Add model 2024-11-11-bge_large_zhtw_v1_5_pipeline_en

* Add model 2024-11-11-roberta_combined_generated_v1_1_epoch_8_pipeline_en

* Add model 2024-11-11-ruroberta_large_ner_en

* Add model 2024-11-11-roberta_spanish_clinical_trials_neg_spec_ner_en

* Add model 2024-11-11-bert_news_class_pipeline_en

* Add model 2024-11-11-roberta_base_absa_ate_sentiment_pipeline_en

* Add model 2024-11-11-finetuned_bge_base_english_pipeline_en

* Add model 2024-11-11-roberta_combined_generated_v1_1_epoch_7_pipeline_en

* Add model 2024-11-11-fine_tuned_bge_large_pipeline_en

* Add model 2024-11-11-workprocess_24_10_01_pipeline_en

---------

Co-authored-by: ahmedlone127 <ahmedlone127@gmail.com>
jsl-models and ahmedlone127 authored Nov 11, 2024
1 parent 5a556ba commit ab69789
Showing 552 changed files with 44,394 additions and 0 deletions.
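
The commit message entries above appear to follow one pattern: a date, a model name, and a trailing language code, with `_pipeline` marking the packaged pipeline variant of a model. The sketch below parses a few of those entries under that assumption. The regular expression and the helper name `parse_entry` are illustrative and not part of the repository tooling.

```python
import re
from collections import Counter

# Assumed entry shape: "* Add model <YYYY-MM-DD>-<name>_<lang>", where <lang>
# is the trailing language code and a "_pipeline" suffix on <name> marks a pipeline.
ENTRY = re.compile(r"^\* Add model (\d{4}-\d{2}-\d{2})-(.+)_([a-z]{2,3})$")


def parse_entry(line):
    """Return (date, name, lang, is_pipeline), or None if the line does not match."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    date, name, lang = match.groups()
    return date, name, lang, name.endswith("_pipeline")


# A few entries copied from the commit message.
sample = [
    "* Add model 2024-11-11-msu_wiki_ner_ru",
    "* Add model 2024-11-11-msu_wiki_ner_pipeline_ru",
    "* Add model 2024-11-11-e5_large_en",
    "* Add model 2024-11-11-whisper_omg_hi",
]

entries = [entry for entry in map(parse_entry, sample) if entry is not None]
by_lang = Counter(lang for _, _, lang, _ in entries)
pipelines = [name for _, name, _, is_pipeline in entries if is_pipeline]

print(dict(by_lang))
print(pipelines)
```

Grouping by language code or by the `_pipeline` suffix in this way gives a quick check that each model in a bulk upload has its matching pipeline entry.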
94 changes: 94 additions & 0 deletions docs/_posts/ahmedlone127/2024-11-10-afriberta_v2_large_en.md
@@ -0,0 +1,94 @@
---
layout: model
title: English afriberta_v2_large XlmRoBertaEmbeddings from castorini
author: John Snow Labs
name: afriberta_v2_large
date: 2024-11-10
tags: [en, open_source, onnx, embeddings, xlm_roberta]
task: Embeddings
language: en
edition: Spark NLP 5.5.1
spark_version: 3.0
supported: true
engine: onnx
annotator: XlmRoBertaEmbeddings
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---
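
The front matter above is what the site generator reads to build the model page. As a rough sketch of how those fields could be checked in bulk, the snippet below parses flat `key: value` lines with the standard library only. The helper `parse_front_matter` is hypothetical, and it flattens nested keys such as `article_header`, so it is not a substitute for a real YAML parser.

```python
def parse_front_matter(text):
    """Parse flat `key: value` lines between the first two `---` markers.

    Nested keys are flattened and values stay strings, except for
    bracketed lists such as `tags`, which become lists of strings.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip().strip('"')
        if value.startswith("[") and value.endswith("]"):
            value = [item.strip() for item in value[1:-1].split(",")]
        fields[key.strip()] = value
    return fields


# Excerpt of the front matter shown above.
sample = """
---
layout: model
name: afriberta_v2_large
date: 2024-11-10
tags: [en, open_source, onnx, embeddings, xlm_roberta]
language: en
edition: Spark NLP 5.5.1
spark_version: 3.0
use_language_switcher: "Python-Scala-Java"
---
"""

fields = parse_front_matter(sample)
print(fields["name"], fields["language"], fields["tags"])
```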

## Description

Pretrained XlmRoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afriberta_v2_large` is an English model originally trained by castorini.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_v2_large_en_5.5.1_3.0_1731282953480.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_v2_large_en_5.5.1_3.0_1731282953480.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
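
The archive name in the download links seems to encode the model name, language, Spark NLP version, Spark version, and an upload timestamp. The sketch below splits a link into those parts. The layout and the reading of the last field as epoch milliseconds are assumptions inferred from this page's metadata, and `parse_artifact` is a hypothetical helper.

```python
import re
from datetime import datetime, timezone

# Assumed layout: <name>_<lang>_<spark_nlp_version>_<spark_version>_<epoch_ms>.zip
ARTIFACT = re.compile(
    r"^(?P<name>.+)_(?P<lang>[a-z]{2,3})_(?P<nlp>\d+\.\d+\.\d+)"
    r"_(?P<spark>\d+\.\d+)_(?P<ts>\d{13})\.zip$"
)


def parse_artifact(url):
    """Split a model archive URL into its assumed components."""
    filename = url.rsplit("/", 1)[-1]
    match = ARTIFACT.match(filename)
    if match is None:
        raise ValueError(f"unrecognized artifact name: {filename}")
    info = match.groupdict()
    # Assumption: the trailing number is a Unix timestamp in milliseconds.
    millis = int(info.pop("ts"))
    info["uploaded"] = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return info


info = parse_artifact(
    "s3://auxdata.johnsnowlabs.com/public/models/"
    "afriberta_v2_large_en_5.5.1_3.0_1731282953480.zip"
)
print(info["name"], info["lang"], info["nlp"], info["spark"], info["uploaded"].date())
```

Under these assumptions the parsed fields can be compared with the `name`, `language`, `edition`, and `spark_version` values in the front matter.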

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

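import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaEmbeddings
from pyspark.ml import Pipeline

# Start (or reuse) a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Wrap the raw text column into a document annotation
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Download the pretrained embeddings model on first use
embeddings = XlmRoBertaEmbeddings.pretrained("afriberta_v2_large","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)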

```
```scala

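import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.embeddings.XlmRoBertaEmbeddings
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for Seq(...).toDF

// Wrap the raw text column into a document annotation
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// Download the pretrained embeddings model on first use
val embeddings = XlmRoBertaEmbeddings.pretrained("afriberta_v2_large","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)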

```
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|afriberta_v2_large|
|Compatibility:|Spark NLP 5.5.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[document, token]|
|Output Labels:|[xlm_roberta]|
|Language:|en|
|Size:|698.8 MB|

## References

https://huggingface.co/castorini/afriberta_v2_large
@@ -0,0 +1,70 @@
---
layout: model
title: English afriberta_v2_large_pipeline pipeline XlmRoBertaEmbeddings from castorini
author: John Snow Labs
name: afriberta_v2_large_pipeline
date: 2024-11-10
tags: [en, open_source, pipeline, onnx]
task: Embeddings
language: en
edition: Spark NLP 5.5.1
spark_version: 3.0
supported: true
annotator: PipelineModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained XlmRoBertaEmbeddings pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afriberta_v2_large_pipeline` is an English pipeline whose underlying model was originally trained by castorini.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_v2_large_pipeline_en_5.5.1_3.0_1731282989499.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_v2_large_pipeline_en_5.5.1_3.0_1731282989499.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

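from sparknlp.pretrained import PretrainedPipeline

# Assumption: the pipeline's DocumentAssembler reads a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("afriberta_v2_large_pipeline", lang = "en")
annotations = pipeline.transform(df)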

```
```scala

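import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._
// Assumption: the pipeline's DocumentAssembler reads a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("afriberta_v2_large_pipeline", lang = "en")
val annotations = pipeline.transform(df)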

```
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|afriberta_v2_large_pipeline|
|Type:|pipeline|
|Compatibility:|Spark NLP 5.5.1+|
|License:|Open Source|
|Edition:|Official|
|Language:|en|
|Size:|698.8 MB|

## References

https://huggingface.co/castorini/afriberta_v2_large

## Included Models

- DocumentAssembler
- TokenizerModel
- XlmRoBertaEmbeddings
@@ -0,0 +1,94 @@
---
layout: model
title: English bert_base_chinese_finetuned_food BertForTokenClassification from zhiguoxu
author: John Snow Labs
name: bert_base_chinese_finetuned_food
date: 2024-11-10
tags: [en, open_source, onnx, token_classification, bert, ner]
task: Named Entity Recognition
language: en
edition: Spark NLP 5.5.1
spark_version: 3.0
supported: true
engine: onnx
annotator: BertForTokenClassification
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_chinese_finetuned_food` is an English model originally trained by zhiguoxu.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_food_en_5.5.1_3.0_1731279799981.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_food_en_5.5.1_3.0_1731279799981.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

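import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertForTokenClassification
from pyspark.ml import Pipeline

# Start (or reuse) a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Wrap the raw text column into a document annotation
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Input columns must match the output columns of the stages above
tokenClassifier = BertForTokenClassification.pretrained("bert_base_chinese_finetuned_food","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)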

```
```scala

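import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for Seq(...).toDS

// DocumentAssembler takes a single input and output column
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// Input columns must match the output columns of the stages above
val tokenClassifier = BertForTokenClassification.pretrained("bert_base_chinese_finetuned_food", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)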

```
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|bert_base_chinese_finetuned_food|
|Compatibility:|Spark NLP 5.5.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[document, token]|
|Output Labels:|[ner]|
|Language:|en|
|Size:|381.1 MB|

## References

https://huggingface.co/zhiguoxu/bert-base-chinese-finetuned-food
@@ -0,0 +1,70 @@
---
layout: model
title: English bert_base_chinese_finetuned_food_pipeline pipeline BertForTokenClassification from zhiguoxu
author: John Snow Labs
name: bert_base_chinese_finetuned_food_pipeline
date: 2024-11-10
tags: [en, open_source, pipeline, onnx]
task: Named Entity Recognition
language: en
edition: Spark NLP 5.5.1
spark_version: 3.0
supported: true
annotator: PipelineModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained BertForTokenClassification pipeline, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_chinese_finetuned_food_pipeline` is an English pipeline whose underlying model was originally trained by zhiguoxu.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_food_pipeline_en_5.5.1_3.0_1731279819532.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_food_pipeline_en_5.5.1_3.0_1731279819532.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

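from sparknlp.pretrained import PretrainedPipeline

# Assumption: the pipeline's DocumentAssembler reads a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("bert_base_chinese_finetuned_food_pipeline", lang = "en")
annotations = pipeline.transform(df)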

```
```scala

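import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._
// Assumption: the pipeline's DocumentAssembler reads a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("bert_base_chinese_finetuned_food_pipeline", lang = "en")
val annotations = pipeline.transform(df)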

```
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|bert_base_chinese_finetuned_food_pipeline|
|Type:|pipeline|
|Compatibility:|Spark NLP 5.5.1+|
|License:|Open Source|
|Edition:|Official|
|Language:|en|
|Size:|381.2 MB|

## References

https://huggingface.co/zhiguoxu/bert-base-chinese-finetuned-food

## Included Models

- DocumentAssembler
- TokenizerModel
- BertForTokenClassification