2024-02-11-bge_m3_xx (#14170)
* Add model 2024-02-11-bge_m3_xx

* Update 2024-02-11-bge_m3_xx.md

---------

Co-authored-by: ahmedlone127 <ahmedlone127@gmail.com>
Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
3 people authored Feb 11, 2024
1 parent 34b675c commit 3a56387
Showing 1 changed file with 100 additions and 0 deletions.
100 changes: 100 additions & 0 deletions docs/_posts/ahmedlone127/2024-02-11-bge_m3_xx.md
@@ -0,0 +1,100 @@
---
layout: model
title: Multilingual bge_m3 XlmRoBertaSentenceEmbeddings from BAAI
author: John Snow Labs
name: bge_m3
date: 2024-02-11
tags: [xx, open_source, onnx]
task: Embeddings
language: xx
edition: Spark NLP 5.2.3
spark_version: 3.4
supported: true
engine: onnx
annotator: XlmRoBertaSentenceEmbeddings
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained XlmRoBertaSentenceEmbeddings model, adapted from Hugging Face and curated for scalability and production readiness with Spark NLP. `bge_m3` is a multilingual model originally trained by BAAI.

## Predicted Entities



{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_m3_xx_5.2.3_3.4_1707668886363.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_m3_xx_5.2.3_3.4_1707668886363.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
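import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, XlmRoBertaSentenceEmbeddings
from pyspark.ml import Pipeline

# Start Spark NLP and build an example input DataFrame with a "text" column.
spark = sparknlp.start()
data = spark.createDataFrame([["I love Spark NLP."]]).toDF("text")

# Convert raw text into "document" annotations.
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split each document into sentences (multilingual detector).
sentencerDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

# Compute one embedding vector per sentence with bge_m3.
embeddings = XlmRoBertaSentenceEmbeddings.pretrained("bge_m3", "xx") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([document_assembler, sentencerDL, embeddings])

pipelineModel = pipeline.fit(data)

pipelineDF = pipelineModel.transform(data)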

```
```scala

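import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLModel
import com.johnsnowlabs.nlp.embeddings.XlmRoBertaSentenceEmbeddings
import org.apache.spark.ml.Pipeline

// Assumes an active SparkSession named `spark`, as in spark-shell.
import spark.implicits._
val data = Seq("I love Spark NLP.").toDF("text") // example input

// Convert raw text into "document" annotations.
val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split each document into sentences (multilingual detector).
val sentencerDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

// Compute one embedding vector per sentence with bge_m3.
val embeddings = XlmRoBertaSentenceEmbeddings
  .pretrained("bge_m3", "xx")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentencerDL, embeddings))

val pipelineModel = pipeline.fit(data)

val pipelineDF = pipelineModel.transform(data)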
```
</div>
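
A common next step is to compare the sentence vectors the pipeline produces. The sketch below is illustrative and not part of the original model card. It assumes the output column is named `embeddings` (as in the pipeline above) and that each annotation exposes `result` and `embeddings` fields. The helper names `collect_sentence_vectors` and `cosine_similarity` are hypothetical, and the toy vectors stand in for real model output.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_u * norm_v)


def collect_sentence_vectors(pipeline_df, embeddings_col="embeddings"):
    """Hypothetical helper: pull (sentence text, vector) pairs to the driver.

    Assumes `pipeline_df` is the DataFrame returned by `pipelineModel.transform`
    and that `embeddings_col` holds an array of annotation structs.
    Only suitable for small results, because `collect()` loads everything
    into driver memory.
    """
    rows = (
        pipeline_df
        .selectExpr(f"explode({embeddings_col}) AS ann")
        .select("ann.result", "ann.embeddings")
        .collect()
    )
    return [(row["result"], list(row["embeddings"])) for row in rows]


# Toy 3-dimensional vectors standing in for real bge_m3 output.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(same_direction, orthogonal)
```

With a real `pipelineDF`, `collect_sentence_vectors(pipelineDF)` would return one pair per detected sentence, and any two of the vectors can be passed to `cosine_similarity`.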

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|bge_m3|
|Compatibility:|Spark NLP 5.2.3+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[sentence]|
|Output Labels:|[sentence_embeddings]|
|Language:|xx|
|Size:|410.8 MB|
|Max sentence length:|32|

## References

https://huggingface.co/BAAI/bge-m3
