This crate provides simple pipelines that can be used out of the box to perform text embedding and re-ranking using ONNX models. They are built with 🧩 orp (which relies on the 🦀 ort runtime), and use 🤗 tokenizers for token encoding.
Add the following to your `Cargo.toml`:

```toml
[dependencies]
gte-rs = "0.9.1"
orp = "0.9.2"
```
Embedding:
```rust
// Load the tokenizer and set up the embedding pipeline
let params = Parameters::default();
let pipeline = TextEmbeddingPipeline::new("gte-modernbert-base/tokenizer.json", &params)?;

// Load the ONNX model
let model = Model::new("gte-modernbert-base/model.onnx", RuntimeParameters::default())?;

// Texts to embed
let inputs = TextInput::from_str(&[
    "text content",
    "some more content",
    // ...
]);

// Run inference to get one embedding per input text
let embeddings = model.inference(inputs, &pipeline, &params)?;
```
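The returned embeddings can then be compared with a standard vector-similarity measure. Below is a minimal sketch of cosine similarity between two embedding vectors; it assumes each embedding can be viewed as a slice of `f32`, which is an assumption about the output type rather than a documented API.

```rust
/// Cosine similarity between two embedding vectors.
/// Assumes both slices have the same length and non-zero norms
/// (how embeddings are exposed may differ in the actual crate).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}
```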
Re-ranking:
```rust
// Load the tokenizer and set up the re-ranking pipeline
let params = Parameters::default();
let pipeline = RerankingPipeline::new("gte-reranker-modernbert-base/tokenizer.json", &params)?;

// Load the ONNX model
let model = Model::new("gte-reranker-modernbert-base/model.onnx", RuntimeParameters::default())?;

// (candidate, query) pairs to score
let inputs = TextInput::from_str(&[
    ("one candidate", "query"),
    ("another candidate", "query"),
    // ...
]);

// Run inference to get one similarity score per pair
let similarities = model.inference(inputs, &pipeline, &params)?;
```
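A typical next step is to order the candidates by their scores. The sketch below is a hypothetical post-processing helper; it assumes the scores can be collected into a `Vec<f32>` aligned with the input pairs, which may not match the pipeline's actual output type.

```rust
// Hypothetical helper: rank candidates by descending similarity score.
// Assumes `scores` is aligned with `candidates` by index.
fn rank<'a>(candidates: &[&'a str], scores: &[f32]) -> Vec<(&'a str, f32)> {
    let mut ranked: Vec<(&str, f32)> = candidates
        .iter()
        .copied()
        .zip(scores.iter().copied())
        .collect();
    // Sort highest score first
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    ranked
}
```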
Please refer to the source code in the examples directory for complete examples.
For the English language, the gte-modernbert-base model outperforms larger models on retrieval with only 149M parameters, and runs efficiently on both GPU and CPU. The gte-reranker-modernbert-base version performs re-ranking with similar characteristics. This post provides interesting insights about these models.
This crate should be usable out of the box with other compatible models, or easily adapted to new ones. Please report your own tests or requirements!
This project follows the same principles as the projects below; refer to their documentation for more details:
- 🌿 gline-rs: inference engine for GLiNER models
- 🏷️ gliclass-rs: inference engine for GLiClass models