Releases: embeddings-benchmark/mteb
1.31.6
1.31.6 (2025-01-30)
Fix
-
fix: Filling missing metadata for leaderboard release (#1895)
-
Update ArxivClusteringS2S.py
-
fill some metadata for retrieval
-
fill in the rest of missing metadata
-
fix metadata
-
fix climatefever metadata
-
fix: Added CQADupstack annotations
-
removed annotation for non-existent task
-
format
-
Added financial to other financial dataset
-
Moved ArguAna annotation to derivate datasets
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> (938e90f)
Unknown
- Update tasks table (12ad5bd)
- Update tasks table (9076213)
- Update tasks table (4bb4ec6)
- Update tasks table (d510ddb)
- Update tasks table (e35c8dd)
- Update tasks table (9a6275e)
- Update tasks table (d9ab239)
- Update tasks table (21b60f5)
- Update tasks table (c46cb8b)
- Update tasks table (0bbc4c7)
- Update tasks table (3123d1c)
- Update tasks table (f7438b8)
- Update tasks table (51faf65)
- Update tasks table (1b76261)
- Update tasks table (67f8a79)
- Update tasks table (933f4af)
- Update tasks table (599849b)
- Update tasks table (ff4ae8d)
- Update tasks table (780a7d3)
- Update tasks table (c34ef64)
- Update tasks table (b23597c)
- Update tasks table (1030888)
- Update tasks table (913112a)
- Update tasks table (25a6f17)
- Update tasks table (e07ffe8)
- Update tasks table (b78525d)
- Update tasks table (6989fd5)
- Update tasks table (b7e412d)
- Update tasks table (2e817b0)
- Update tasks table (28ad172)
- Update tasks table (2850a97)
- Update tasks table (77681bf)
- Adding a banner to the new MMTEB leaderboard (#1908)
-
Adding a banner to the new MMTEB leaderboard
-
linting
-
Update mteb/leaderboard/app.py
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
- adding reference to mteb arena
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> (d0bb5b9)
- Update tasks table (f258cfc)
- Update tasks table (6cc0560)
- Update tasks table (7996458)
- Docs: update docs according to current state (#1870)
-
update docs
-
Apply suggestions from code review
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
-
update readme
-
Update README.md
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> (7e5d6c8)
- Update tasks table (0a59704)
- Feat: Add FaMTEB (Farsi/Persian Text Embedding Benchmark) (#1843)
-
Add Summary Retrieval Task
-
Add FaMTEBClassification
-
Add FaMTEBClustering
-
Add FaMTEBPairClassification
-
Add FaMTEBRetrieval and BEIRFA and FaMTEBSTS
-
Add FaMTEBSummaryRetrieval
-
Add FaMTEB to benchmarks
-
fix benchmark names
-
temporary fix metadata
-
Fix dataset revisions
-
Update SummaryRetrievalEvaluator.py
-
Update task files
-
Update task files
-
add data domain and subtask description
-
Update AbsTaskSummaryRetrieval and FaMTEBSummaryRetrieval
-
Update AbsTaskSummaryRetrieval
-
Add mock task
-
Update AbsTaskSummaryRetrieval
-
Update AbsTaskSummaryRetrieval
-
make lint
-
Refactor SummaryRetrieval to subclass BitextMining
-
Add aggregated datasets
Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: e.zeinivand <zeinivand@ymail.com>
Co-authored-by: Erfun76 <59398902+Erfun76@users.noreply.github.com> (f3404b4)
1.31.5
1.31.5 (2025-01-29)
Fix
- fix: Limited plotly version to be less than 6.0.0 (#1902)
Limited plotly version to be less than 6.0.0 (cec0ed4)
Unknown
- Update tasks table (42c175f)
- Update tasks table (a5d1538)
- Update tasks table (ef929f8)
- Update tasks table (d6deab1)
- Update tasks table (1c84c1c)
- Update tasks table (cc1e899)
- update stella/jasper metainfo (#1896)
update stella meta (976bdd5)
1.31.4
1.31.4 (2025-01-29)
Fix
-
fix: Allow aggregated tasks within benchmarks (#1771)
-
fix: Allow aggregated tasks within benchmarks
Fixes #1231
- feat: Update task filtering, fixing bug on MTEB
- Updated task filtering adding exclusive_language_filter and hf_subset
- fix bug in MTEB where cross-lingual splits were included
- added missing language filtering to MTEB(europe, beta) and MTEB(indic, beta)
The following code outlines the problems:
import mteb
from mteb.benchmarks import MTEB_ENG_CLASSIC
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
# previously this was equivalent to:
task = mteb.get_task("STS22", languages=["eng"])
task.hf_subsets
# filtering to English returned:
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
# However, it should be:
# ['en']
# with the changes it is:
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
task.hf_subsets
# ['en']
# equivalent to:
task = mteb.get_task("STS22", hf_subsets=["en"])
# which you can also obtain using the exclusive_language_filter (though not if there were multiple English splits):
task = mteb.get_task("STS22", languages=["eng"], exclusive_language_filter=True)
-
format
-
remove "en-ext" from AmazonCounterfactualClassification
-
fixed mteb(deu)
-
fix: simplify in a few areas
-
wip
-
tmp
-
sav
-
Allow aggregated tasks within benchmarks
Fixes #1231
-
ensure correct formatting of eval_langs
-
ignore aggregate dataset
-
clean up dummy cases
-
add to mteb(eng, classic)
-
format
-
clean up
-
Allow aggregated tasks within benchmarks
Fixes #1231
-
added fixes from comments
-
fix merge
-
format
-
Updated task type
-
Added minor fix for dummy tasks (8fb59a4)
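The subset filtering described for #1771 can be illustrated without mteb itself. The sketch below is a minimal, standalone approximation: `EVAL_LANGS`, `filter_subsets`, and the `exclusive` flag are hypothetical names invented for this illustration, the mapping only imitates a few STS22 subsets, and the actual logic inside mteb may differ.

```python
from typing import Dict, List

# Hypothetical subset -> language-code mapping, loosely modelled on STS22.
# This is illustrative data, not the task's real metadata.
EVAL_LANGS: Dict[str, List[str]] = {
    "en": ["eng-Latn"],
    "de-en": ["deu-Latn", "eng-Latn"],
    "es-en": ["spa-Latn", "eng-Latn"],
    "pl-en": ["pol-Latn", "eng-Latn"],
    "zh-en": ["cmn-Hans", "eng-Latn"],
}


def filter_subsets(
    eval_langs: Dict[str, List[str]],
    languages: List[str],
    exclusive: bool = False,
) -> List[str]:
    """Return the subsets that match the requested languages.

    Non-exclusive: keep a subset if ANY of its languages is requested.
    Exclusive: keep a subset only if ALL of its languages are requested.
    """
    wanted = set(languages)
    kept = []
    for subset, codes in eval_langs.items():
        # "eng-Latn" -> "eng": compare on the language part only.
        subset_langs = {code.split("-")[0] for code in codes}
        if exclusive:
            keep = subset_langs <= wanted
        else:
            keep = bool(subset_langs & wanted)
        if keep:
            kept.append(subset)
    return kept


print(filter_subsets(EVAL_LANGS, ["eng"]))
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
print(filter_subsets(EVAL_LANGS, ["eng"], exclusive=True))
# ['en']
```

Under these assumptions, the non-exclusive call reproduces the cross-lingual leakage reported in the fix, while the exclusive call keeps only the monolingual English subset.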
Unknown
- Update tasks table (3ee0785)
- Update tasks table (02f8ad5)
- Update tasks table (c77c82c)
- Update tasks table (e8b8ac0)
- Update tasks table (50f305f)
- Update tasks table (2689cb8)
- Update tasks table (24d5373)
- Update tasks table (e487eff)
- Update tasks table (8bc101f)
- Update tasks table (cebf5b6)
- Update tasks table (1ead72f)
- Update tasks table (d939627)
1.31.3
1.31.2
1.31.1
1.31.0
1.31.0 (2025-01-25)
Feature
-
feat: add instruct wrapper (#1768)
-
add instruct wrapper
-
use get_task_instruction
-
add logging messages
-
apply based on PromptType
-
update description
-
change example model
-
move nvembed
-
Update mteb/models/instruct_wrapper.py
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
-
update docstrings
-
add instruction to docs
-
Apply suggestions from code review
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
- lint
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> (ee0f15a)
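The commit list for #1768 mentions applying the instruction "based on PromptType". The contents of `mteb/models/instruct_wrapper.py` are not shown in these notes, so the following is only a rough sketch of the general idea. The `PromptType` enum here is a local stand-in, and `InstructWrapperSketch`, `encode_fn`, `instruction_template`, and `apply_to_passages` are hypothetical names; the template string and the choice to skip passages by default are assumptions.

```python
from enum import Enum
from typing import Callable, List, Optional


class PromptType(str, Enum):
    # Local stand-in for illustration; not imported from mteb.
    query = "query"
    passage = "passage"


class InstructWrapperSketch:
    """Hypothetical wrapper that prepends an instruction before encoding."""

    def __init__(
        self,
        encode_fn: Callable[[List[str]], list],
        instruction_template: str = "Instruct: {instruction}\nQuery: ",
        apply_to_passages: bool = False,
    ) -> None:
        self.encode_fn = encode_fn
        self.instruction_template = instruction_template
        self.apply_to_passages = apply_to_passages

    def encode(
        self,
        sentences: List[str],
        instruction: str,
        prompt_type: Optional[PromptType] = None,
    ) -> list:
        # Assumption: passages are left untouched unless explicitly enabled.
        is_passage = prompt_type == PromptType.passage
        use_instruction = bool(instruction) and (
            not is_passage or self.apply_to_passages
        )
        prefix = (
            self.instruction_template.format(instruction=instruction)
            if use_instruction
            else ""
        )
        return self.encode_fn([prefix + s for s in sentences])


def echo(texts: List[str]) -> List[str]:
    # Dummy "model" that returns its input, so the prefixing is visible.
    return texts


wrapper = InstructWrapperSketch(echo)
print(wrapper.encode(["what is mteb?"], "Retrieve relevant documents", PromptType.query))
print(wrapper.encode(["MTEB is a benchmark."], "Retrieve relevant documents", PromptType.passage))
```

With the dummy `echo` encoder, the query comes back with the formatted instruction prepended and the passage comes back unchanged.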
1.30.0
1.30.0 (2025-01-25)
Feature
-
feat: Integrating ChemTEB (#1708)
-
Add SMILES, AI Paraphrase and Inter-Source Paragraphs PairClassification Tasks
-
Add chemical subsets of NQ and HotpotQA datasets as Retrieval tasks
-
Add PubChem Synonyms PairClassification task
-
Update task init for previously added tasks
-
Add nomic-bert loader
-
Add a script to run the evaluation pipeline for chemical-related tasks
-
Add 15 Wikipedia article classification tasks
-
Add PairClassification and BitextMining tasks for Coconut SMILES
-
Fix naming of some Classification and PairClassification tasks
-
Fix some classification tasks naming issues
-
Integrate WANDB with benchmarking script
-
Update .gitignore
-
Fix nomic_models.py issue with retrieval tasks, similar to issue #1115 in original repo
-
Add one chemical model and some SentenceTransformer models
-
Fix a naming issue for SentenceTransformer models
-
Add OpenAI, bge-m3 and matscibert models
-
Add PubChem SMILES Bitext Mining tasks
-
Change metric namings to be more descriptive
-
Add English e5 and bge v1 models, all the sizes
-
Add two Wikipedia Clustering tasks
-
Add a try-except in evaluation script to skip faulty models during the benchmark.
-
Add bge v1.5 models and clustering score extraction to json parser
-
Add Amazon Titan embedding models
-
Add Cohere Bedrock models
-
Add two SDS Classification tasks
-
Add SDS Classification tasks to classification init and chem_eval
-
Add a retrieval dataset, update dataset names and revisions
-
Update revision for the CoconutRetrieval dataset: handle duplicate SMILES (documents)
-
Update CoconutSMILES2FormulaPC task
-
Change CoconutRetrieval dataset to a smaller one
-
Update some models
- Integrate models added in ChemTEB (such as amazon, cohere bedrock and nomic bert) with latest modeling format in mteb.
- Update the metadata for the mentioned models
-
Fix a typo
open_weights argument is repeated twice
-
Update ChemTEB tasks
- Rename some tasks for better readability.
- Merge some BitextMining and PairClassification tasks into a single task with subsets (PubChemSMILESBitextMining and PubChemSMILESPC)
- Add a new multilingual task (PubChemWikiPairClassification) consisting of 12 languages.
- Update dataset paths, revisions and metadata for most tasks.
- Add a Chemistry domain to TaskMetadata
-
Remove unnecessary files and tasks for MTEB
-
Update some ChemTEB tasks
- Move PubChemSMILESBitextMining to eng folder
- Add citations for tasks involving SDS, NQ, Hotpot, PubChem data
- Update Clustering tasks category
- Change main_score for PubChemAISentenceParaphrasePC
-
Create ChemTEB benchmark
-
Remove CoconutRetrieval
-
Update tasks and benchmarks tables with ChemTEB
-
Mention ChemTEB in readme
-
Fix some issues, update task metadata, lint
- eval_langs fixed
- Dataset path was fixed for two datasets
- Metadata was completed for all tasks, mainly following fields: date, task_subtypes, dialect, sample_creation
- ruff lint
- rename nomic_bert_models.py to nomic_bert_model.py and update it.
-
Remove nomic_bert_model.py as it is now compatible with SentenceTransformer.
-
Remove WikipediaAIParagraphsParaphrasePC task due to being trivial.
-
Merge amazon_models and cohere_bedrock_models.py into bedrock_models.py
-
Remove unnecessary load_data for some tasks.
-
Update bedrock_models.py, openai_models.py and two dataset revisions
- Text should be truncated for amazon text embedding models.
- text-embedding-ada-002 returns null embeddings for some inputs with 8192 tokens.
- Two datasets are updated, dropping very long samples (len > 99th percentile)
-
Add a layer of dynamic truncation for amazon models in bedrock_models.py
-
Replace metadata_dict with self.metadata in PubChemSMILESPC.py
-
fix model meta for bedrock models
-
Add reference comment to original Cohere API implementation (4d66434)
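The ChemTEB notes above mention truncating text for Amazon embedding models and adding "a layer of dynamic truncation". The code in `bedrock_models.py` is not reproduced in these notes, so the following is only one plausible reading of that idea: retry with a progressively shorter input whenever the backend rejects the text as too long. `InputTooLongError`, `embed_with_truncation`, `fake_embed`, and the `shrink` factor are all hypothetical, and character counts stand in for tokens.

```python
from typing import Callable, List


class InputTooLongError(Exception):
    """Hypothetical error raised by a backend that rejects over-long input."""


def embed_with_truncation(
    text: str,
    embed_fn: Callable[[str], List[float]],
    max_chars: int,
    shrink: float = 0.9,
) -> List[float]:
    """Embed `text`, shortening it step by step if the backend rejects it."""
    # Static truncation first: never send more than max_chars characters.
    limit = min(len(text), max_chars)
    while True:
        try:
            return embed_fn(text[:limit])
        except InputTooLongError:
            if limit <= 1:
                # Nothing left to cut; surface the backend error.
                raise
            # Dynamic truncation: shrink by a factor, always making progress.
            limit = max(1, min(limit - 1, int(limit * shrink)))


def fake_embed(text: str) -> List[float]:
    # Dummy backend for the demo: accepts at most 10 characters.
    if len(text) > 10:
        raise InputTooLongError(f"input has {len(text)} characters")
    return [float(len(text))]


print(embed_with_truncation("x" * 100, fake_embed, max_chars=50, shrink=0.5))
# [6.0]
```

In the demo the limit goes 50, 25, 12, 6, and the first accepted length is 6. A real integration would likely count tokens rather than characters and catch the provider's own exception type.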
Unknown
- Update points table (223bf32)
1.29.16
1.29.15
1.29.15 (2025-01-22)
Fix
-
fix: Adding missing model meta (#1856)
-
Added CDE models
-
Added bge-en-icl
-
Updated CDE to bge_full_data
-
Fixed public_training_data flag type to include boolean, as this is how all models are annotated
-
Added public training data link instead of bool to CDE and BGE
-
Added GME models
-
Changed Torch to PyTorch
-
Added metadata on LENS models
-
Added ember_v1
-
Added metadata for amazon titan
-
Removed GME implementation (692bd26)