
Releases: embeddings-benchmark/mteb

1.31.6

30 Jan 22:22

1.31.6 (2025-01-30)

Fix

  • fix: Filling missing metadata for leaderboard release (#1895)

  • Update ArxivClusteringS2S.py

  • fill some metadata for retrieval

  • fill in the rest of the missing metadata

  • fix metadata

  • fix climatefever metadata

  • fix: Added CQADupstack annotations

  • removed annotation for non-existent task

  • format

  • Added financial domain to other financial dataset

  • Moved ArguAna annotation to derivative datasets


Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> (938e90f)

Unknown

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

  • adding reference to mteb arena

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> (d0bb5b9)

  • Update tasks table (f258cfc)

  • Update tasks table (6cc0560)

  • Update tasks table (7996458)

  • Docs: update docs according to current state (#1870)

  • update docs

  • Apply suggestions from code review

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

  • update readme

  • Update README.md

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>


Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> (7e5d6c8)

  • Update tasks table (0a59704)

  • Feat: Add FaMTEB (Farsi/Persian Text Embedding Benchmark) (#1843)

  • Add Summary Retrieval Task

  • Add FaMTEBClassification

  • Add FaMTEBClustering

  • Add FaMTEBPairClassification

  • Add FaMTEBRetrieval and BEIRFA and FaMTEBSTS

  • Add FaMTEBSummaryRetrieval

  • Add FaMTEB to benchmarks

  • fix benchmark names

  • temporary fix metadata

  • Fix dataset revisions

  • Update SummaryRetrievalEvaluator.py

  • Update task files

  • Update task files

  • add data domain and subtask description

  • Update AbsTaskSummaryRetrieval and FaMTEBSummaryRetrieval

  • Update AbsTaskSummaryRetrieval

  • Add mock task

  • Update AbsTaskSummaryRetrieval

  • Update AbsTaskSummaryRetrieval

  • make lint

  • Refactor SummaryRetrieval to subclass BitextMining

  • Add aggregated datasets


Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: e.zeinivand <zeinivand@ymail.com>
Co-authored-by: Erfun76 <59398902+Erfun76@users.noreply.github.com> (f3404b4)

1.31.5

29 Jan 14:15

1.31.5 (2025-01-29)

Fix

  • fix: Limited plotly version to be less than 6.0.0 (#1902)

Limited plotly version to be less than 6.0.0 (cec0ed4)

Unknown

update stella meta (976bdd5)

1.31.4

29 Jan 11:29

1.31.4 (2025-01-29)

Fix

  • fix: Allow aggregated tasks within benchmarks (#1771)

  • fix: Allow aggregated tasks within benchmarks

Fixes #1231

  • feat: Update task filtering, fixing a bug in MTEB
  • Updated task filtering, adding exclusive_language_filter and hf_subsets
  • fixed a bug in MTEB where cross-lingual splits were included
  • added missing language filtering to MTEB(europe, beta) and MTEB(indic, beta)

The following code outlines the problems:

```python
import mteb
from mteb.benchmarks import MTEB_ENG_CLASSIC

task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
# was eq. to:
task = mteb.get_task("STS22", languages=["eng"])
task.hf_subsets
# filtering to English datasets currently gives:
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
# however, it should be:
# ['en']

# with the changes it is:
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
task.hf_subsets
# ['en']
# eq. to:
task = mteb.get_task("STS22", hf_subsets=["en"])
# which you can also obtain using the exclusive_language_filter
# (though not if there were multiple English splits):
task = mteb.get_task("STS22", languages=["eng"], exclusive_language_filter=True)
```
  • format

  • remove "en-ext" from AmazonCounterfactualClassification

  • fixed mteb(deu)

  • fix: simplify in a few areas

  • wip

  • tmp

  • sav

  • Allow aggregated tasks within benchmarks
    Fixes #1231

  • ensure correct formatting of eval_langs

  • ignore aggregate dataset

  • clean up dummy cases

  • add to mteb(eng, classic)

  • format

  • clean up

  • Allow aggregated tasks within benchmarks
    Fixes #1231

  • added fixes from comments

  • fix merge

  • format

  • Updated task type

  • Added minor fix for dummy tasks (8fb59a4)
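The exclusive_language_filter behaviour shown in the STS22 example above can be sketched in plain Python. The function name, the subset-to-language mapping, and the language codes below are illustrative assumptions, not mteb's actual implementation:

```python
# Hypothetical sketch of exclusive language filtering: keep only hf_subsets
# whose *entire* language set is contained in the requested languages, rather
# than any subset that merely contains one of them (the cross-lingual bug).

def exclusive_filter_subsets(subset_langs, requested):
    """subset_langs: mapping of hf_subset name -> set of language codes."""
    requested = set(requested)
    return [
        subset
        for subset, langs in subset_langs.items()
        if set(langs) <= requested  # every language of the subset must be requested
    ]

# STS22-style example: cross-lingual subsets pair English with another language.
sts22_subsets = {
    "en": {"eng"},
    "de-en": {"deu", "eng"},
    "es-en": {"spa", "eng"},
    "pl-en": {"pol", "eng"},
    "zh-en": {"cmn", "eng"},
}

print(exclusive_filter_subsets(sts22_subsets, ["eng"]))  # ['en']
```

Without the exclusivity check, any subset containing "eng" would pass, which is exactly how the cross-lingual splits leaked into monolingual English benchmarks.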


1.31.3

28 Jan 15:24

1.31.3 (2025-01-28)

Fix

  • fix: External results are preferred when only they have the needed splits (#1893)

join_revisions now prefers task_results where the scores are not empty (2a41730)
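The preference rule can be sketched with plain dictionaries; `prefer_nonempty` and the result shape below are hypothetical illustrations, not mteb's actual `join_revisions` code:

```python
# Hypothetical sketch of the fix: when joining results across revisions,
# prefer a task result that actually has scores for the needed splits
# (e.g. an external result) over one whose scores are empty.

def prefer_nonempty(candidates, needed_splits):
    """candidates: list of dicts like {"source": ..., "scores": {split: [...]}}.
    Returns the first candidate covering all needed_splits with non-empty
    scores, falling back to the first candidate otherwise."""
    for cand in candidates:
        scores = cand["scores"]
        if all(scores.get(split) for split in needed_splits):
            return cand
    return candidates[0] if candidates else None

internal = {"source": "internal", "scores": {"test": []}}      # empty scores
external = {"source": "external", "scores": {"test": [0.71]}}  # has the split

chosen = prefer_nonempty([internal, external], needed_splits=["test"])
print(chosen["source"])  # external
```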

1.31.2

28 Jan 10:11

1.31.2 (2025-01-28)

Fix

  • fix: update voyage exp metadata (#1888)

  • fix: update voyage exp metadata

  • added number of parameters (e623771)

1.31.1

26 Jan 07:36

1.31.1 (2025-01-26)

Fix

  • fix: fix jina v1, 2 models (#1872)

fix jina models (1d66089)

Unknown

  • doc: update pr template (#1871)

  • doc: update pr template

  • group testing & add: do not delete


Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> (95714d0)

1.31.0

25 Jan 16:50

1.31.0 (2025-01-25)

Feature

  • feat: add instruct wrapper (#1768)

  • add instruct wrapper

  • use get_task_instruction

  • add logging messages

  • apply based on PromptType

  • update description

  • change example model

  • move nvembed

  • Update mteb/models/instruct_wrapper.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

  • update docstrings

  • add instruction to docs

  • Apply suggestions from code review

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

  • lint

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> (ee0f15a)
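A minimal sketch of what an instruct wrapper does, applied per PromptType as the commits above describe. The class names, template, and helper below are illustrative assumptions, not mteb's actual `instruct_wrapper` API:

```python
# Hypothetical sketch of an instruction wrapper: prepend a task-specific
# instruction to queries (but not passages) before encoding.

from enum import Enum

class PromptType(Enum):
    query = "query"
    passage = "passage"

# Template in the style used by instruction-tuned embedding models.
INSTRUCTION_TEMPLATE = "Instruct: {instruction}\nQuery: {text}"

def format_with_instruction(text, instruction, prompt_type):
    # Only queries receive the instruction; passages are encoded as-is.
    if prompt_type is PromptType.query and instruction:
        return INSTRUCTION_TEMPLATE.format(instruction=instruction, text=text)
    return text

q = format_with_instruction(
    "what is a lobster roll?",
    "Given a web search query, retrieve relevant passages",
    PromptType.query,
)
print(q.startswith("Instruct:"))  # True
```

The asymmetry (instruction on the query side only) is the usual design for instruction-tuned retrievers, since the corpus is typically embedded once without task context.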

1.30.0

25 Jan 04:05

1.30.0 (2025-01-25)

Feature

  • feat: Integrating ChemTEB (#1708)

  • Add SMILES, AI Paraphrase and Inter-Source Paragraphs PairClassification Tasks

  • Add chemical subsets of NQ and HotpotQA datasets as Retrieval tasks

  • Add PubChem Synonyms PairClassification task

  • Update task init for previously added tasks

  • Add nomic-bert loader

  • Add a script to run the evaluation pipeline for chemical-related tasks

  • Add 15 Wikipedia article classification tasks

  • Add PairClassification and BitextMining tasks for Coconut SMILES

  • Fix naming of some Classification and PairClassification tasks

  • Fix some classification tasks naming issues

  • Integrate WANDB with benchmarking script

  • Update .gitignore

  • Fix nomic_models.py issue with retrieval tasks, similar to issue #1115 in original repo

  • Add one chemical model and some SentenceTransformer models

  • Fix a naming issue for SentenceTransformer models

  • Add OpenAI, bge-m3 and matscibert models

  • Add PubChem SMILES Bitext Mining tasks

  • Change metric namings to be more descriptive

  • Add English e5 and bge v1 models, all the sizes

  • Add two Wikipedia Clustering tasks

  • Add a try-except in evaluation script to skip faulty models during the benchmark.

  • Add bge v1.5 models and clustering score extraction to json parser

  • Add Amazon Titan embedding models

  • Add Cohere Bedrock models

  • Add two SDS Classification tasks

  • Add SDS Classification tasks to classification init and chem_eval

  • Add a retrieval dataset, update dataset names and revisions

  • Update revision for the CoconutRetrieval dataset: handle duplicate SMILES (documents)

  • Update CoconutSMILES2FormulaPC task

  • Change CoconutRetrieval dataset to a smaller one

  • Update some models

  • Integrate models added in ChemTEB (such as amazon, cohere bedrock and nomic bert) with latest modeling format in mteb.
  • Update the metadata for the mentioned models
  • Fix a typo: the open_weights argument was duplicated

  • Update ChemTEB tasks

  • Rename some tasks for better readability.
  • Merge some BitextMining and PairClassification tasks into a single task with subsets (PubChemSMILESBitextMining and PubChemSMILESPC)
  • Add a new multilingual task (PubChemWikiPairClassification) consisting of 12 languages.
  • Update dataset paths, revisions and metadata for most tasks.
  • Add a Chemistry domain to TaskMetadata
  • Remove unnecessary files and tasks for MTEB

  • Update some ChemTEB tasks

  • Move PubChemSMILESBitextMining to eng folder
  • Add citations for tasks involving SDS, NQ, Hotpot, PubChem data
  • Update Clustering tasks category
  • Change main_score for PubChemAISentenceParaphrasePC
  • Create ChemTEB benchmark

  • Remove CoconutRetrieval

  • Update tasks and benchmarks tables with ChemTEB

  • Mention ChemTEB in readme

  • Fix some issues, update task metadata, lint

  • eval_langs fixed
  • Dataset path was fixed for two datasets
  • Metadata was completed for all tasks, mainly following fields: date, task_subtypes, dialect, sample_creation
  • ruff lint
  • rename nomic_bert_models.py to nomic_bert_model.py and update it.
  • Remove nomic_bert_model.py as it is now compatible with SentenceTransformer.

  • Remove WikipediaAIParagraphsParaphrasePC task due to being trivial.

  • Merge amazon_models and cohere_bedrock_models.py into bedrock_models.py

  • Remove unnecessary load_data for some tasks.

  • Update bedrock_models.py, openai_models.py and two dataset revisions

  • Text should be truncated for amazon text embedding models.
  • text-embedding-ada-002 returns null embeddings for some inputs with 8192 tokens.
  • Two datasets are updated, dropping very long samples (len > 99th percentile)
  • Add a layer of dynamic truncation for amazon models in bedrock_models.py

  • Replace metadata_dict with self.metadata in PubChemSMILESPC.py

  • fix model meta for bedrock models

  • Add reference comment to original Cohere API implementation (4d66434)
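The dynamic-truncation idea mentioned for the Amazon models above can be sketched as a truncate-and-retry loop. The function, the character-based limits, and the toy embedder below are illustrative assumptions, not the actual bedrock_models.py code:

```python
# Hypothetical sketch of dynamic truncation: if the embedding call rejects an
# over-long input, halve the text and retry until it fits or a floor is hit.

def embed_with_truncation(embed_fn, text, max_chars=2048, min_chars=64):
    """embed_fn raises ValueError when the input exceeds the model's limit."""
    text = text[:max_chars]  # static cap first
    while True:
        try:
            return embed_fn(text)
        except ValueError:
            if len(text) <= min_chars:
                raise  # give up rather than loop forever
            text = text[: len(text) // 2]  # truncate and retry

# Toy embedder that only accepts inputs up to 500 characters.
def toy_embed(text):
    if len(text) > 500:
        raise ValueError("too long")
    return [float(len(text))]

vec = embed_with_truncation(toy_embed, "x" * 5000)
print(vec)  # [256.0]
```

A real implementation would truncate by tokens rather than characters, but the retry shape is the same: shrink until the provider accepts the input.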


1.29.16

22 Jan 12:11

1.29.16 (2025-01-22)

Fix

  • fix: Added correct training data annotation to LENS (#1859)

Added correct training data annotation to LENS (e775436)

1.29.15

22 Jan 11:50

1.29.15 (2025-01-22)

Fix

  • fix: Adding missing model meta (#1856)

  • Added CDE models

  • Added bge-en-icl

  • Updated CDE to bge_full_data

  • Fixed public_training_data flag type to include boolean, as this is how all models are annotated

  • Added public training data link instead of bool to CDE and BGE

  • Added GME models

  • Changed Torch to PyTorch

  • Added metadata on LENS models

  • Added ember_v1

  • Added metadata for amazon titan

  • Removed GME implementation (692bd26)