Add multimodal dataset based on COCO text-image pairs #559
This PR adds two ANN datasets derived from the COCO text-image pairs dataset:

- `coco-t2i-512-angular` (`t2i`): textual query vectors searched against the image dataset.
- `coco-i2i-512-angular` (`i2i`): image query vectors searched against the image dataset.

Extraction Process
Feature vectors are the CLS output token of OpenAI's CLIP with the ViT-B/16 architecture (512 dimensions), taken from either the visual or the textual encoder. Thanks to @lorebianchi98 and @mesnico for performing the extraction and preparation.
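For reference, here is a minimal sketch of how such features can be extracted with the Hugging Face `transformers` CLIP implementation; the checkpoint name and preprocessing are my assumptions, not necessarily the exact pipeline used for this PR:

```python
# Sketch only: CLIP ViT-B/16 feature extraction as described above.
# The checkpoint name and preprocessing are assumptions, not the PR's pipeline.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

@torch.no_grad()
def image_vector(path: str) -> torch.Tensor:
    # Projected CLS-token output of the visual encoder, 512-d for ViT-B/16.
    inputs = processor(images=Image.open(path), return_tensors="pt")
    return model.get_image_features(**inputs).squeeze(0)

@torch.no_grad()
def text_vector(caption: str) -> torch.Tensor:
    # Corresponding 512-d embedding from the textual encoder.
    inputs = processor(text=[caption], return_tensors="pt", padding=True)
    return model.get_text_features(**inputs).squeeze(0)
```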
Split definition
Based on Karpathy's split of COCO 2014:
- `i2i`: 5,000 vectors from test-set images.
- `t2i`: 5,000 vectors from the first caption (out of the five available) of test-set images.

@maumueller: Lucia (@vadicamo) told me you were searching for a multimodal dataset for the SISAP indexing challenge. You can check whether these are a good fit if still needed (a quick way to inspect the files is sketched below). Let me know if you'd like more details.
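If useful, a minimal sketch for inspecting one of the generated files, assuming the standard ann-benchmarks HDF5 layout (`train`/`test`/`neighbors`/`distances` arrays plus a `distance` attribute); the file path is illustrative:

```python
# Sketch only: inspect one of the generated datasets, assuming the usual
# ann-benchmarks HDF5 layout. The file name below is illustrative.
import h5py

with h5py.File("coco-t2i-512-angular.hdf5", "r") as f:
    print(f.attrs["distance"])    # expected: "angular"
    print(f["train"].shape)       # image vectors to index, (n, 512)
    print(f["test"].shape)        # caption query vectors, (5000, 512)
    print(f["neighbors"].shape)   # ground-truth neighbor ids per query
```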