Add multimodal dataset based on COCO text-image pairs #559

Open
wants to merge 1 commit into
base: main
from

Conversation


@fabiocarrara fabiocarrara commented Dec 20, 2024

This PR adds two ANN datasets derived from the COCO text-image pairs dataset:

  1. COCO Text-to-Image Multimodal Dataset (coco-t2i-512-angular):
    • Text is used as queries, and images comprise the search set.
    • This dataset is challenging because of the distribution shift between the queries and the search set: the 100 nearest neighbors of the queries have a cosine similarity of 0.30 ± 0.02.
  2. COCO Image-to-Image Intra-modal Dataset (coco-i2i-512-angular):
    • Images are used as both queries and search set.
    • This dataset does not exhibit the distribution shift and can serve as a reference, since it shares its search set with the t2i dataset.
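
For readers who want to check the gap between queries and the search set on their own vectors, here is a minimal brute-force sketch of the neighbor-similarity statistic. It is not the script used for this PR: the function names are hypothetical, the vectors are assumed to be non-zero, and pooling all query-neighbor similarities before taking the mean and standard deviation is an assumption about how the 0.30 ± 0.02 figure was aggregated.

```python
import heapq
import math


def normalize(v):
    """Scale a non-zero vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def topk_cosine_stats(queries, search_set, k=100):
    """Return (mean, std) of cosine similarity pooled over each query's k nearest neighbors."""
    db = [normalize(v) for v in search_set]
    sims = []
    for q in queries:
        qn = normalize(q)
        # Cosine similarity of this query against every search-set vector.
        scores = (sum(a * b for a, b in zip(qn, d)) for d in db)
        # Keep only the k most similar vectors (the exact nearest neighbors).
        sims.extend(heapq.nlargest(k, scores))
    mean = sum(sims) / len(sims)
    std = math.sqrt(sum((s - mean) ** 2 for s in sims) / len(sims))
    return mean, std


# Toy data only; the real datasets have 512-dimensional vectors.
mean, std = topk_cosine_stats([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], k=2)
```

A pure-Python loop like this is only practical on small samples; for the full 118,287-vector search set a vectorized matrix product would be the usual choice.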

Extraction Process
Feature vectors are the CLS output tokens of the visual or textual encoder of OpenAI's CLIP with the ViT-B/16 architecture (512 dimensions). Thanks to @lorebianchi98 and @mesnico for performing the extraction and preparation.

Split definition
Based on Karpathy's split of COCO 2014:

  • The search sets include vectors extracted from the images of the training set (113,287) and of the validation set (5,000), for a total of 118,287 vectors.
  • Queries:
    • Visual (i2i): 5,000 vectors from test set images.
    • Textual (t2i): 5,000 vectors from the first caption (out of the five available) of test set images.
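
The split described above can be sketched as a simple partition over annotation records. This is an illustrative sketch, not the preparation code used for this PR: the record schema (`"id"`, `"split"`, `"captions"`) and the function name are hypothetical and do not mirror the field names of the actual split file.

```python
def build_split(records):
    """Partition split-annotated records into search-set and query items.

    `records` is a hypothetical list of dicts with keys "id", "split"
    ("train", "val", or "test"), and "captions" (a list of strings).
    """
    # Search set: images from the training and validation splits.
    search_ids = [r["id"] for r in records if r["split"] in ("train", "val")]
    test = [r for r in records if r["split"] == "test"]
    # i2i queries: the test images themselves.
    i2i_queries = [r["id"] for r in test]
    # t2i queries: only the first of the available captions per test image.
    t2i_queries = [(r["id"], r["captions"][0]) for r in test]
    return search_ids, i2i_queries, t2i_queries


# Toy records for illustration only.
records = [
    {"id": 1, "split": "train", "captions": ["a dog", "a brown dog"]},
    {"id": 2, "split": "val", "captions": ["a cat", "a sleeping cat"]},
    {"id": 3, "split": "test", "captions": ["a bus", "a red bus"]},
]
search_ids, i2i_queries, t2i_queries = build_split(records)
```

On the real data, the three lists should have 118,287, 5,000, and 5,000 entries respectively, matching the counts given above.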

@maumueller: Lucia (@vadicamo) told me you were looking for a multimodal dataset for the SISAP indexing challenge. If you still need one, you can check whether these are a good fit. Let me know if you'd like more details.

@fabiocarrara fabiocarrara marked this pull request as ready for review January 7, 2025 16:11
@fabiocarrara
Author

Some (partial) results on coco-t2i-512-angular:

[Figure: results plot for coco-t2i-512-angular]
