Add multimodal dataset based on COCO text-image pairs #559
This PR adds two ANN datasets derived from the COCO text-image pairs dataset:

- `coco-t2i-512-angular` (`t2i`): textual query vectors searched against the image dataset.
- `coco-i2i-512-angular` (`i2i`): image query vectors searched against the image dataset.

Extraction Process
Feature vectors are the CLS output token of OpenAI's CLIP with the ViT-B/16 architecture (512 dimensions), taken from either the visual or the textual encoder. Thanks to @lorebianchi98 and @mesnico for performing the extraction and preparation.
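For reference, here is a minimal sketch of how such features can be extracted with the Hugging Face `transformers` CLIP implementation; the checkpoint name and preprocessing are my assumptions, not necessarily the exact pipeline used for this PR:

```python
# Sketch only: CLIP ViT-B/16 feature extraction as described above.
# The checkpoint name and preprocessing are assumptions, not the PR's pipeline.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

@torch.no_grad()
def image_vector(path: str) -> torch.Tensor:
    # Projected CLS-token output of the visual encoder, 512-d for ViT-B/16.
    inputs = processor(images=Image.open(path), return_tensors="pt")
    return model.get_image_features(**inputs).squeeze(0)

@torch.no_grad()
def text_vector(caption: str) -> torch.Tensor:
    # Corresponding 512-d embedding from the textual encoder.
    inputs = processor(text=[caption], return_tensors="pt", padding=True)
    return model.get_text_features(**inputs).squeeze(0)
```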
Split definition
Based on Karpathy's split of COCO 2014:
- `i2i`: 5,000 vectors from test-set images.
- `t2i`: 5,000 vectors from the first caption (out of the five available) of test-set images.

@maumueller: Lucia (@vadicamo) told me you were searching for a multimodal dataset for the SISAP indexing challenge. You can check whether these are a good fit if still needed (a quick way to inspect the files is sketched below). Let me know if you'd like more details.
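If useful, a minimal sketch for inspecting one of the generated files, assuming the standard ann-benchmarks HDF5 layout (`train`/`test`/`neighbors`/`distances` arrays plus a `distance` attribute); the file path is illustrative:

```python
# Sketch only: inspect one of the generated datasets, assuming the usual
# ann-benchmarks HDF5 layout. The file name below is illustrative.
import h5py

with h5py.File("coco-t2i-512-angular.hdf5", "r") as f:
    print(f.attrs["distance"])    # expected: "angular"
    print(f["train"].shape)       # image vectors to index, (n, 512)
    print(f["test"].shape)        # caption query vectors, (5000, 512)
    print(f["neighbors"].shape)   # ground-truth neighbor ids per query
```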