PyTorch implementation of CLIP (Contrastive Language-Image Pre-Training), introduced in the paper "Learning Transferable Visual Models From Natural Language Supervision". CLIP is a neural network trained on a wide variety of (image, text) pairs.
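At the heart of this training scheme is a symmetric contrastive objective: image and text embeddings from matching pairs are pulled together while mismatched pairs are pushed apart. A minimal PyTorch sketch of that loss is shown below; the function name and the temperature value are illustrative assumptions, not this repository's exact API.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """CLIP-style symmetric contrastive loss over a batch of
    (image, text) embedding pairs of shape (batch, dim)."""
    # Project embeddings onto the unit sphere so logits are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Pairwise similarity matrix, scaled by the temperature.
    logits = image_emb @ text_emb.t() / temperature
    # The i-th image matches the i-th text, so targets are the diagonal indices.
    targets = torch.arange(logits.size(0), device=logits.device)
    # Cross-entropy in both directions: image->text and text->image.
    loss_img = F.cross_entropy(logits, targets)
    loss_txt = F.cross_entropy(logits.t(), targets)
    return (loss_img + loss_txt) / 2

# Example with random embeddings standing in for encoder outputs:
images = torch.randn(8, 512)
texts = torch.randn(8, 512)
loss = clip_contrastive_loss(images, texts)
```

In practice the two embedding batches come from a separate image encoder and text encoder, and the temperature is typically a learned parameter rather than a fixed constant.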