Skip to content

LAION-AI/scaling-laws-openclip

Repository files navigation

Reproducible scaling laws for contrastive language-image learning [arXiv]

by Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, Jenia Jitsev [arXiv:2212.07143] (Accepted at CVPR 2023)

Work still in progress. In this repository, we will provide the code for reproducing the experiments on large-scale CLIP pre-training and transfer to various downstream tasks for the paper "Reproducible scaling laws for contrastive language-image learning".

Stay tuned.

Until finalized, you may check

  • the OpenCLIP repository that points to the pre-trained models used in this study
  • the LAION-400m and LAION-5B composition instructions, the datasets used for openCLIP pre-training in this study
  • CLIP Benchmarking, transfer evaluation used in this study

Introduction

Scaling plots

To reproduce scaling plots from the paper, see the figures notebook.

Download pre-trained models

First, you need to clone the repo and install the requirements.

git clone https://github.com/LAION-AI/scaling-laws-openclip
cd scaling-laws-openclip
pip install -r requirements.txt

We provide a script, download_models.py, to download all pre-trained models used in the paper. To download all the 29 models used in the paper, use :

python download_models.py

You can also download a subset of the models. For instance:

python download_models.py --samples_seen 3B 13B --model ViT-B-32 --data 80M 400M 2B

will only download ViT-B/32 models with samples seen of 3B or 13B, trained on any of 80M/400M/2B LAION datasets.

Using pre-training models in OpenCLIP

Once you download the pre-trained models, you can also use them in OpenCLIP. Following is an example with ViT-H/14.

First, you need to download the model:

> python download_models.py --samples_seen 34B --model ViT-H-14 --data 2B

'Model-H-14_Data-2B_Samples-34B_lr-5e-4_bs-79k.pt' downloaded.

Once the model is downloaded, it is possible to directly use it in OpenCLIP:

import torch
import open_clip
model, _, preprocess = open_clip.create_model_and_transforms('ViT-H-14', pretrained='Model-H-14_Data-2B_Samples-34B_lr-5e-4_bs-79k.pt')

For a complete example, see the inference notebook.

Citation

If you find this work helpful, please cite our paper:

@article{cherti2022reproducible,
  title={Reproducible scaling laws for contrastive language-image learning},
  author={Cherti, Mehdi and Beaumont, Romain and Wightman, Ross and Wortsman, Mitchell and Ilharco, Gabriel and Gordon, Cade and Schuhmann, Christoph and Schmidt, Ludwig and Jitsev, Jenia},
  journal={arXiv preprint arXiv:2212.07143},
  year={2022}
}

Acknowledgements