Add support for fine-tuning CLIP-like models using contrastive-image-text example #29070

tjs-intel · 2024-02-16T22:31:47Z

What does this PR do?

The contrastive-image-text example works for fine-tuning models whose model_type is "clip", but not for other CLIP-like models such as "chinese_clip" and "siglip", because the VisionTextDualEncoderConfig class is too specific to CLIP models.

This PR adds support for Chinese-CLIP and SigLIP vision models to be fine-tuned with the contrastive-image-text example.
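To illustrate the kind of generalization described above, here is a minimal, self-contained sketch of selecting a vision sub-config by model_type instead of assuming CLIP. It is not the code from this PR: the config classes, the `VISION_CONFIGS` registry, and `resolve_vision_config` are hypothetical stand-ins that do not exist in transformers.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the library's vision config classes.
@dataclass
class ClipVisionConfig:
    hidden_size: int = 768


@dataclass
class ChineseClipVisionConfig:
    hidden_size: int = 768


@dataclass
class SiglipVisionConfig:
    hidden_size: int = 768


# Hypothetical registry keyed by the composite checkpoint's model_type.
VISION_CONFIGS = {
    "clip": ClipVisionConfig,
    "chinese_clip": ChineseClipVisionConfig,
    "siglip": SiglipVisionConfig,
}


def resolve_vision_config(config_dict):
    """Return the vision sub-config for a CLIP-like composite config dict."""
    model_type = config_dict["model_type"]
    try:
        config_cls = VISION_CONFIGS[model_type]
    except KeyError:
        raise ValueError(f"Unsupported model_type: {model_type!r}") from None
    # Composite checkpoints nest the vision settings under "vision_config".
    return config_cls(**config_dict.get("vision_config", {}))
```

Hard-coding a single config class corresponds to a registry with only the "clip" entry; adding entries is what lets other CLIP-like checkpoints resolve.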

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@amyeroberts @patil-suraj @patrickvonplaten

tjs-intel · 2024-02-16T23:10:53Z

Fixing up this PR as per the contributor guidelines now

tjs-intel · 2024-02-16T23:11:45Z

Happy to receive suggestions for any test candidates

tjs-intel · 2024-02-16T23:15:33Z

This has been manually tested by replacing openai/clip-vit-base-patch32 in the contrastive-image-text example with the following models:

	OFA-Sys/chinese-clip-vit-base-patch16
	facebook/metaclip-b32-400m
	google/siglip-so400m-patch14-384
	laion/CLIP-ViT-B-32-laion2B-s34B-b79K
	laion/CLIP-ViT-H-14-laion2B-s32B-b79K
	laion/CLIP-ViT-bigG-14-laion2B-39B-b160k
	openai/clip-vit-base-patch32
	openai/clip-vit-large-patch14
	openai/clip-vit-large-patch14-336
	timm/ViT-SO400M-14-SigLIP-384
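A manual check like the one described above could be scripted roughly as follows. This is a sketch under assumptions, not the procedure actually used: the text checkpoint "roberta-base", the output directory name, and the helper `build_dual_encoder` are illustrative choices, and whether every listed checkpoint loads through these calls is not confirmed here. The transformers import is deferred so the module can be loaded without the library installed.

```python
# Vision checkpoints listed in the comment above.
VISION_CHECKPOINTS = [
    "OFA-Sys/chinese-clip-vit-base-patch16",
    "facebook/metaclip-b32-400m",
    "google/siglip-so400m-patch14-384",
    "laion/CLIP-ViT-B-32-laion2B-s34B-b79K",
    "laion/CLIP-ViT-H-14-laion2B-s32B-b79K",
    "laion/CLIP-ViT-bigG-14-laion2B-39B-b160k",
    "openai/clip-vit-base-patch32",
    "openai/clip-vit-large-patch14",
    "openai/clip-vit-large-patch14-336",
    "timm/ViT-SO400M-14-SigLIP-384",
]


def build_dual_encoder(vision_name, text_name="roberta-base", output_dir="clip-roberta"):
    """Pair a vision checkpoint with a text checkpoint and save the result.

    `text_name` and `output_dir` are assumptions made for illustration.
    Calling this downloads model weights from the Hub.
    """
    from transformers import (
        AutoImageProcessor,
        AutoTokenizer,
        VisionTextDualEncoderModel,
        VisionTextDualEncoderProcessor,
    )

    model = VisionTextDualEncoderModel.from_vision_text_pretrained(vision_name, text_name)
    tokenizer = AutoTokenizer.from_pretrained(text_name)
    image_processor = AutoImageProcessor.from_pretrained(vision_name)
    processor = VisionTextDualEncoderProcessor(image_processor, tokenizer)

    # Save both pieces so a training script can load them from one directory.
    model.save_pretrained(output_dir)
    processor.save_pretrained(output_dir)
    return output_dir
```

Calling `build_dual_encoder(name)` for each entry in `VISION_CHECKPOINTS` and then pointing the example's training script at the saved directory would reproduce the substitution one checkpoint at a time.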

tjs-intel · 2024-02-17T00:04:45Z

Not sure what's going on here:
https://app.circleci.com/pipelines/github/huggingface/transformers/84689/workflows/02d18e8c-af6e-465d-8625-fb3dc53bc03e/jobs/1095368/parallel-runs/0/steps/0-116
https://app.circleci.com/pipelines/github/huggingface/transformers/84689/workflows/02d18e8c-af6e-465d-8625-fb3dc53bc03e/jobs/1095369/parallel-runs/0/steps/0-115
https://app.circleci.com/pipelines/github/huggingface/transformers/84689/workflows/02d18e8c-af6e-465d-8625-fb3dc53bc03e/jobs/1095365/parallel-runs/0/steps/0-117

amyeroberts · 2024-02-19T19:20:53Z

Hi @tjs-intel, thanks for adding this! For the failing tests, could you try rebasing onto main? There were some recent issues with compatible library versions, which should now be resolved.

amyeroberts

Thanks for adding!

HuggingFaceDocBuilderDev · 2024-02-20T11:55:40Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

tjs-intel added 2 commits February 19, 2024 19:55

add support for siglip and chinese-clip model training with contrastive-image-text example

be4d328

codebase fixups

c8a40df

amyeroberts approved these changes Feb 20, 2024

amyeroberts merged commit ee3af60 into huggingface:main Feb 20, 2024
22 checks passed

tjs-intel deleted the support-clip-like-model-training branch February 20, 2024 16:02
