
Add support for fine-tuning CLIP-like models using contrastive-image-text example #29070

Merged: 2 commits merged into huggingface:main from support-clip-like-model-training on Feb 20, 2024
Conversation

tjs-intel (Contributor)

What does this PR do?

The contrastive-image-text example works for fine-tuning models with the model_type "clip", but the VisionTextDualEncoderConfig class is too CLIP-specific to handle other model types such as "chinese_clip" and "siglip".

This PR adds support for Chinese-CLIP and SigLIP vision models to be fine-tuned with the contrastive-image-text example.
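For context, the example first assembles a VisionTextDualEncoderModel from a vision and a text checkpoint and then fine-tunes it with run_clip.py. Below is a minimal sketch of that assembly step with one of the newly supported vision backbones; the SigLIP checkpoint, the roberta-base text encoder, and the output directory name are illustrative choices, not part of this PR.

```python
from transformers import (
    AutoImageProcessor,
    AutoTokenizer,
    VisionTextDualEncoderModel,
    VisionTextDualEncoderProcessor,
)

# Illustrative checkpoints: a SigLIP vision backbone paired with a RoBERTa text encoder.
vision_ckpt = "google/siglip-so400m-patch14-384"
text_ckpt = "roberta-base"

# Assemble a dual encoder from separately pretrained vision and text models.
model = VisionTextDualEncoderModel.from_vision_text_pretrained(vision_ckpt, text_ckpt)

# Pair the matching image processor and tokenizer so inputs line up with the encoders.
processor = VisionTextDualEncoderProcessor(
    AutoImageProcessor.from_pretrained(vision_ckpt),
    AutoTokenizer.from_pretrained(text_ckpt),
)

# Save locally; run_clip.py can then be pointed at this directory via --model_name_or_path.
model.save_pretrained("siglip-roberta-dual-encoder")
processor.save_pretrained("siglip-roberta-dual-encoder")
```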

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@amyeroberts @patil-suraj @patrickvonplaten

@tjs-intel (Contributor, Author)

Fixing up this PR as per the contributor guidelines now

@tjs-intel (Contributor, Author)

Happy to receive suggestions for any test candidates

@tjs-intel (Contributor, Author)

This has been manually tested by replacing openai/clip-vit-base-patch32 in the contrastive-image-text example with the following models:

  • OFA-Sys/chinese-clip-vit-base-patch16
  • facebook/metaclip-b32-400m
  • google/siglip-so400m-patch14-384
  • laion/CLIP-ViT-B-32-laion2B-s34B-b79K
  • laion/CLIP-ViT-H-14-laion2B-s32B-b79K
  • laion/CLIP-ViT-bigG-14-laion2B-39B-b160k
  • openai/clip-vit-base-patch32
  • openai/clip-vit-large-patch14
  • openai/clip-vit-large-patch14-336
  • timm/ViT-SO400M-14-SigLIP-384
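As a quick sanity check of such a swap, the assembled dual encoder can be loaded and run on an image-text pair. This is a sketch only; the local directory name refers to the hypothetical output of the assembly snippet above, and the scores are only meaningful after fine-tuning.

```python
import requests
import torch
from PIL import Image
from transformers import VisionTextDualEncoderModel, VisionTextDualEncoderProcessor

# Load the locally assembled dual encoder (directory name from the earlier sketch).
model = VisionTextDualEncoderModel.from_pretrained("siglip-roberta-dual-encoder")
processor = VisionTextDualEncoderProcessor.from_pretrained("siglip-roberta-dual-encoder")

# A COCO validation image of two cats, commonly used in transformers examples.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(
    text=["a photo of two cats", "a photo of a dog"],
    images=image,
    return_tensors="pt",
    padding=True,
)

with torch.no_grad():
    outputs = model(**inputs)

# Probabilities over the two captions for the image; before fine-tuning these are
# rough, since the projection layers are freshly initialized.
print(outputs.logits_per_image.softmax(dim=-1))
```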

@amyeroberts (Collaborator)

Hi @tjs-intel, thanks for adding this! For the failing tests, could you try rebasing onto main? There were some recent issues with compatible library versions, which should now be resolved.

@amyeroberts (Collaborator) left a comment

Thanks for adding!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@amyeroberts merged commit ee3af60 into huggingface:main on Feb 20, 2024
22 checks passed
@tjs-intel deleted the support-clip-like-model-training branch on February 20, 2024 at 16:02