Hey, I believe that allowing the "openai" pre-trained checkpoint to be run with non-QuickGELU models (e.g., RN50 and ViT-B-32) leads to bugs. The pattern is the following:
1. Fine-tune an OpenAI-pre-trained CLIP model (e.g., with --model ViT-B-32 --pretrained openai), inadvertently using a non-QuickGELU model. This step is fine because the "openai" checkpoint is hardcoded to use QuickGELU anyway.
2. Run the evaluation with the same command, but replace openai with the path to the fine-tuned checkpoint.
What happens is that the native GELU is used instead of QuickGELU (the activation the model was trained with), and wrong results are obtained.
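For context on why this silently degrades results: QuickGELU is the sigmoid approximation x * sigmoid(1.702 * x) used by the OpenAI weights, whereas the non-QuickGELU configs use the exact torch.nn.GELU. A minimal, self-contained comparison in plain PyTorch (not tied to open_clip internals) shows the two activations do not match:

```python
import torch
import torch.nn as nn

class QuickGELU(nn.Module):
    """Sigmoid approximation of GELU used by the OpenAI CLIP weights."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(1.702 * x)

x = torch.linspace(-4.0, 4.0, steps=2001)
gap = (QuickGELU()(x) - nn.GELU()(x)).abs().max().item()
print(f"max |QuickGELU - GELU| on [-4, 4]: {gap:.4f}")
# The per-activation gap is small, but it is applied in every transformer MLP,
# so evaluating a QuickGELU-trained checkpoint with nn.GELU shifts the features.
```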
This happened to me, as well as to others (though there's a pending confirmation from them):
Would it be possible to fix/avoid this error-prone pattern? I see some ways:
- Disallow running non-QuickGELU models with the "openai" pre-trained checkpoint. Maybe it can be detected and reported with a dedicated error message (pointing to this issue) inviting the user to switch to the correct model variant.
- Warn the user when "openai" is used without a QuickGELU model (or without the --force-quick-gelu flag); see the sketch after this list.
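A rough sketch of what the warning in the second option could look like, run once at model-creation time. This is illustrative only: the function and its arguments (uses_quick_gelu, force_quick_gelu) are hypothetical and not the actual open_clip internals.

```python
import warnings


def check_openai_quick_gelu(model_name: str, pretrained: str,
                            uses_quick_gelu: bool, force_quick_gelu: bool) -> None:
    """Warn when the 'openai' checkpoint is paired with a non-QuickGELU config (hypothetical helper)."""
    if pretrained.lower() == "openai" and not (uses_quick_gelu or force_quick_gelu):
        warnings.warn(
            f"Model '{model_name}' was requested with the 'openai' checkpoint but its config uses nn.GELU. "
            "The OpenAI weights were trained with QuickGELU; fine-tuning from 'openai' will use QuickGELU, "
            "but later evaluation from a local checkpoint path will fall back to nn.GELU and give wrong "
            "results. Use the -quickgelu model variant or pass --force-quick-gelu."
        )
```

The same check could raise an error instead of a warning, which would implement the first option.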