Hey, I believe that allowing the "openai" pre-trained checkpoint to be run with non-QuickGELU models (e.g., RN50 and ViT-B-32) leads to bugs. The pattern is the following:
1. Fine-tune an OpenAI-pre-trained CLIP model (e.g., with --model ViT-B-32 --pretrained openai), inadvertently using a non-QuickGELU model. This step is fine because the "openai" checkpoint is hardcoded to use QuickGELU anyway.
2. Run the evaluation with the same command, but replace openai with the path to the fine-tuned checkpoint.
What happens is that the native GELU is used instead of QuickGELU (the activation the model was trained with), and wrong results are obtained.
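For context on why this silently degrades results: QuickGELU is the sigmoid approximation x * sigmoid(1.702 * x) used by the OpenAI weights, whereas the non-QuickGELU configs use the exact torch.nn.GELU. A minimal, self-contained comparison in plain PyTorch (not tied to open_clip internals) shows the two activations do not match:

```python
import torch
import torch.nn as nn

class QuickGELU(nn.Module):
    """Sigmoid approximation of GELU used by the OpenAI CLIP weights."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(1.702 * x)

x = torch.linspace(-4.0, 4.0, steps=2001)
gap = (QuickGELU()(x) - nn.GELU()(x)).abs().max().item()
print(f"max |QuickGELU - GELU| on [-4, 4]: {gap:.4f}")
# The per-activation gap is small, but it is applied in every transformer MLP,
# so evaluating a QuickGELU-trained checkpoint with nn.GELU shifts the features.
```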
This happened to me, as well as to others (though there's a pending confirmation from them):
Would it be possible to fix/avoid this error-prone pattern? I see some ways:
- Disallow running non-QuickGELU models with the "openai" pre-trained checkpoint. Maybe it can be detected and reported with a dedicated error message (pointing to this issue) inviting the user to switch to the correct model variant.
- Warn the user when "openai" is used without a QuickGELU model (or without the --force-quick-gelu flag); see the sketch after this list.
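A rough sketch of what the warning in the second option could look like, run once at model-creation time. This is illustrative only: the function and its arguments (uses_quick_gelu, force_quick_gelu) are hypothetical and not the actual open_clip internals.

```python
import warnings


def check_openai_quick_gelu(model_name: str, pretrained: str,
                            uses_quick_gelu: bool, force_quick_gelu: bool) -> None:
    """Warn when the 'openai' checkpoint is paired with a non-QuickGELU config (hypothetical helper)."""
    if pretrained.lower() == "openai" and not (uses_quick_gelu or force_quick_gelu):
        warnings.warn(
            f"Model '{model_name}' was requested with the 'openai' checkpoint but its config uses nn.GELU. "
            "The OpenAI weights were trained with QuickGELU; fine-tuning from 'openai' will use QuickGELU, "
            "but later evaluation from a local checkpoint path will fall back to nn.GELU and give wrong "
            "results. Use the -quickgelu model variant or pass --force-quick-gelu."
        )
```

The same check could raise an error instead of a warning, which would implement the first option.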