Fix failed tests in #31851 #31879

Merged
merged 25 commits into from
Jul 10, 2024

Conversation

ydshieh
Collaborator

@ydshieh ydshieh commented Jul 10, 2024

Add more special cases on top of those added in #31851.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ydshieh ydshieh changed the title from "Revert "Revert "Fix _init_weights for ResNetPreTrainedModel""" to "Fix failed tests in #31851" Jul 10, 2024
@ydshieh
Collaborator Author

ydshieh commented Jul 10, 2024

For the record: this is the list I came up with, but we still get some failures.

Some entries
            special_param_names = [
                r"wav2vec2\.masked_spec_embed",
                r"wav2vec2\.feature_extractor\.conv_layers\..+\.conv\.weight",
                r"wav2vec2\.feature_projection\.projection\.weight",
                r"wav2vec2\.feature_projection\.projection\.bias",
                r"wav2vec2\.encoder\.pos_conv_embed\.conv\.parametrizations\.weight\.original.",
                r"classifier\.weight",
                r"regnet\.embedder\.embedder\.convolution\.weight",
                r"regnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.convolution\.weight",
                r"regnet\.encoder\.stages\..+\.layers\..+\.shortcut\.convolution\.weight",
                r"regnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.attention\..+\.weight",
                r"regnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.attention\..+\.bias",
                r"classifier\..+\.weight",
                r"classifier\..+\.bias",
                r"resnet\.embedder\.embedder\.convolution\.weight",
                r"resnet\.encoder\.stages\..+\.layers\..+\.shortcut\.convolution\.weight",
                r"resnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.convolution\.weight",
                r"resnet\.encoder\.stages\..+\.layers\..+\.shortcut\.convolution\.weight",
                r"resnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.attention\..+\.weight",
                r"resnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.attention\..+\.bias",
                r"unispeech_sat\.masked_spec_embed",
                r"wav2vec2_bert\.masked_spec_embed",
                r"pvt_v2\.encoder\.layers\..+\.patch_embedding\.proj\.weight",
                r"wav2vec2_conformer\.masked_spec_embed",
                r"wavlm\.masked_spec_embed",
                r"swiftformer\.patch_embed\.patch_embedding\..+\.weight",
                r"sew\.masked_spec_embed",
                r"bit\.embedder\.convolution\.weight",
                r"sew_d\.masked_spec_embed",
                r"vision_model\.embeddings\.patch_embedding\.weight",
                r"hubert\.masked_spec_embed",
                r"swinv2\.encoder\.layers\..+\.blocks\..+\.attention\.self\.logit_scale",
                r"data2vec_audio\.masked_spec_embed",
                r"unispeech\.masked_spec_embed",
                r"pvt\.encoder\.patch_embeddings\..+\.projection\.weight",
                r"unispeech_sat\.feature_extractor\.conv_layers\..+\.conv\.weight",
                r"wav2vec2_bert\.feature_projection\.projection\.weight",
                r"pvt_v2\.encoder\.layers\..+\.blocks\..+\.attention\.spatial_reduction\.weight",
                r"wav2vec2_conformer\.feature_extractor\.conv_layers\..+\.conv\.weight",
                r"wavlm\.feature_extractor\.conv_layers\..+\.conv\.weight",
                r"swiftformer\.encoder\.network\..+\.blocks\..+\.local_representation\.depth_wise_conv\.weight",
                r"sew\.feature_extractor\.conv_layers\..+\.conv\.weight",
                r"bit\.encoder\.stages\..+\.layers\..+\.downsample\.conv\.weight",
                r"sew_d\.feature_extractor\.conv_layers\..+\.conv\.weight",
                r"vision_model\.embeddings\.position_embedding\.weight",
                r"hubert\.feature_extractor\.conv_layers\..+\.conv\.weight",
                r"swinv2\.encoder\.layers\..+\.blocks\..+\.attention\.self\.logit_scale",
                r"data2vec_audio\.feature_extractor\.conv_layers\..+\.conv\.weight",
                r"unispeech\.feature_extractor\.conv_layers\..+\.conv\.weight",
                r"pvt\.encoder\.patch_embeddings\..+\.projection\.bias",
                r"unispeech_sat\.feature_projection\.projection\.weight",
                r"unispeech_sat\.feature_projection\.projection\.bias",
                r"wav2vec2_bert\.feature_projection\.projection\.bias",
                r"pvt_v2\.encoder\.layers\..+\.blocks\..+\.mlp\.dwconv\.dwconv\.weight",
                r"wav2vec2_conformer\.feature_projection\.projection\.weight",
                r"wav2vec2_conformer\.feature_projection\.projection\.bias",
                r"wavlm\.feature_projection\.projection\.weight",
                r"wavlm\.feature_projection\.projection\.bias",
                r"swiftformer\.encoder\.network\..+\.blocks\..+\.local_representation\.point_wise_conv.+\.weight",
                r"sew\.encoder\.pos_conv_embed\.conv\.weight_g",
                r"bit\.encoder\.stages\..+\.layers\..+\.conv.+\.weight",
                r"sew_d\.encoder\.pos_conv_embed\.conv\.weight_g",
                r"vision_model\.encoder\.layers\..+\.self_attn\.k_proj\.weight",
                r"hubert\.encoder\.pos_conv_embed\.conv\.parametrizations\.weight\.original.+",
                r"swinv2\.encoder\.layers\..+\.blocks\..+\.attention\.self\.logit_scale",
                r"data2vec_audio\.feature_projection\.projection\.weight",
                r"unispeech\.feature_projection\.projection\.weight",
                r"pvt\.encoder\.block\..+\..+\.attention\.self\.sequence_reduction\.weight",
                r"wav2vec2_bert\.encoder\.layers\..+\.self_attn\.pos_bias_u",
                r"sew\.encoder\.pos_conv_embed\.conv\.weight_v",
                r"sew_d\.encoder\.pos_conv_embed\.conv\.weight_v",
                r"vision_model\.encoder\.layers\..+\.self_attn\.v_proj\.weight",
                r"data2vec_audio\.feature_projection\.projection\.bias",
                r"unispeech\.feature_projection\.projection\.bias",
                r"pvt\.encoder\.block\..+\..+\.attention\.self\.sequence_reduction\.bias",
                r"unispeech_sat\.encoder\.pos_conv_embed\.conv\.parametrizations\.weight\.original.+",
                r"wav2vec2_bert\.encoder\.layers\..+\.self_attn\.pos_bias_v",
                r"wav2vec2_conformer\.encoder\.pos_conv_embed\.conv\.parametrizations\.weight\.original.+",
                r"wavlm\.encoder\.pos_conv_embed\.conv\.parametrizations\.weight\.original.+",
                r"swiftformer\.encoder\.network\..+\.blocks\..+\.attn\.w_g",
                r"vision_model\.encoder\.layers\..+\.self_attn\.q_proj\.weight",
                r"data2vec_audio\.encoder\.pos_conv_embed\.layers\..+\.conv\.weight",
                r"unispeech\.encoder\.pos_conv_embed\.conv\.parametrizations\.weight\.original.+",
                r"wav2vec2_bert\.encoder\.layers\..+\.conv_module\.pointwise_conv1\.weight",
                r"wav2vec2_conformer\.encoder\.layers\..+\.self_attn\.pos_bias_u",
                r"wavlm\.encoder\.layers\..+\.attention\.rel_attn_embed\.weight",
                r"swiftformer\.encoder\.network\..+\.blocks\..+\.attn\.to_query\.weight",
                r"vision_model\.encoder\.layers\..+\.self_attn\.out_proj\.weight",
                r"wav2vec2_bert\.encoder\.layers\..+\.conv_module\.depthwise_conv\.weight",
                r"wav2vec2_conformer\.encoder\.layers\..+\.self_attn\.pos_bias_v",
                r"swiftformer\.encoder\.network\..+\.blocks\..+\.attn\.to_key\.weight",
                r"vision_model\.encoder\.layers\..+\.mlp\.fc1\.weight",
                r"vision_model\.encoder\.layers\..+\.mlp\.fc1\.bias",
                r"swiftformer\.encoder\.network\..+\.blocks\..+\.attn\.proj\.weight",
                r"wav2vec2_bert\.encoder\.layers\..+\.conv_module\.pointwise_conv2\.weight",
                r"wav2vec2_conformer\.encoder\.layers\..+\.conv_module\.pointwise_conv1\.weight",
            ]
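
For illustration, here is a minimal sketch of how a list like this could be applied to parameter names. It is not the actual test code from this PR: the helpers `is_special_param` and `split_params` are hypothetical, the pattern list is a small subset of the one above, and the use of `re.search` (a match anywhere in the name) is an assumption.

```python
import re

# Small subset of the patterns above, for illustration only.
special_param_names = [
    r"wav2vec2\.masked_spec_embed",
    r"resnet\.encoder\.stages\..+\.layers\..+\.shortcut\.convolution\.weight",
    r"classifier\..+\.bias",
]


def is_special_param(name, patterns):
    """Hypothetical helper: True if any pattern matches somewhere in `name`."""
    return any(re.search(pattern, name) for pattern in patterns)


def split_params(param_names, patterns):
    """Hypothetical helper: split names into (special, regular) lists."""
    special = [n for n in param_names if is_special_param(n, patterns)]
    regular = [n for n in param_names if not is_special_param(n, patterns)]
    return special, regular


# Made-up parameter names in the style of `model.named_parameters()` keys.
names = [
    "wav2vec2.masked_spec_embed",
    "resnet.encoder.stages.0.layers.1.shortcut.convolution.weight",
    "classifier.1.bias",
    "resnet.encoder.stages.0.layers.1.layer.0.normalization.weight",
]

special, regular = split_params(names, special_param_names)
print("special:", special)
print("regular:", regular)
```

With `re.search`, a pattern that is truncated or leaves a dot unescaped can still match, so typos in the list are easy to miss; switching to `re.fullmatch` would be stricter, if the patterns are meant to cover the whole name.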

@ydshieh ydshieh marked this pull request as ready for review July 10, 2024 07:22
@ydshieh ydshieh requested a review from amyeroberts July 10, 2024 07:48
Collaborator

@amyeroberts amyeroberts left a comment


Thanks for the continued work on this - v. tricky!

@ydshieh ydshieh merged commit 9d98706 into main Jul 10, 2024
22 of 24 checks passed
@ydshieh ydshieh deleted the revert-31868-revert-31851-fix_init_resnet branch July 10, 2024 12:25
3 participants